Data Migration Strategies For Large Scale Systems

From Data Engineering Podcast



Length: 60 minutes
Released: May 26, 2024
Format: Podcast episode

Description

Summary
Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. Sriram Panyam has been involved in several projects that required migrating large volumes of data in high-traffic environments. In this episode he shares some of the valuable lessons that he learned about how to make those projects successful.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Starburst is trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
This episode is supported by Code Comments, an original podcast from Red Hat. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight. In Code Comments, host Jamie Parker, Red Hatter and experienced engineer, shares the journey of technologists from across the industry and their hard-won lessons in implementing new technologies. I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics. There are 3 seasons of great episodes and new ones landing everywhere you listen to podcasts. Search for "Code Comments" in your podcast player or go to dataengineeringpodcast.com/codecomments (https://www.dataengineeringpodcast.com/codecomments) today to subscribe. My thanks to the team at Code Comments for their support.
Your host is Tobias Macey, and today I'm interviewing Sriram Panyam about his experiences conducting large scale data migrations and the useful strategies that he learned in the process.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by sharing some of your experiences with data migration projects?
As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?
How would you categorize the different types and motivations of migrations?
How does the motivation for a migration influence the ways that you plan for and execute that work?
Can you talk us through one or two specific projects that you have taken part in?
Part 1: The Triggers
Section 1: Technical Limitations triggering Data Migration
Scaling bottlenecks: Performance issues with databases, storage, or network infrastructure
Legacy compatibility: Difficulties integrating with modern tools and cloud platforms
System upgrades: The need to migrate data during major software changes (e.g., SQL Server version upgrade)
Section 2: Types of Migrations for Infrastructure Focus
Storage migration: Moving data between systems (HDD to SSD, SAN to NAS, etc.)
Data center migration: Physical relocation or consolidation of data centers
Virtualization migration: Moving from physical servers to virtual machines (or vice versa)
Section 3: Technical Decisions Driving Data Migrations
End-of-life support: Forced migration when older software or hardware is sunsetted
Security and compliance: Adopting new platforms with better security postures
Cost optimization: Potential savings of cloud vs. on-premises data centers
Part 2: Challenges (and Anxieties)
Section 1: Technical Challenges
Data transformation challenges: Schema changes, complex data mappings
Network band

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry