Solving Data Lineage Tracking And Data Discovery At WeWork

From Data Engineering Podcast


Length:
62 minutes
Released:
Dec 16, 2019
Format:
Podcast episode

Description

Building clean datasets with reliable and reproducible ingestion pipelines is of little use if no one can find those datasets or understand their provenance. The solution to discoverability and data lineage tracking is to incorporate a metadata repository into your data platform. When it is properly integrated with the rest of your tools, the metadata repository serves as a data catalog and as a means of reporting on the health and status of your datasets. At WeWork, they needed a system that would provide visibility into their Airflow pipelines and the outputs those pipelines produced. In this episode, Julien Le Dem and Willy Lulciuc explain how they built Marquez to serve that need, how it is architected, and how it compares to other options that you might be considering. Even if you already have a metadata repository, this is worth a listen to learn more about the value that visibility into your data can bring to your organization.

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry