Explain data lineage and how you would implement it
This tests your ability to design for data observability. Define lineage (origin, transformation, movement), then propose a solution using metadata extraction (OpenLineage) and a central graph store/UI (Marquez) to trace data from microservices to analytics.
This tests your ability to translate a data governance concept into a practical system design for a distributed environment. A great answer first defines lineage (origin, transformations, movement), then outlines an implementation. Propose using a standard like OpenLineage to emit metadata from services and ETL jobs. This data is sent to a backend like Marquez, which builds a graph of dependencies. This allows for visualization and tracing errors back to their source. The red flag is only defining the term or suggesting manual documentation.
Read the original → Wikipedia: Data lineage
- #data engineering
- #system design
- #observability
- #analytics
Get five bites like this every day.
Tezvyn delivers a daily feed of 60-second tech bites with quizzes to lock in what you learn.