CSV vs JSON vs Parquet for analytics
WHAT IT TESTS: file format tradeoffs for analytics. OUTLINE: CSV and JSON are row-based, human-readable text formats that are bulky to store and scan; columnar formats such as Parquet and ORC compress well and read only the columns a query needs; choose columnar for analytics. RED FLAG: defaulting to CSV for large analytical workloads.
WHAT IT TESTS: whether you understand how storage format affects analytical performance and cost. ANSWER OUTLINE: CSV is simple and row-oriented, but it is untyped, has no built-in compression, and forces a reader to parse every column of every row; JSON adds nested structure and is self-describing, but it is verbose and slow to scan; Parquet and ORC are columnar, typed, and compressed, so analytical queries read only the columns they need and use stored min/max statistics to skip blocks of data that cannot match a filter, which sharply cuts I/O and cost. For analytics, choose a columnar format.
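To make the "read only the needed columns" point concrete, here is a minimal sketch using only the Python standard library. It does not use Parquet or ORC; it assumes a toy columnar layout (one zlib-compressed block per column) purely to illustrate the principle. The table contents and the names `rows`, `column_blocks`, `sum_amount_from_csv`, and `sum_amount_from_columns` are hypothetical.

```python
import csv
import io
import zlib

# Hypothetical sample table: 1,000 orders with three columns.
rows = [(i, f"customer_{i % 50}", (i % 7) * 10) for i in range(1000)]

# Row-oriented: one CSV text blob; every row carries every column.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["order_id", "customer", "amount"])
writer.writerows(rows)
csv_bytes = buf.getvalue().encode("utf-8")

# Toy columnar layout (NOT the Parquet format): each column is stored
# and compressed as its own block, so one can be read without the others.
columns = {
    "order_id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
column_blocks = {
    name: zlib.compress("\n".join(map(str, values)).encode("utf-8"))
    for name, values in columns.items()
}


def sum_amount_from_csv(data: bytes) -> tuple[int, int]:
    """Return (total, bytes_scanned); the whole file must be parsed."""
    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    total = sum(int(row["amount"]) for row in reader)
    return total, len(data)


def sum_amount_from_columns(blocks: dict[str, bytes]) -> tuple[int, int]:
    """Return (total, bytes_scanned); only the 'amount' block is touched."""
    block = blocks["amount"]
    values = zlib.decompress(block).decode("utf-8").split("\n")
    return sum(int(v) for v in values), len(block)


csv_total, csv_scanned = sum_amount_from_csv(csv_bytes)
col_total, col_scanned = sum_amount_from_columns(column_blocks)
print(f"CSV:      total={csv_total}, bytes scanned={csv_scanned}")
print(f"Columnar: total={col_total}, bytes scanned={col_scanned}")
```

Both paths compute the same total, but the columnar path touches only the bytes of the one column it needs, and that column compresses well because similar values sit next to each other. Real columnar formats add typed encodings and per-block statistics on top of this idea.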
- #cloud
- #parquet
- #file-formats
- #data-lake
- #analytics