tezvyn:

CSV vs JSON vs Parquet for analytics

Source: interview · beginner

WHAT IT TESTS: file format tradeoffs. OUTLINE: CSV and JSON are row-based, human-readable, and bulky; columnar formats such as Parquet and ORC compress well and let a query read only the columns it needs; choose columnar for analytics. RED FLAG: defaulting to CSV for large analytical workloads.
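To make "bulky" and "untyped" concrete, here is a minimal sketch using only the Python standard library. The sample rows and the field names `user_id`, `country`, and `amount` are made up for illustration; the point is only that JSON repeats field names in every record and that CSV hands every value back as a string.

```python
import csv
import io
import json

# Hypothetical sample data: 1,000 rows with three fields.
rows = [{"user_id": i, "country": "DE", "amount": i * 1.5} for i in range(1000)]


def to_csv(records):
    """Serialize records as CSV text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["user_id", "country", "amount"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json_lines(records):
    """Serialize records as one JSON object per line; field names repeat on every line."""
    return "\n".join(json.dumps(r) for r in records)


csv_text = to_csv(rows)
json_text = to_json_lines(rows)

csv_size = len(csv_text.encode("utf-8"))
json_size = len(json_text.encode("utf-8"))

# CSV carries no type information: every value is parsed back as a string.
first_csv_row = next(csv.DictReader(io.StringIO(csv_text)))
# JSON keeps numbers as numbers.
first_json_row = json.loads(json_text.splitlines()[0])

print("CSV bytes:", csv_size, "JSON bytes:", json_size)
print(type(first_csv_row["amount"]), type(first_json_row["amount"]))
```

Neither text format lets a reader skip a column: to get `amount`, a parser still has to walk past `user_id` and `country` in every row.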

WHAT IT TESTS: whether you understand how storage format affects analytical performance and cost. ANSWER OUTLINE: CSV is simple and row-oriented but untyped, has no built-in compression, and forces a reader to scan every column of every row; JSON adds nested structure and self-description but repeats field names in every record, so it is verbose and slow to scan; Parquet and ORC are columnar, typed, and compressed, so analytical queries read only the columns they need and use min/max statistics stored with the data to skip blocks that cannot match a filter, which sharply cuts I/O and cost. For analytics, choose a columnar format.
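The two columnar ideas in that outline, reading only the needed columns and skipping data via statistics, can be sketched with a toy layout in plain Python. This is not the Parquet or ORC file format: the chunk size, the column names, and the helpers `write_columnar` and `sum_amount_where_order_id_at_least` are all hypothetical, and `zlib` over JSON arrays stands in for real column encodings.

```python
import json
import zlib

# Hypothetical sample data: 10,000 rows, 4 columns.
rows = [
    {"order_id": i, "country": "DE" if i % 2 else "US", "amount": i % 100, "note": "n/a"}
    for i in range(10_000)
]
CHUNK = 1_000  # rows per chunk, loosely analogous to a row group or stripe


def write_columnar(records, chunk_size=CHUNK):
    """Store each column of each chunk as its own compressed block with min/max stats."""
    chunks = []
    for start in range(0, len(records), chunk_size):
        part = records[start:start + chunk_size]
        columns = {}
        for name in part[0]:
            values = [r[name] for r in part]
            columns[name] = {
                "min": min(values),
                "max": max(values),
                "data": zlib.compress(json.dumps(values).encode("utf-8")),
            }
        chunks.append(columns)
    return chunks


def sum_amount_where_order_id_at_least(chunks, threshold):
    """Read only the two needed columns and skip chunks whose stats rule them out."""
    total, bytes_read, chunks_skipped = 0, 0, 0
    for columns in chunks:
        if columns["order_id"]["max"] < threshold:
            chunks_skipped += 1  # no row in this chunk can match the filter
            continue
        ids_block = columns["order_id"]["data"]
        amounts_block = columns["amount"]["data"]
        bytes_read += len(ids_block) + len(amounts_block)
        ids = json.loads(zlib.decompress(ids_block))
        amounts = json.loads(zlib.decompress(amounts_block))
        total += sum(a for i, a in zip(ids, amounts) if i >= threshold)
    return total, bytes_read, chunks_skipped


chunks = write_columnar(rows)
total, bytes_read, chunks_skipped = sum_amount_where_order_id_at_least(chunks, 9_000)

# Row-oriented baseline: every byte of every row has to be scanned.
row_bytes = len("\n".join(json.dumps(r) for r in rows).encode("utf-8"))
expected = sum(r["amount"] for r in rows if r["order_id"] >= 9_000)

print("columnar bytes read:", bytes_read, "row-oriented bytes scanned:", row_bytes)
print("chunks skipped:", chunks_skipped, "result matches:", total == expected)
```

The query never touches the `country` or `note` blocks, and it skips every chunk whose maximum `order_id` is below the threshold. That is the same reason a columnar file reduces I/O for a query that selects a few columns and filters on one of them.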
