When is a pie or donut chart appropriate?
Tests judgment of part-to-whole encoding. Answer: use for few categories with clear dominance, cite a share scenario like device traffic, and name angle-comparison difficulty and 3D distortion as pitfalls.

How do you manage event schema evolution without breaking reports?
WHAT IT TESTS: Contract-change discipline across ingestion, warehouse, and BI. ANSWER OUTLINE: Backward-compatible serialization, nullable new fields, raw versus modeled layers, versioned schemas, and consumer alerts.
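
As a rough illustration of the "nullable new fields" point, the sketch below parses both an old and a new version of an event with one reader. The `PurchaseEvent` type, its field names, and the version numbers are hypothetical, not taken from any particular pipeline.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PurchaseEvent:
    event_id: str
    amount_cents: int
    # Added in a hypothetical v2 of the schema; nullable so v1 events still parse.
    coupon_code: Optional[str] = None
    schema_version: int = 1


def parse_event(raw: dict[str, Any]) -> PurchaseEvent:
    """Read known fields, default missing new ones, and ignore unknown ones."""
    return PurchaseEvent(
        event_id=raw["event_id"],
        amount_cents=int(raw["amount_cents"]),
        coupon_code=raw.get("coupon_code"),
        schema_version=int(raw.get("schema_version", 1)),
    )


# An old producer (v1) and a new producer (v2) can coexist on the same stream.
v1 = parse_event({"event_id": "e1", "amount_cents": 500})
v2 = parse_event(
    {
        "event_id": "e2",
        "amount_cents": 700,
        "coupon_code": "SPRING",
        "schema_version": 2,
        "unknown_future_field": "ignored",
    }
)
```

Reports built on `amount_cents` keep working for both versions, while reports that use `coupon_code` have to treat a missing value as "not recorded".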

What are the challenges of grouping by a high-cardinality dimension?
Tests columnar storage internals and query engine scalability. A strong answer covers memory pressure from giant hash tables, destroyed compression ratios, and massive result-set overhead.

How do duplicate events bias COUNT(*) and daily login reports?
Tests idempotency in streaming analytics. COUNT(*) overcounts; fix with unique event ID dedup via idempotent writes or COUNT(DISTINCT id), plus daily partition reconciliation. Red flag: SELECT DISTINCT * without a stable key or no reporting safeguard.
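
A minimal sketch of the overcount and the two fixes, using an in-memory SQLite database as a stand-in for the warehouse. The table names, columns, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_events (event_id TEXT, user_id TEXT, day TEXT)")

rows = [
    ("e1", "u1", "2024-01-01"),
    ("e1", "u1", "2024-01-01"),  # same event delivered twice (e.g. a client retry)
    ("e2", "u2", "2024-01-01"),
    ("e3", "u1", "2024-01-02"),
]
conn.executemany("INSERT INTO login_events VALUES (?, ?, ?)", rows)

# COUNT(*) counts the retried delivery as a second login.
naive = conn.execute(
    "SELECT day, COUNT(*) FROM login_events GROUP BY day ORDER BY day"
).fetchall()

# Fix at query time: count distinct stable event IDs.
deduped = conn.execute(
    "SELECT day, COUNT(DISTINCT event_id) FROM login_events GROUP BY day ORDER BY day"
).fetchall()

# Fix at write time: an idempotent insert keyed on event_id.
conn.execute(
    "CREATE TABLE login_events_dedup (event_id TEXT PRIMARY KEY, user_id TEXT, day TEXT)"
)
conn.executemany("INSERT OR IGNORE INTO login_events_dedup VALUES (?, ?, ?)", rows)
idempotent = conn.execute(
    "SELECT day, COUNT(*) FROM login_events_dedup GROUP BY day ORDER BY day"
).fetchall()
```

Here `naive` reports three logins on the first day, while `deduped` and `idempotent` both report two.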

Why is star schema preferred over 3NF for analytics?
Tests your grasp of the read-performance trade-off in analytical schemas. A great answer names fact and dimension tables, emphasizes fewer joins for aggregations, and cites simpler SQL and faster query plans.
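
To make "fewer joins for aggregations" concrete, here is a small sketch with one fact table and one dimension table in in-memory SQLite. The `fact_sales` and `dim_product` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension: one row per product, with descriptive attributes kept inline.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    -- Fact: one row per sale, holding a measure and a key into the dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 20.0);
    """
)

# Revenue by category needs a single join from the fact to the dimension.
revenue_by_category = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d USING (product_key)
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
```

In a 3NF model the category would typically live in its own table, so the same report would need at least one more join.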

Slow dashboard querying a large fact table: first three checks?
This tests systematic diagnosis of fact-table query latency. A strong answer checks the execution plan and indexing, evaluates partitioning and data model fit, and inspects caching or pre-aggregation.
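
The first check, reading the execution plan, can be sketched with SQLite's `EXPLAIN QUERY PLAN`. A real warehouse has its own plan syntax and may rely on partitioning or clustering rather than indexes, so treat this only as an illustration of the habit; the table, index name, and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(f"2024-01-{d:02d}", float(d)) for d in range(1, 29)],
)

QUERY = "SELECT SUM(amount) FROM fact_sales WHERE day = '2024-01-15'"


def plan(sql: str) -> str:
    """Return the human-readable detail column of each plan row, joined."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # no index yet, so the engine scans the table
conn.execute("CREATE INDEX idx_fact_sales_day ON fact_sales (day)")
plan_after = plan(QUERY)   # the plan now names idx_fact_sales_day
```

Comparing `plan_before` with `plan_after` shows whether the filter is served by the index or by a full scan.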

Design a scalable data governance framework balancing autonomy and control
WHAT IT TESTS: Federated governance balancing domain autonomy with interoperability via policy. OUTLINE: Self-serve platform with domain data products, auto-catalog, schema contracts, and policy-as-code access in CI/CD. RED FLAG: Centralized manual approval of schemas and access.

Design a CDC pipeline that handles schema evolution gracefully
Tests designing resilient CDC pipelines against schema drift. A strong answer covers schema registries with versioning, backward-compatible serialization, and automated compatibility checks.

What is a data schema and why enforce it at ingestion?
Tests schemas as contracts and ingestion validation as a quality gate. Strong answers cite blueprints with constraints, fail-fast ingestion catching type errors upstream, and downstream trust. Red flag: treating schemas as optional docs affecting only storage.
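
A minimal sketch of fail-fast validation at ingestion, assuming a hypothetical `ORDER_SCHEMA` that maps field names to Python types. A production setup would more likely use a schema registry or a validation library; this only shows the shape of the quality gate.

```python
from typing import Any

# Hypothetical contract: field name -> required Python type.
ORDER_SCHEMA: dict[str, type] = {"order_id": str, "quantity": int, "price": float}


class SchemaError(ValueError):
    """Raised when a record violates the ingestion contract."""


def validate(record: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Return the record unchanged, or raise on the first violation."""
    for field, expected in schema.items():
        if field not in record:
            raise SchemaError(f"missing field: {field}")
        if not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            raise SchemaError(f"{field}: expected {expected.__name__}, got {actual}")
    return record


def ingest(records: list[dict[str, Any]], schema: dict[str, type]):
    """Split records into accepted rows and rejected rows with a reason."""
    accepted, rejected = [], []
    for record in records:
        try:
            accepted.append(validate(record, schema))
        except SchemaError as exc:
            rejected.append((record, str(exc)))
    return accepted, rejected


accepted, rejected = ingest(
    [
        {"order_id": "o1", "quantity": 2, "price": 9.99},
        {"order_id": "o2", "quantity": "2", "price": 9.99},  # wrong type
        {"order_id": "o3", "quantity": 1},                   # missing field
    ],
    ORDER_SCHEMA,
)
```

Bad rows are caught upstream with a reason attached, so downstream consumers only see records that satisfy the contract.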

Design a data quality framework from source to consumption
This tests full-lifecycle data architecture. Strong answers define ownership first, then schema contracts at ingestion, profiling and anomaly detection in CI/CD, column-level lineage, and KPI-linked scorecards. Red flag: tools before ownership or RACI.

What is data partitioning in a cloud data warehouse?
Tests physical data layout and cost/performance tradeoffs. Strong answers define time-based or integer-range partitioning, explain how partition pruning avoids full scans, and warn against high-cardinality partition keys.
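
Partition pruning can be imitated in a few lines: rows are stored per day, and a query with a date filter only touches the partitions inside the range. This is a toy model of the idea, not how any specific warehouse implements it; the `write` and `scan` helpers and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date

# One list of rows per partition key (here: the event day).
partitions: dict[date, list[dict]] = defaultdict(list)


def write(event_day: date, row: dict) -> None:
    partitions[event_day].append(row)


def scan(start: date, end: date) -> tuple[float, int]:
    """Sum amounts in [start, end]; also report how many partitions were read."""
    total, partitions_read = 0.0, 0
    for day, rows in partitions.items():
        if not (start <= day <= end):
            continue  # pruned: this partition's rows are never read
        partitions_read += 1
        total += sum(row["amount"] for row in rows)
    return total, partitions_read


write(date(2024, 1, 1), {"amount": 10.0})
write(date(2024, 1, 2), {"amount": 20.0})
write(date(2024, 1, 2), {"amount": 5.0})
write(date(2024, 1, 3), {"amount": 40.0})

total, partitions_read = scan(date(2024, 1, 2), date(2024, 1, 2))
```

The query reads one of three partitions. With a high-cardinality key such as a user ID, the partition count would grow with the number of users and each partition would hold very little data.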

Describe star and snowflake schemas and their trade-offs
WHAT IT TESTS: Dimensional modeling denormalization trade-offs. ANSWER OUTLINE: Star schemas denormalize dimensions so queries need fewer joins; snowflake schemas normalize hierarchies to reduce redundancy but add joins.

Explain data warehouse purpose and how it differs from OLTP
This tests whether you know the OLTP versus analytics split. A great answer contrasts OLTP row-level writes and normalized schemas with warehouse denormalized schemas and BI reads. A red flag is calling a warehouse just a bigger OLTP database.

Differences between ETL and ELT, and when to choose each
WHAT IT TESTS: Pipeline architecture tradeoffs. ANSWER OUTLINE: ETL transforms before loading for structured data; ELT loads raw first and transforms in the warehouse for scale. RED FLAG: Calling one better without citing volume, structure, or compute.

How do you guarantee at-least-once event delivery for a financial transaction?
WHAT IT TESTS: Atomicity of state changes and side effects without 2PC. ANSWER OUTLINE: Write events to a DB outbox in the same transaction as the business update; a relay polls the outbox and publishes to analytics. RED FLAG: Suggesting direct HTTP POSTs or dual writes.
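
The outbox pattern described above can be sketched with in-memory SQLite. The `accounts` and `outbox` tables, the `debit` and `relay_once` helpers, and the event payload are all hypothetical. A real relay would run on a schedule and publish to a message broker.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO accounts VALUES ('acct-1', 1000);
    """
)


def debit(account_id: str, amount_cents: int) -> None:
    """Update the balance and record the event in one local transaction."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
            (amount_cents, account_id),
        )
        event = {"type": "debit", "account": account_id, "amount_cents": amount_cents}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_once(publish) -> int:
    """Publish unpublished rows in order; mark each one only after it is sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays unpublished
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


debit("acct-1", 250)

delivered: list[dict] = []
first_pass = relay_once(delivered.append)   # publishes the pending event
second_pass = relay_once(delivered.append)  # nothing left to publish
balance = conn.execute(
    "SELECT balance_cents FROM accounts WHERE id = 'acct-1'"
).fetchone()[0]
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next pass. That is the at-least-once behavior, so consumers still need to deduplicate on a stable event ID.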

Design client-side event batching and prevent unload data loss
This tests balancing network efficiency and data reliability in browser analytics. Strong answers cover in-memory batching with size or time triggers, sendBeacon or fetch keepalive on visibilitychange, and a retry queue.

Trade-offs: third-party analytics SDK versus in-house pipeline
This tests strategic build-versus-buy judgment for data infrastructure. Strong answers weigh time-to-market, maintenance burden, data sovereignty, and compliance against core product focus.

Conversion metric dropped suddenly with no recent deployments; debug instrumentation causes
Tests distinguishing real regressions from telemetry pipeline failures. Segment by device, channel, and geography to spot uniform loss signaling a tagging break; verify vendor delays and sampling; check for consent or ad-blocker shifts.
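
The segmentation step can be sketched as a comparison of relative drops per segment. The conversion rates below are invented for illustration, and the 5-point tolerance is an arbitrary assumption; the uniform-versus-localized reading follows the heuristic in the answer above.

```python
def relative_drop(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Fraction of the conversion rate lost in each segment."""
    return {segment: 1 - after[segment] / before[segment] for segment in before}


def looks_uniform(drops: dict[str, float], tolerance: float = 0.05) -> bool:
    """True when every segment lost roughly the same share."""
    return max(drops.values()) - min(drops.values()) <= tolerance


# Hypothetical conversion rates by device, before and after the drop.
before = {"ios": 0.040, "android": 0.038, "web": 0.050}
after_uniform = {"ios": 0.020, "android": 0.019, "web": 0.025}
after_localized = {"ios": 0.040, "android": 0.038, "web": 0.010}

uniform_case = looks_uniform(relative_drop(before, after_uniform))
localized_case = looks_uniform(relative_drop(before, after_localized))
```

Here `uniform_case` is true because every device lost about half its conversions, which points at shared instrumentation. `localized_case` is false because only web dropped, which narrows the search to that segment.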

How do you track page views in a Single Page Application?
This tests SPA analytics beyond classic page loads. A strong answer covers History API pushState calls and popstate events, framework hooks such as React's useEffect on route change or Vue Router's afterEach, and beaconing views. A red flag is relying only on the window load event or polling URL changes.

How do you measure data platform ROI and track it?
WHAT IT TESTS: Linking platform spend to business value and team health. ANSWER OUTLINE: Cite adoption, time to insight, downtime cost, and cost per workload; then describe cost tags and usage telemetry.