
Propose a technical architecture for a centralized Metrics Layer.
This tests your grasp of data governance and semantic layers. A great answer outlines a system with a central definition store (e.g., YAML in Git), a query engine, and an API, ensuring all teams get consistent metric results.

Design a Real-Time Analytics Pipeline for Mobile Events
This tests your grasp of low-latency streaming architectures. A good answer outlines ingestion (SDK to Kafka/Kinesis), real-time processing (Flink/Spark), and sinking to a fast OLAP database (Druid/ClickHouse). A red flag is proposing a batch-based ETL design.
Design a data model for tracking feature adoption
This tests your grasp of data warehouse star schemas for analytics. Outline a fact table for events and dimension tables for users and features, explaining how this structure enables fast, ad-hoc cohort analysis for a product manager.

Explain cohort analysis and write a retention query
Tests your grasp of user behavior metrics. First, define a cohort (e.g., May signups). Then, explain why it's better than aggregate retention. Finally, outline a query grouping users by signup week and checking for activity. A red flag is a flawed query.

Track an 'Export to CSV' button's usage
This tests modeling a feature's full lifecycle (intent, success, failure) in analytics. Propose one custom event like `export_interaction` with a `status` parameter ('success', 'failure') fired on completion, not just on click.

Design a Near Real-Time Analytics Pipeline
Tests your ability to design a low-latency data system and articulate trade-offs. A good answer covers ingestion (Kafka), processing (Flink), storage (Druid), and visualization (Grafana), contrasting the architecture's low latency with a batch setup.
Optimizing a Slow, Expensive Data Warehouse for BI Dashboards
Tests your grasp of data warehouse architecture beyond basic SQL. A great answer covers partitioning/clustering, materialized views for pre-aggregation, and cost controls. A red flag is suggesting only query rewrites or just 'adding more compute'.

Describe the role of dbt in a modern analytics stack
Tests your grasp of modern ELT patterns. A good answer explains how dbt uses SQL for in-warehouse transformations, enabling software engineering practices like version control and testing. A red flag is confusing dbt with a full ETL tool or an orchestrator.

Describe tracking a user event from frontend to BI tool
Tests your grasp of the modern data stack. A great answer outlines the five stages: frontend emission, ingestion, loading into a warehouse, transformation, and BI visualization. A red flag is describing only the frontend code and ignoring the data pipeline.

Correlate API Slowness with User Engagement
This tests your ability to design a controlled experiment and join disparate data. A good answer outlines an A/B test with an artificial delay, logging with shared IDs, and statistical analysis. A red flag is proposing a purely observational study.

Design a Real-Time Anomaly Detection System for E-commerce Events
This tests your ability to design a real-time data pipeline and apply ML to a business problem. Outline a streaming architecture (e.g., Kinesis), processing, and storage.

Describe two methods for generating prediction intervals
This tests your grasp of uncertainty quantification. A great answer contrasts an analytical method (assuming normal errors, using multipliers like 1.96 for 95%) with a simulation method (bootstrapping residuals).

Which model for forecasting with seasonality and trend?
This tests your knowledge of classical time series models. A good answer names Holt-Winters, explaining its level, trend, and seasonal components. It also discusses choosing between additive and multiplicative methods. A red flag is jumping to complex models.

Train-Test Split vs. Time-Series Cross-Validation
This tests your grasp of data leakage in temporal data. A good answer explains why random splits create lookahead bias, then details how rolling-origin validation respects time. A red flag is just describing methods without explaining *why* one is necessary.

How would you measure a sales forecast model's accuracy?
Tests if you can link statistical metrics to business outcomes. Define MAE (average error) and RMSE (penalizes large errors). Choose RMSE when large misses are costly (e.g., stock-outs), MAE otherwise. A red flag is reciting formulas without business context.

How CUPED increases statistical power in experiments
Tests your grasp of variance reduction in A/B testing. Explain how CUPED uses correlated pre-experiment data to reduce outcome variance, increasing statistical power. A red flag is confusing it with simpler difference scores, which can actually increase noise.

Primary vs. Guardrail Metrics in Experiments
Tests your grasp of risk management in A/B testing. A great answer defines a primary metric as the goal and a guardrail as a 'do no harm' check. A feature ships only if the primary improves without hurting guardrails.

Why is stopping an A/B test early problematic?
Tests understanding of the 'peeking problem' in A/B testing. A good answer defines peeking, explains how it inflates false positive rates, and contrasts it with waiting for a pre-determined sample size. A red flag is not explaining the statistical mechanism.

Explain Simpson's Paradox with a user engagement example
Tests if you see beyond aggregate data. Define the paradox, give a numerical example where a feature fails overall but wins in segments (e.g., new vs. returning users), and name the confounding variable. A vague definition without numbers is a red flag.

Mean vs. Median for API Response Times?
Tests your understanding of non-normal distributions like latency. Choose median as it's robust to outliers that skew the mean. Then, state that even median is insufficient; percentiles (p99, max) are crucial for capturing the full user experience.