Architect a large-scale real-time recommendation system with data pipelines

Curated by the Tezvyn team | June 17, 2026 | Source: systemdesignhandbook.com | advanced

Tests multi-stage ML serving under 200ms latency. Strong answers use a funnel: two-tower embeddings with ANN retrieval, ranking, and guardrails, plus separate batch and real-time pipelines. Red flag: scoring the full catalog per request without approximation.

WHAT THIS TESTS: This question tests whether you can design a distributed machine learning serving architecture that balances personalization, throughput, and strict latency budgets. Interviewers want to see that you understand the multi-stage funnel used by Netflix and Spotify, and that you can reason about data pipelines, embedding stores, and business guardrails rather than treating recommendation as a single model call.

A GOOD ANSWER COVERS: First, separate the system into candidate generation, ranking, and re-ranking stages. For candidate generation, describe two-tower neural networks, a widely used approach: one tower encodes the user, the other encodes the item, and training places both in a shared embedding space so that approximate nearest neighbor (ANN) search can retrieve a few hundred candidates from millions of items in milliseconds or less. Second, explain the ranking stage, where a lightweight model scores only the retrieved candidates using real-time user features and content features, followed by re-ranking that applies business guardrails such as diversity, freshness, and the balance between exploration and exploitation. Third, discuss data pipelines: a batch pipeline for historical training data and user profile snapshots, and a streaming pipeline that captures clicks, views, and purchases in near real time so the system adapts to changing preferences. Fourth, justify database choices with a polyglot approach: a key-value store or wide-column database for user profiles and content metadata, a vector database or ANN index for embedding retrieval, and a caching layer for hot embeddings and precomputed recommendations to keep p99 latency under 200 milliseconds. Fifth, mention A/B testing infrastructure and horizontal scalability to absorb traffic spikes while maintaining 99.9 percent uptime.
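The three-stage funnel described above can be sketched in a few lines. This is a minimal illustration, not a production design: the toy catalog `ITEMS`, the function names, the freshness weight, and the per-category cap are all hypothetical, the brute-force dot product stands in for a real ANN index, and the weighted sum stands in for a learned ranking model.

```python
import heapq
from collections import defaultdict

# Hypothetical toy catalog: item id -> (embedding, category, freshness in [0, 1]).
ITEMS = {
    "a": ((0.9, 0.1), "drama", 0.2),
    "b": ((0.8, 0.2), "drama", 0.9),
    "c": ((0.7, 0.3), "drama", 0.5),
    "d": ((0.1, 0.9), "comedy", 0.4),
    "e": ((0.5, 0.5), "comedy", 0.8),
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def generate_candidates(user_vec, k):
    """Stage 1: top-k items by embedding similarity.

    Brute force over the whole catalog for clarity only; a real system
    would query an ANN index here instead of scanning every item.
    """
    return heapq.nlargest(k, ITEMS, key=lambda i: dot(user_vec, ITEMS[i][0]))


def rank(user_vec, candidates, freshness_weight=0.3):
    """Stage 2: score only the retrieved candidates with extra features.

    The weighted sum is an assumed stand-in for a learned ranking model.
    """
    scored = [
        (dot(user_vec, ITEMS[i][0]) + freshness_weight * ITEMS[i][2], i)
        for i in candidates
    ]
    return [i for _, i in sorted(scored, reverse=True)]


def rerank(ranked, max_per_category=2, n=3):
    """Stage 3: diversity guardrail that caps items per category."""
    seen = defaultdict(int)
    out = []
    for i in ranked:
        category = ITEMS[i][1]
        if seen[category] < max_per_category:
            out.append(i)
            seen[category] += 1
        if len(out) == n:
            break
    return out


def recommend(user_vec, k=4, n=3):
    """Run the full funnel: retrieve k, rank them, return n after guardrails."""
    return rerank(rank(user_vec, generate_candidates(user_vec, k)), n=n)


print(recommend((1.0, 0.0)))
```

The point of the shape is that each stage sees fewer items than the one before it, so the more expensive logic only runs on a small candidate set.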

COMMON WRONG ANSWERS: A major red flag is proposing a single monolithic model that scores every item in the catalog for each user request. At YouTube scale, where billions of videos must be considered, this is computationally impossible within a 200ms budget. Another red flag is ignoring the cold-start problem for new users or new content, or failing to balance exploration with exploitation. Finally, suggesting only one database type for all concerns shows a lack of understanding of the differing access patterns between metadata lookups, vector search, and event ingestion.
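A back-of-envelope calculation makes the first red flag concrete. Every number below except the 200ms budget is an assumption chosen for illustration: a catalog of one billion items, one microsecond to score an item in the full scan, a fixed 10ms ANN lookup, and 500 candidates ranked at 20 microseconds each.

```python
# All constants except BUDGET_MS are illustrative assumptions, not measurements.
CATALOG_SIZE = 1_000_000_000   # assumed catalog size (1e9 items)
COST_PER_ITEM_US = 1.0         # assumed cost to score one item, in microseconds
BUDGET_MS = 200                # latency budget stated in the question


def full_scan_ms(n_items, cost_us):
    """Latency of scoring every catalog item sequentially, in milliseconds."""
    return n_items * cost_us / 1000.0


def funnel_ms(retrieval_ms, n_candidates, rank_cost_us):
    """Latency of a fixed-cost retrieval step plus ranking a small candidate set."""
    return retrieval_ms + n_candidates * rank_cost_us / 1000.0


full = full_scan_ms(CATALOG_SIZE, COST_PER_ITEM_US)
funnel = funnel_ms(retrieval_ms=10.0, n_candidates=500, rank_cost_us=20.0)

print(f"full scan: {full:,.0f} ms ({full / BUDGET_MS:,.0f}x over budget)")
print(f"funnel:    {funnel:,.0f} ms")
```

Under these assumptions the full scan takes about 1,000 seconds per request, while the funnel stays near 20ms. Parallel hardware changes the constants but not the conclusion that per-request work has to scale with the candidate set, not the catalog.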

LIKELY FOLLOW-UPS: Expect the interviewer to ask how you would handle a user with no history, how you would update recommendations within seconds of a major event, or how you would debug a sudden drop in click-through rate. They may also ask for a concrete latency breakdown of the funnel stages, or how you would enforce guardrails like preventing filter bubbles without sacrificing engagement.
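For the no-history follow-up, one common answer is to fall back to popular items and shift toward personalized results as interactions accumulate. The sketch below assumes a simple linear ramp; the function name `cold_start_blend`, the `full_trust_at` threshold, and the sample lists are hypothetical, and a real system might use a bandit or context features such as region or device instead.

```python
def cold_start_blend(history_len, personalized, popular, n=4, full_trust_at=10):
    """Fill n slots, giving personalized items a share that grows with history.

    history_len:   number of interactions recorded for the user
    personalized:  ranked item ids from the personalized funnel
    popular:       ranked item ids from a global popularity list
    full_trust_at: assumed history length at which all slots are personalized
    """
    share = min(history_len / full_trust_at, 1.0)
    n_personal = int(share * n)
    picks = list(personalized[:n_personal])
    # Fill the remaining slots from the popularity list, skipping duplicates.
    for item in popular:
        if len(picks) == n:
            break
        if item not in picks:
            picks.append(item)
    return picks


personalized = ["p1", "p2", "p3", "p4"]
popular = ["top1", "p1", "top2", "top3", "top4"]

print(cold_start_blend(0, personalized, popular))   # brand-new user
print(cold_start_blend(5, personalized, popular))   # some history
print(cold_start_blend(20, personalized, popular))  # established user
```

A new user gets only popular items, a user with five interactions gets half of each, and an established user gets a fully personalized list.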

ONE CONCRETE EXAMPLE: According to the source, Netflix evaluates 100 million user profiles against thousands of titles every second, and its recommendation system drives 80 percent of viewing hours. YouTube ranks billions of videos for over 2 billion monthly users, processes 500 hours of video uploaded every minute, and generates 70 percent of total watch time from recommendations. Amazon surfaces products from a catalog of 350 million items, and its recommendation engine drives 35 percent of revenue. Whatever the exact figures, the scale illustrates why approximation, caching, and multi-stage filtering are non-negotiable in production.


