Describe the difference between online and batch inference.

WHAT IT TESTS: Whether you can distinguish synchronous serving from offline processing and match each to its infrastructure, latency, and cost profile.
ANSWER OUTLINE: Online inference serves each request synchronously behind an autoscaling API or serverless endpoint, with P99 latency targets ranging from milliseconds to seconds, strict SLAs, and GPU/CPU capacity kept available per request. Batch inference uses job queues or workflow engines on scheduled compute to process large datasets, often on spot instances, and tolerates latencies of minutes to hours.
RED FLAG: Recommending one architecture for both, or dismissing the differences in cold-start behavior, queueing, and cost per query.
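To make the contrast concrete, here is a minimal Python sketch of the two serving patterns. Everything in it is an assumption for illustration: `predict` is a hypothetical stand-in for a real model, `handle_online_request` and `run_batch_job` are invented names, and the 100 ms latency budget and chunk size are arbitrary. It is not tied to any particular serving framework.

```python
import time
from typing import Dict, Iterable, Iterator, List


def predict(features: List[float]) -> float:
    """Hypothetical stand-in model: a fixed linear scorer (assumption)."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def handle_online_request(
    features: List[float], latency_budget_s: float = 0.1
) -> Dict[str, object]:
    """Online path: score one request synchronously and compare the
    measured latency against a per-request budget (assumed 100 ms)."""
    start = time.perf_counter()
    score = predict(features)
    elapsed = time.perf_counter() - start
    return {
        "score": score,
        "latency_s": elapsed,
        "within_budget": elapsed <= latency_budget_s,
    }


def chunked(rows: Iterable[List[float]], size: int) -> Iterator[List[List[float]]]:
    """Yield successive chunks of at most `size` rows."""
    chunk: List[List[float]] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def run_batch_job(rows: Iterable[List[float]], chunk_size: int = 2) -> List[float]:
    """Batch path: score a whole dataset offline, chunk by chunk.
    No caller is waiting on any single row, so only total throughput matters."""
    results: List[float] = []
    for chunk in chunked(rows, chunk_size):
        results.extend(predict(row) for row in chunk)
    return results


if __name__ == "__main__":
    print(handle_online_request([1.0, 2.0, 3.0]))
    print(run_batch_job([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0], [2.0, 0.0, 0.0]]))
```

In a real system the online handler would sit behind an autoscaled HTTP endpoint and the batch function would be launched by a scheduler or workflow engine; the sketch only shows the difference in calling pattern, one request with a latency check versus a dataset processed in chunks.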
Read the original → inferencesystemsauthority.com
- #mlops
- #inference
- #system design
- #infrastructure
- #latency