MLOps & Infrastructure

Model deployment, training infra, experiment tracking

80 bites

MLOps & Infrastructure · 30 sec read

Most LLM Apps Need Workflows Not Agent Frameworks

Most LLM apps ship faster and more reliably as deterministic workflows than as autonomous agents. Plain Python with structured outputs and local functions is easier to debug than CrewAI or LangGraph. Map the control flow in code before importing any agent framework.
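
A minimal sketch of the workflow style described above, assuming a ticket-triage task. The `Ticket` schema, the queue names, and `fake_llm` (a stand-in for a real model call) are all hypothetical. The model only fills in a structured output, and the branching stays in ordinary Python where it can be stepped through in a debugger.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ticket:
    category: str
    urgent: bool


def parse_ticket(raw: str) -> Ticket:
    """Validate the model's JSON output against a fixed, hypothetical schema."""
    data = json.loads(raw)
    category = data["category"]
    if category not in {"billing", "bug", "other"}:
        raise ValueError(f"unexpected category: {category!r}")
    return Ticket(category=category, urgent=bool(data["urgent"]))


def route(ticket: Ticket) -> str:
    """Control flow lives in plain code, not in the model."""
    if ticket.urgent:
        return "page_on_call"
    if ticket.category == "billing":
        return "billing_queue"
    return "general_queue"


def handle(message: str, llm: Callable[[str], str]) -> str:
    """One deterministic pass: prompt, parse, route."""
    prompt = f"Classify this ticket as JSON with keys category, urgent:\n{message}"
    return route(parse_ticket(llm(prompt)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a canned structured output.
    return '{"category": "billing", "urgent": false}'
```

Because `llm` is passed in as a plain callable, each step can be unit-tested with a canned response before any real model is wired in.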

MLOps & Infrastructure · 30 sec read

ORPilot JSON IR Ends Solver Lock-In

ORPilot's open-source IR captures optimization models as solver-agnostic JSON, letting teams swap solvers or update data without calling the LLM again. It separates model structure from solver syntax, making LLM-generated OR models reproducible in production.

MLOps & Infrastructure · 30 sec read

Default Churn Thresholds Waste $86 per Customer

90% of 36 IBM Telco churn analyses use F1 and a 0.5 threshold, which assumes false positives and false negatives cost the same. That assumption is off by 13x, burning $86 per customer, or $8.6M at 100k subscribers. Replace accuracy-style metrics with profit curves tied to LTV and CAC.
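
A minimal sketch of picking a threshold from a profit curve rather than defaulting to 0.5. The scores, labels, `saved_value`, and `offer_cost` are made-up illustrative numbers, not the figures from the analysis above, and the sketch assumes (optimistically) that every targeted churner is retained.

```python
def profit_at(threshold, scores, churned, saved_value, offer_cost):
    """Total profit if every customer scoring >= threshold gets a retention offer."""
    profit = 0.0
    for score, did_churn in zip(scores, churned):
        if score >= threshold:
            profit -= offer_cost          # every offer costs money
            if did_churn:
                profit += saved_value     # assumed: the offer retains a true churner
    return profit


def best_threshold(scores, churned, saved_value, offer_cost):
    """Scan the observed scores as candidate thresholds and keep the most profitable."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: profit_at(t, scores, churned, saved_value, offer_cost),
    )


# Toy data: model scores and whether each customer actually churned.
scores = [0.9, 0.7, 0.4, 0.3, 0.2, 0.1]
churned = [1, 1, 1, 0, 1, 0]

default_profit = profit_at(0.5, scores, churned, saved_value=130, offer_cost=10)
chosen = best_threshold(scores, churned, saved_value=130, offer_cost=10)
chosen_profit = profit_at(chosen, scores, churned, saved_value=130, offer_cost=10)
```

With these toy numbers a missed churner costs far more than a wasted offer, so the profit-maximizing threshold lands well below 0.5.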

MLOps & Infrastructure · 30 sec read

Low GPU utilization on multi-GPU instance: diagnose and right-size

Tests distributed bottleneck triage. Strong answers profile CPU/GPU/disk, compare gradient sync time to compute, validate per-GPU batch size, and check NVLink vs PCIe. Red flag: suggesting more GPUs before ruling out data starvation or all-reduce overhead.

MLOps & Infrastructure · 30 sec read

Compare and contrast Apache Airflow versus Kubeflow Pipelines for ML orchestration

This tests matching orchestrators to ML constraints. A strong answer contrasts Airflow's task scheduling and backfills with Kubeflow's K8s-native GPU scaling, choosing based on team skills.

MLOps & Infrastructure · 30 sec read

Explain dynamic batching in inference servers and its trade-off

WHAT IT TESTS: Inference scheduling and the latency-vs-throughput trade-off. ANSWER OUTLINE: Dynamic batching launches a batch when a time window expires or a maximum batch size is reached, improving throughput over static batching, but short requests in a batch wait for the slowest one.
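
The batching rule above can be sketched as a small simulation. This is a simplified model, not any particular inference server's scheduler: `form_batches` and its parameters are hypothetical names, times are in arbitrary units, and the window is only checked when the next request arrives.

```python
def form_batches(arrivals, max_batch_size, max_wait):
    """Group sorted arrival times into batches.

    A batch closes when it reaches max_batch_size, or when a new request
    arrives more than max_wait after the batch's first request.
    """
    batches = []
    current = []
    for t in arrivals:
        if current and t - current[0] > max_wait:
            batches.append(current)   # window expired before this request arrived
            current = []
        current.append(t)
        if len(current) == max_batch_size:
            batches.append(current)   # batch is full, launch immediately
            current = []
    if current:
        batches.append(current)       # flush whatever is left
    return batches


def batch_latency(durations):
    """Every request in a batch finishes when the slowest one does."""
    return max(durations)


batches = form_batches([0, 1, 2, 10, 11, 30], max_batch_size=3, max_wait=5)
```

The trade-off shows up in `batch_latency`: a request that would take 5 units alone still waits 40 if it shares a batch with a 40-unit request.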

MLOps & Infrastructure · 30 sec read

Design a multi-tenant GPU serving system for hundreds of fine-tuned models

Tests GPU memory tradeoffs versus cold-start latency in multi-tenant serving. Strong answers propose tiered CPU staging, predictive pre-warming, and disaggregated prefill and decode. Red flag: keeping all models GPU-resident or ignoring transfer overhead.

MLOps & Infrastructure · 30 sec read

Compare Canary and Blue/Green ML deployments and model-specific metrics

WHAT IT TESTS: Separating model quality from infra health in rollouts. ANSWER OUTLINE: Contrast Canary's gradual traffic shift with Blue/Green's instant swap; highlight silent failures, data drift, and prediction-distribution shifts; cite accuracy and calibration as model-specific metrics.

MLOps & Infrastructure · 30 sec read

Expose a trained model as a simple web service

Tests practical MLOps knowledge from model serialization to serving. Strong answers package the model into a standard format, containerize it, expose a REST endpoint behind a load balancer, and add monitoring. Red flag: a bare Flask server without containers or health checks.
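
A minimal sketch of the serving piece only, using Python's standard library so it runs without extra dependencies. The `predict` function is a stand-in for a deserialized model, and the `/health` and `/predict` paths and the JSON shape are assumptions. Containerization, the load balancer, and monitoring are out of scope here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Stand-in for a real deserialized model: returns the mean of the inputs.
    return sum(features) / len(features)


def handle(method, path, body=b""):
    """Pure routing function, returns (status_code, response_dict)."""
    if method == "GET" and path == "/health":
        return 200, {"status": "ok"}
    if method == "POST" and path == "/predict":
        try:
            features = json.loads(body)["features"]
            return 200, {"prediction": predict(features)}
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            return 400, {"error": "expected JSON with a non-empty 'features' list"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def _respond(self, status, payload):
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._respond(*handle("GET", self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._respond(*handle("POST", self.path, self.rfile.read(length)))


def serve(port=8080):
    """Blocks forever; call this from the container entrypoint."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping the routing in a pure `handle` function means the health check and prediction path can be tested without opening a socket.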

MLOps & Infrastructure · 31 sec read

How would you version control a 50GB dataset in a CI/CD pipeline?

WHAT IT TESTS: Code and data versioning without breaking CI/CD speed. ANSWER OUTLINE: Contrast Git LFS (simple, but 50GB chokes CI clones) with DVC (git metadata plus S3; enables selective pulls and CI cache). RED FLAG: Storing 50GB binaries in Git.

MLOps & Infrastructure · 30 sec read

How would GDPR requirements influence experiment tracking and model management design?

WHAT IT TESTS: designing for compliance as a systems constraint, not an afterthought. ANSWER OUTLINE: immutable data lineage, user exclusion lists, audit logs, versioned explainability. RED FLAG: manual deletion without model unlearning or provenance.

MLOps & Infrastructure · 30 sec read

How do you ensure ML experiment reproducibility beyond random seeds?

Tests system-level reproducibility through data versioning, environment capture, and pipeline automation. Strong answers cover versioned datasets, containerized dependencies, and immutable experiment logs.
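
One way to make those pieces concrete is a run manifest that pins the data by content hash alongside the environment and parameters. This is a minimal sketch under that assumption. The manifest fields and function names are hypothetical, and a real setup would also record package versions or a container image digest.

```python
import hashlib
import json
import platform
import sys


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_paths, params, seed):
    """Capture what a rerun needs: interpreter, platform, seed, params, data hashes."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
        "params": params,
        "data": {p: file_sha256(p) for p in sorted(data_paths)},
    }


def manifest_id(manifest):
    """Stable short ID: identical inputs always produce the same ID."""
    blob = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]
```

Logging `manifest_id` with every experiment makes it visible when two runs that "used the same data" actually did not.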

MLOps & Infrastructure · 30 sec read

Describe an ML workflow with massive egress fees and re-architecture to mitigate

Tests whether you recognize egress spikes when storage and compute cross cloud or region boundaries. Great answers sketch a multi-cloud training pipeline, cite per-GB rates, and propose caching or compute placement. Red flag: suggesting compression alone.

MLOps & Infrastructure · 30 sec read

Design a showback or chargeback system for ML infrastructure costs

WHAT IT TESTS: Bridging ML telemetry with FinOps for shared GPUs and storage. ANSWER OUTLINE: Tag workloads to cost centers; define shared-resource formulas; automate reconciliation; use showback. RED FLAG: Using raw cloud bills as attribution without GL mapping.
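
A shared-resource formula can be as simple as splitting a cluster bill in proportion to tagged usage. The sketch below assumes GPU-hours per cost center as the allocation key. The function name, cost-center names, and figures are hypothetical.

```python
def allocate_shared_cost(total_cost, usage_by_cost_center):
    """Split a shared bill in proportion to each cost center's tagged usage."""
    total_usage = sum(usage_by_cost_center.values())
    if total_usage == 0:
        raise ValueError("no usage recorded, nothing to allocate against")
    return {
        cost_center: round(total_cost * usage / total_usage, 2)
        for cost_center, usage in usage_by_cost_center.items()
    }


def reconciliation_gap(total_cost, allocation):
    """Difference between the bill and the allocated sum (rounding can leave cents)."""
    return round(total_cost - sum(allocation.values()), 2)


gpu_hours = {"search": 300, "ads": 100, "research": 100}
allocation = allocate_shared_cost(1000.0, gpu_hours)
gap = reconciliation_gap(1000.0, allocation)
```

The reconciliation check matters because per-team rounding can leave a few cents unallocated, which finance will ask about.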

MLOps & Infrastructure · 30 sec read

Design a near real-time cost visibility system for ML teams

Tests cost attribution across shared ML infrastructure and streaming pipeline design. Strong answers combine billing exports with resource labels, sub-hour aggregation, and anomaly detection for training spikes.

MLOps & Infrastructure · 30 sec read

Describe a basic lifecycle policy to manage cloud storage costs

This tests cost optimization via tiered storage and automated expiration. Strong answers list transitions from Standard to IA to Glacier, then deletion after set days, plus retrieval costs. A red flag is using manual scripts instead of native lifecycle rules.
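
The tiering described above can be written down as a native lifecycle rule. The sketch below assumes Amazon S3 and builds the configuration as a plain dictionary. The `checkpoints/` prefix and the 30/90/365-day cutoffs are hypothetical values chosen for illustration.

```python
def lifecycle_rule(prefix, ia_days=30, glacier_days=90, expire_days=365):
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "ID": f"tiering-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


config = {"Rules": [lifecycle_rule("checkpoints/")]}

# With boto3 installed and credentials configured, this dictionary could be
# applied with (bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-ml-artifacts", LifecycleConfiguration=config)
```

Objects moved to colder tiers cost more to read back, so the cutoffs should follow how often old checkpoints are actually retrieved.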

MLOps & Infrastructure · 30 sec read

What is the wrong and right way to manage ML database secrets?

This tests secret management hygiene for ML pipelines. A strong answer rejects hardcoded secrets and env vars, then proposes AWS Secrets Manager with IAM retrieval, TLS, caching, and rotation. A red flag is suggesting .env files, ConfigMaps, or CLI arguments.

MLOps & Infrastructure · 31 sec read

Why was this customer denied: global or local explanation?

This tests matching the question to the right explanation scope. Global methods show overall model behavior; local methods explain one prediction. A specific denial needs a local method like SHAP. A red flag is using global summaries like permutation importance or PDPs to explain a single case.

MLOps & Infrastructure · 30 sec read

How do you give read-only access to a shared cloud storage bucket?

WHAT IT TESTS: Least-privilege IAM for shared data pipelines. ANSWER OUTLINE: Bind an IAM role with read permissions to the team at the bucket level, avoid object-level ACLs, and mount read-only on training VMs.

MLOps & Infrastructure · 30 sec read

Propose an architectural solution for contended GPU training resources

Tests multi-tenant GPU scheduling design at scale. Great answers tier jobs by checkpointability, apply quota-based preemption, mix spot and on-demand instances, and use MIG or time-slicing to bin-pack. Red flag: buying GPUs without scheduling logic.

MLOps & Infrastructure · Tezvyn