Design a near real-time cost visibility system for ML teams
Tests cost attribution across shared ML infrastructure and the design of a streaming pipeline that produces actionable output. A strong answer covers four things: (1) data sources such as cloud billing exports, resource labels, and GPU metrics APIs; (2) processing that enriches records with ownership and aggregates them in sub-hour windows; (3) anomaly detection rather than static thresholds, because training spikes are expected but should be bounded; and (4) actionable alerts to project owners that include job IDs, runaway cost estimates, and optional kill switches.
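The enrich, aggregate, detect, and alert steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: the 15-minute window, the `team` label key, the `CostRecord` fields, the median/MAD baseline, and the threshold of 5.0 are all hypothetical choices made for the example, and a real pipeline would read from billing exports and a metrics API rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import median

WINDOW_SECONDS = 15 * 60  # assumed sub-hour window size


@dataclass
class CostRecord:  # hypothetical shape of one enriched billing/usage row
    timestamp: int  # epoch seconds
    job_id: str
    cost_usd: float
    labels: dict = field(default_factory=dict)  # resource labels


def owner_of(record):
    # Enrichment: map labels to an owner; keep unlabeled spend visible.
    return record.labels.get("team", "unattributed")


def aggregate(records):
    """Sum cost per (owner, window_start) and collect contributing job IDs."""
    totals, jobs = defaultdict(float), defaultdict(set)
    for r in records:
        window_start = r.timestamp - r.timestamp % WINDOW_SECONDS
        key = (owner_of(r), window_start)
        totals[key] += r.cost_usd
        jobs[key].add(r.job_id)
    return totals, jobs


def is_anomalous(history, current, threshold=5.0):
    """Robust z-score of the current window against the owner's past windows."""
    if len(history) < 4:  # too little history to form a baseline
        return False
    med = median(history)
    mad = median([abs(x - med) for x in history])
    # Floor the scale so a perfectly flat history does not divide by zero.
    scale = max(1.4826 * mad, 0.01 * med, 1e-9)
    return (current - med) / scale > threshold


def detect(records):
    """Return one alert dict per anomalous (owner, window) bucket."""
    totals, jobs = aggregate(records)
    history, alerts = defaultdict(list), []
    for owner, window_start in sorted(totals, key=lambda k: k[1]):
        current = totals[(owner, window_start)]
        past = history[owner]
        if is_anomalous(past, current):
            baseline = median(past)
            alerts.append({
                "owner": owner,
                "window_start": window_start,
                "job_ids": sorted(jobs[(owner, window_start)]),
                "window_cost_usd": round(current, 2),
                "baseline_usd": round(baseline, 2),
                # Runaway estimate: excess spend per hour if the spike persists.
                "projected_excess_usd_per_hour": round(
                    (current - baseline) * 3600 / WINDOW_SECONDS, 2),
            })
        past.append(current)
    return alerts
```

Comparing each window to the owner's own recent median, rather than to a fixed dollar limit, lets routine training bursts pass while still bounding them. The alert carries the job IDs and a projected hourly excess so the owner can decide whether to stop the job; a kill switch would hang off that same payload.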
Read the original → docs.cloud.google.com
- #mlops
- #cost-optimization
- #data-engineering
- #monitoring
- #system-design