Escalation Policy: When to Stop Shipping and Start Fixing

An escalation policy is a pre-agreed plan for when to divert engineers from feature work to fix reliability. When a service's error budget burns too fast, the policy's thresholds trigger specific actions. The footgun is thinking a quick rollback is enough.
An escalation policy is a structured plan for redirecting engineering effort from feature development to reliability work when a service is at risk. It provides thresholds for consistent action, like paging an on-call when an SLO's error budget burns too fast. This allows teams to balance feature velocity with stability. The main footgun is confusing a quick fix (like a rollback) with a real solution; the policy requires fixing the entire *class* of problem to prevent recurrence.
Read the original → cloud.google.com
- #sre
- #monitoring
- #slo
- #on-call
- #reliability
Get five bites like this every day.
Tezvyn delivers a daily feed of 60-second tech bites with quizzes to lock in what you learn.