tezvyn:

How would you instrument and query P95 API latency by region?

Source: sre.google · intermediate

This tests white-box latency instrumentation and safe cardinality for percentile aggregation. Strong answer: emit histograms by region, query P95 with histogram_quantile or a log percentile, and keep trace IDs in logs only.
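As a rough illustration of "emit histograms by region, query P95", here is a minimal pure-Python sketch of a bucketed histogram keyed only by region, with a quantile estimate that interpolates linearly inside the matching bucket (the same general idea as Prometheus's histogram_quantile). The bucket bounds, class name, and region names are assumptions made up for this example. In Prometheus itself the query would look roughly like `histogram_quantile(0.95, sum by (le, region) (rate(http_request_duration_seconds_bucket[5m])))`, where the metric name is an assumed example.

```python
import bisect

# Hypothetical bucket upper bounds in seconds; the final bucket is +Inf,
# mirroring the cumulative "le" buckets of a Prometheus histogram.
BOUNDS = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, float("inf")]


class RegionLatencyHistogram:
    """Per-region latency histogram; region is the only (low-cardinality) label."""

    def __init__(self, bounds=BOUNDS):
        self.bounds = list(bounds)
        self.counts = {}  # region -> per-bucket (non-cumulative) counts

    def observe(self, region, seconds):
        # bisect_left picks the first bucket whose upper bound is >= seconds.
        idx = bisect.bisect_left(self.bounds, seconds)
        buckets = self.counts.setdefault(region, [0] * len(self.bounds))
        buckets[idx] += 1

    def quantile(self, region, q):
        """Estimate a quantile by linear interpolation within one bucket."""
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be between 0 and 1")
        buckets = self.counts.get(region)
        if not buckets:
            return None  # no observations for this region
        rank = q * sum(buckets)
        cumulative = 0
        lower = 0.0
        for upper, count in zip(self.bounds, buckets):
            if count and cumulative + count >= rank:
                if upper == float("inf"):
                    # No finite bound to interpolate toward; fall back to
                    # the highest finite bound.
                    return lower
                return lower + (upper - lower) * (rank - cumulative) / count
            cumulative += count
            lower = upper
        return None


hist = RegionLatencyHistogram()
for _ in range(90):
    hist.observe("eu-west", 0.08)  # fast requests
for _ in range(10):
    hist.observe("eu-west", 0.4)   # slow tail
p95 = hist.quantile("eu-west", 0.95)
print(f"eu-west P95 estimate: {p95:.3f}s")
```

Note that the estimate can only be as precise as the bucket it lands in, which is why bucket bounds are usually chosen around the latency targets you care about.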

This tests whether you can design white-box monitoring for request latency with dimensional labels without causing a cardinality explosion. A strong answer covers four things: instrumenting the app with histograms or timers tagged by a low-cardinality region label; aggregating with histogram_quantile in Prometheus, or with a percentile function over logs, across a sliding window; keeping high-cardinality context like trace IDs in structured logs rather than metric labels; and validating accuracy by comparing percentiles estimated from histogram buckets against percentiles computed from raw log samples.
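The validation step can be sketched as follows, assuming structured JSON log lines with hypothetical fields `region`, `latency_s`, and `trace_id`. The trace ID lives only in the log record, never in a metric label. The check computes an exact nearest-rank P95 from the raw samples and confirms that a histogram-derived estimate falls in the same bucket; the bucket bounds and function names are illustrative, not taken from the source.

```python
import json
import math

# Hypothetical bucket upper bounds in seconds (must match the histogram's).
BOUNDS = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, math.inf]


def latencies_from_logs(lines, region):
    """Extract raw latency samples for one region from JSON log lines."""
    samples = []
    for line in lines:
        record = json.loads(line)
        if record["region"] == region:
            samples.append(record["latency_s"])
    return samples


def exact_percentile(samples, q):
    """Nearest-rank percentile over raw samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def bucket_of(value, bounds=BOUNDS):
    """Return the (lower, upper) bounds of the bucket containing value."""
    lower = 0.0
    for upper in bounds:
        if value <= upper:
            return (lower, upper)
        lower = upper
    raise ValueError("value does not fit any bucket")


def estimate_matches_logs(histogram_p95, log_lines, region):
    """True if the histogram estimate and the raw P95 share a bucket."""
    raw_p95 = exact_percentile(latencies_from_logs(log_lines, region), 0.95)
    return bucket_of(histogram_p95) == bucket_of(raw_p95)


# Sample structured logs: trace_id is high-cardinality, so it stays here.
log_lines = [
    json.dumps({"region": "eu-west", "latency_s": s, "trace_id": f"t{i}"})
    for i, s in enumerate([0.08] * 18 + [0.3, 0.4])
]
log_lines.append(
    json.dumps({"region": "us-east", "latency_s": 3.0, "trace_id": "t-us"})
)

print(estimate_matches_logs(0.375, log_lines, "eu-west"))
```

A mismatch would suggest the bucket layout is too coarse around the tail, or that the metric and the logs are not measuring the same span of the request.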

Read the original → sre.google
