tezvyn:

The multiple comparisons problem in A/B testing

Source: interview · intermediate

WHAT IT TESTS: Statistical rigor at scale. OUTLINE: Running many tests at alpha = 0.05 inflates the chance of at least one false positive; mitigate with Bonferroni correction or FDR control, plus pre-registered metrics. RED FLAG: Cherry-picking whichever metric crosses p < 0.05.

WHAT IT TESTS: Whether you understand why mass experimentation breaks naive significance testing. ANSWER OUTLINE: Each test at alpha = 0.05 has a 5 percent false-positive rate when there is no real effect, so across many tests or metrics the family-wise error rate (the chance of at least one spurious win) grows quickly: roughly 64 percent for 20 independent tests. Mitigate with Bonferroni correction (test each hypothesis at alpha / m for m tests), which is conservative, or Benjamini-Hochberg false discovery rate control, which retains more power at scale. PROCESS: Pre-register a single primary metric, treat guardrail metrics separately, and avoid peeking at results before the planned sample size is reached. RED FLAG: Shipping whichever metric crosses p < 0.05.
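
The answer outline can be made concrete with a short sketch. The Python below is illustrative only: the function names and the five p-values are hypothetical, the family-wise error rate formula assumes independent tests with every null hypothesis true, and the Benjamini-Hochberg step-up procedure shown is the standard one, which controls the false discovery rate under independence or positive dependence.

```python
from typing import List


def family_wise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * alpha,
    then reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five metrics in one experiment.
p_values = [0.001, 0.012, 0.02, 0.045, 0.30]

fwer_20 = family_wise_error_rate(0.05, 20)
bonferroni_result = bonferroni(p_values)
bh_result = benjamini_hochberg(p_values)

print(f"FWER for 20 tests at alpha 0.05: {fwer_20:.3f}")
print("Bonferroni rejections:", bonferroni_result)
print("Benjamini-Hochberg rejections:", bh_result)
```

On these made-up inputs, naive testing at p < 0.05 would flag four of the five metrics, Bonferroni keeps only the strongest one, and Benjamini-Hochberg keeps three, which illustrates why the latter is usually preferred when many metrics are tested at once.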


