The multiple comparisons problem in A/B testing
WHAT IT TESTS: Statistical rigor at scale. OUTLINE: Many tests at alpha 0.05 inflate the chance of a false positive; mitigate with Bonferroni or FDR control plus pre-registered metrics. RED FLAG: Cherry-picking whichever metric crosses p<0.05.
WHAT IT TESTS: Whether you understand why mass experimentation breaks naive significance testing. ANSWER OUTLINE: At a 5 percent false-positive rate per test, the chance of at least one spurious result across m independent tests is 1 - 0.95^m, about 64 percent for 20 tests, so the family-wise error rate balloons as tests or metrics are added. Mitigate with Bonferroni (test each hypothesis at alpha/m), which is conservative, or with Benjamini-Hochberg false discovery rate control, which retains more power at scale. System: pre-register a primary metric, treat guardrail metrics separately, and avoid peeking at results before the planned sample size is reached. RED FLAG: Shipping whichever metric crosses p<0.05.
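The outline above can be made concrete with a short sketch. This is a minimal illustration, not a production implementation: the function names and the ten example p-values are hypothetical, and the family-wise error rate formula assumes the tests are independent and every null hypothesis is true.

```python
from typing import List


def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject only p-values at or below alpha / m (controls family-wise error)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten metrics from one experiment.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.060, 0.074, 0.205, 0.212, 0.216]

    print(f"FWER for 20 tests: {family_wise_error_rate(20):.2f}")
    print("Naive p < 0.05:    ", sum(p < 0.05 for p in p_values))
    print("Bonferroni:        ", sum(bonferroni(p_values)))
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

On these made-up p-values, the naive rule flags five metrics, Bonferroni keeps one, and Benjamini-Hochberg keeps two, which illustrates why Bonferroni is called conservative and why FDR control tends to be preferred when many metrics are tested at once.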
- #ab-testing
- #multiple-comparisons
- #statistics
- #false-discovery-rate
- #experimentation