Confidence Intervals and Result Reliability
What You’ll Learn
You’ll learn to use confidence intervals to understand the range within which your true conversion rate likely falls. In The A/B Test Starter, confidence intervals give you a more nuanced view of result reliability than p-values alone, showing both the precision of your estimate and the uncertainty inherent in testing.
Key Concepts
A confidence interval is a range of values around your observed conversion rate that likely contains the true population conversion rate. The A/B Test Starter uses 95% confidence intervals, meaning that if you repeated your test many times, about 95% of the intervals constructed this way would contain the true rate. Narrow confidence intervals indicate precise estimates; wide intervals indicate greater uncertainty. This tool complements p-values by showing not just whether a difference exists, but how precisely you’ve measured it.
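The A/B Test Starter doesn’t publish its exact interval formula, so here is a minimal sketch using the standard normal-approximation (Wald) interval for a proportion. The counts (440 conversions out of 4,000 visitors) are hypothetical numbers chosen for illustration:

```python
import math

def conversion_ci(conversions, visitors, z=1.96):
    """95% confidence interval for a conversion rate (normal approximation)."""
    p = conversions / visitors
    se = math.sqrt(p * (1 - p) / visitors)  # standard error of the proportion
    return (p - z * se, p + z * se)

# Hypothetical test: 440 conversions from 4,000 visitors (observed rate 11.0%)
low, high = conversion_ci(440, 4000)
print(f"{low:.3%} to {high:.3%}")  # roughly 10.0% to 12.0%
```

For small samples or rates near 0% or 100%, the Wilson score interval is a more reliable choice than this simple approximation.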
- What Confidence Intervals Show: A 95% confidence interval of 10.2% to 11.8% for your variant means the data are consistent with a true conversion rate anywhere in that range. In The A/B Test Starter, this range quantifies the precision of your measurement and accounts for natural sampling variation in test traffic.
- Interpreting Overlapping Intervals: When the control and variant confidence intervals overlap substantially, it signals weak statistical evidence for a winner, even if one interval sits numerically higher than the other. (Note that slight overlap can still coincide with a statistically significant difference; a confidence interval on the difference between the two rates is the more direct check.) The A/B Test Starter treats overlapping intervals as “not ready to declare a winner” and recommends either continuing the test or investigating traffic quality issues.
- Sample Size Impact on Interval Width: Larger sample sizes create narrower confidence intervals because more data reduces estimation uncertainty. In The A/B Test Starter planning, this principle guides your sample size calculations—you need enough traffic to narrow intervals enough to distinguish between variants if a real difference exists.
- Using Intervals for Implementation Decisions: When deciding whether an improvement is worth implementing, The A/B Test Starter recommends examining the variant interval’s lower bound: if even that lower bound exceeds your control’s observed performance, you have strong evidence of improvement across realistic scenarios.
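The last three points can be seen in one small experiment. The sketch below (same hypothetical normal-approximation interval as before, with made-up observed rates of 10% for control and 12% for variant) shows how the identical observed difference moves from “intervals overlap, lower bound doesn’t clear control” at 1,000 visitors to “intervals separated, lower bound clears control” at 10,000 visitors:

```python
import math

def conversion_ci(conversions, visitors, z=1.96):
    """95% confidence interval for a conversion rate (normal approximation)."""
    p = conversions / visitors
    se = math.sqrt(p * (1 - p) / visitors)
    return (p - z * se, p + z * se)

results = {}
# Hypothetical data: identical observed rates (10% vs 12%) at two traffic levels.
for visitors in (1_000, 10_000):
    control = conversion_ci(int(0.10 * visitors), visitors)
    variant = conversion_ci(int(0.12 * visitors), visitors)
    results[visitors] = {
        # Do the two intervals overlap at all?
        "overlap": variant[0] <= control[1],
        # Does the variant's lower bound exceed control's observed rate?
        "clears_control": variant[0] > 0.10,
    }
    print(visitors, results[visitors])
```

At 1,000 visitors per arm the intervals overlap and the decision is “keep testing”; at 10,000 the same observed rates produce separated intervals and a lower bound above control, supporting implementation.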
Practical Application
Find a completed test in your analytics platform that displays confidence intervals for both control and variant, then sketch a visual representation showing how the two intervals compare (overlapping, separated, or touching). Write a one-page memo interpreting what the confidence intervals reveal about the reliability of your results and whether they support declaring a winner according to The A/B Test Starter standards.