Calculating Required Sample Size and Duration
What You’ll Learn
You’ll learn to calculate how many visitors and how many days a test needs before its results are statistically reliable. An undersized test tends to miss real improvements, and any “winner” it does produce is more likely to be a false positive or an exaggerated effect. An oversized test ties up traffic and delays decisions longer than necessary. A/B Test Starters who calculate correctly avoid both mistakes.
Key Concepts
Sample size depends on four factors: your baseline conversion rate, the minimum effect size you want to detect, your statistical power (usually 80%), and your significance level (usually 5%, which corresponds to 95% confidence). Most platforms include built-in sample size calculators that answer the critical question: how many visitors do we need before we can trust these results? For A/B Test Starters, understanding that a site with a 1% conversion rate needs far more traffic than a site with a 10% conversion rate to detect the same relative lift prevents a common mistake: running a test for one week and then declaring victory based on trends rather than statistics.
- Baseline Conversion Rate as Your Starting Point: Calculate your current conversion rate for the metric you’re testing. If 100 of 10,000 visitors complete your goal, your baseline is 1%. This number is critical because low baseline rates require far more traffic to detect the same relative improvement; for small rates, the required sample size grows roughly in inverse proportion to the baseline. A 20% lift on a 1% baseline (1.0% to 1.2%) requires far more data than a 20% lift on a 20% baseline (20% to 24%).
- Minimum Detectable Effect (MDE) and Business Relevance: Define the smallest improvement you care about. For revenue per visitor this might be 5%, while for engagement metrics you might only care about changes of 10% or more. A/B Test Starters should set MDE based on business impact, not statistical convenience; a 1% revenue improvement might be worth pursuing if you have adequate traffic, but a 1% click improvement on a secondary metric often isn’t.
- Sample Size Calculator Usage: Input your baseline conversion rate, desired MDE, desired confidence level (95%), and power (80%), and the calculator returns the required visitors per variation. Most tools also convert this to “days until completion” based on your daily traffic volume, helping you forecast whether a test will reach its required sample size in one week or needs four weeks.
- Traffic Velocity and Calendar Time vs. Sample Time: Distinguish between calendar time (July 5-12) and sample time (the number of user interactions needed); a high-traffic site reaches sample size in days while a low-traffic site might need weeks for identical experiments. Account for day-of-week effects by running tests in whole-week increments, so weekdays and weekends are represented in their normal proportions, and account for seasonality by avoiding holidays or other unusual periods unless they are representative of your actual traffic.
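To make the relationship between these inputs concrete, here is a minimal sketch of what a sample size calculator might compute internally. It assumes a two-sided test comparing two conversion rates with the standard normal-approximation formula and an MDE expressed as a relative lift. Your platform may use a different method, so its numbers can differ. The function name and parameters are illustrative, not taken from any particular tool.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation (illustrative helper).

    Assumes a two-sided, two-proportion z-test with equal traffic split.
    baseline_rate: current conversion rate, e.g. 0.01 for 1%
    relative_mde:  smallest relative lift to detect, e.g. 0.20 for 20%
    alpha:         significance level (0.05 corresponds to 95% confidence)
    power:         probability of detecting a true effect of size MDE
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_mde <= 0:
        raise ValueError("relative_mde must be positive")

    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate you hope the variant reaches
    if p2 >= 1:
        raise ValueError("baseline_rate * (1 + relative_mde) must be below 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under no effect

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Same 20% relative lift, very different baselines
low_baseline = sample_size_per_variation(0.01, 0.20)
high_baseline = sample_size_per_variation(0.20, 0.20)
print(f"1% baseline:  {low_baseline:,} visitors per variation")
print(f"20% baseline: {high_baseline:,} visitors per variation")
```

Under these assumptions, the 1% baseline needs tens of thousands of visitors per variation, while the 20% baseline needs fewer than two thousand. Halving the MDE roughly quadruples the requirement, which is why the MDE should reflect business impact rather than convenience.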
Practical Application
Use your platform’s sample size calculator to determine the required visitors and estimated duration for your first test, using your actual baseline conversion rate and a realistic minimum effect size. Document your assumptions (baseline rate, MDE, confidence level, power) in a test plan and share it with stakeholders so everyone understands why the test requires X visitors, setting realistic expectations for launch timing.
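If your platform does not report an estimated duration, the conversion from required visitors to calendar days can be sketched as below. This is a simplified illustration: it assumes steady daily traffic, an even split across variations, and every visitor entering the test. Rounding up to whole weeks is a common convention for balancing day-of-week effects and is included here as an optional assumption. The function name and the input figures are hypothetical.

```python
import math


def estimate_test_duration_days(visitors_per_variation, num_variations,
                                daily_visitors, round_to_full_weeks=True):
    """Estimate calendar days needed to reach the required sample size.

    visitors_per_variation: output of your sample size calculator
    num_variations:         control plus variants (2 for a simple A/B test)
    daily_visitors:         average eligible visitors per day entering the test
    round_to_full_weeks:    round up to a multiple of 7 days (assumed convention)
    """
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")

    total_visitors = visitors_per_variation * num_variations
    days = math.ceil(total_visitors / daily_visitors)
    if round_to_full_weeks:
        days = math.ceil(days / 7) * 7
    return days


# Hypothetical plan: 40,000 visitors per variation, 2 variations, 5,000 visitors/day
print(estimate_test_duration_days(40_000, 2, 5_000))         # 21
print(estimate_test_duration_days(40_000, 2, 5_000, False))  # 16
```

Recording both numbers in the test plan (raw days and the week-rounded duration) helps stakeholders see why a test that could technically finish on day 16 is still scheduled for three full weeks.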