Sequential Testing and Adaptive Experimentation
What You’ll Learn
You will understand how to implement sequential testing strategies that allow you to make decisions continuously rather than waiting for predetermined sample sizes, reducing time-to-decision while maintaining statistical validity. This lesson introduces adaptive experimentation methods that reallocate traffic to winning variations as evidence accumulates, improving your revenue during the test itself.
Key Concepts
Sequential testing continuously monitors accumulating data and stops the experiment as soon as the evidence crosses a pre-specified boundary, rather than committing to a fixed sample size before the test begins. This approach reduces wasted traffic on clearly inferior variations and can shorten learning cycles in split testing by roughly 30-50% compared to fixed-horizon experiments. However, sequential testing introduces complexities around multiple comparisons, early-stopping rules, and alpha spending that must be managed carefully to avoid inflating the false positive rate.
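A short simulation makes the inflation risk concrete: repeatedly checking a naive fixed-threshold p-value on an A/A test (both variants identical) pushes the false positive rate well above the nominal 5%. This is a minimal sketch; the look schedule, traffic numbers, and function names are illustrative, not a specific tool's API.

```python
import random
from statistics import NormalDist

def two_prop_p(c_a, n_a, c_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (c_a + c_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (c_b / n_b - c_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def peeking_false_positive_rate(rate=0.05, looks=(500, 1000, 1500, 2000),
                                alpha=0.05, sims=1000, seed=7):
    """Run A/A tests; count how often ANY naive look 'wins' at alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        c_a = c_b = n = 0
        significant = False
        for target in looks:
            # Accumulate visitors per arm up to this interim look.
            while n < target:
                c_a += rng.random() < rate
                c_b += rng.random() < rate
                n += 1
            if two_prop_p(c_a, n, c_b, n) < alpha:
                significant = True  # naive peek declares a winner
                break
        hits += significant
    return hits / sims

fpr = peeking_false_positive_rate()
print(f"false positive rate with 4 naive looks: {fpr:.3f}")
```

With four unadjusted looks the realized false positive rate typically lands in the low teens rather than at 5%, which is exactly the problem alpha spending exists to fix.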
- Fixed vs. Sequential Horizons: Fixed-horizon testing collects data from a predetermined number of visitors before analyzing results, while sequential testing performs interim analyses and can stop early when evidence strongly favors one variant. A fixed-horizon checkout test might collect 10,000 visitors before concluding, while sequential testing might stop after 6,000 visitors if the difference becomes statistically conclusive, saving 40% of traffic.
- Alpha Spending: In sequential testing, the overall alpha level (false positive rate) must be allocated across the multiple looks at the data to maintain statistical integrity, a concept called “alpha spending.” A Pocock spending function allocates roughly equal alpha to each interim analysis, while O’Brien-Fleming spending allocates very little alpha to early looks, so the test can stop early only for large effects while overall statistical rigor is preserved.
- Adaptive Traffic Allocation: Rather than maintaining a 50/50 traffic split, adaptive methods concentrate more traffic on variations showing superior performance as the test progresses, a technique called response-adaptive allocation. Thompson Sampling and similar algorithms can lift overall conversions during the experiment by 5-15% by routing more traffic to winning variants while still collecting enough data on the alternatives to confirm their inferiority.
- Interim Analysis and Decision Rules: Sequential testing requires pre-specified boundaries that determine when to stop the test, typically defined through spending functions or group sequential design methodologies. Establishing these rules before data collection (defining specific p-value thresholds at each interim look) prevents the common temptation to cherry-pick stopping points after seeing results, which would invalidate statistical claims.
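The two spending functions mentioned above have simple closed forms in their Lan-DeMets versions. This sketch computes the cumulative alpha spent at each information fraction (share of the planned sample collected so far) for an overall two-sided alpha of 0.05:

```python
import math
from statistics import NormalDist

norm = NormalDist()

def pocock_spend(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: near-equal spend per look."""
    return alpha * math.log(1 + (math.e - 1) * t)

def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: stingy early, most at the end."""
    z = norm.inv_cdf(1 - alpha / 2)
    return 2 * (1 - norm.cdf(z / math.sqrt(t)))

# Cumulative alpha spent at four equally spaced interim looks.
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  Pocock={pocock_spend(t):.4f}  "
          f"OBF={obrien_fleming_spend(t):.4f}")
```

Both functions spend the full 0.05 by the final look, but O'Brien-Fleming spends almost nothing at the 25% look (on the order of 0.0001 versus roughly 0.018 for Pocock), which is why it makes early stopping much harder.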
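The response-adaptive allocation bullet can be sketched with Beta-Bernoulli Thompson Sampling. The true conversion rates, visitor count, and seed below are illustrative assumptions, not figures from the lesson:

```python
import random

def thompson_ab(true_rates, visitors=20000, seed=42):
    """Thompson Sampling over Bernoulli arms with Beta(1, 1) priors.

    Each visitor is routed to the arm whose posterior draw is highest,
    so traffic drifts toward the better-performing variant over time.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    traffic = [0] * len(true_rates)
    for _ in range(visitors):
        # One sample from each arm's Beta posterior.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        traffic[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return traffic, successes

traffic, successes = thompson_ab([0.040, 0.060])  # variant B truly better
share_b = traffic[1] / sum(traffic)
print(f"traffic share to variant B: {share_b:.1%}")
print(f"total conversions: {sum(successes)}")
```

Because the inferior arm still gets occasional exploratory traffic, its posterior keeps narrowing, which is how the algorithm confirms inferiority rather than merely assuming it.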
Practical Application
Select a current split test and establish interim analysis points at 25%, 50%, 75%, and 100% of your originally calculated sample size. Before collecting data, set specific stopping rules (such as p < 0.01 at interim looks and p < 0.05 at the final look) using your preferred spending function. Then implement monitoring dashboards that flag when a statistical boundary is crossed, so you can make real-time decisions while staying within the predetermined framework that preserves validity.
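The decision rule above might be coded as follows, assuming a two-proportion z-test and the example thresholds (p < 0.01 at interim looks, p < 0.05 at the final look); the visitor and conversion counts are hypothetical dashboard readings, not real data:

```python
from statistics import NormalDist

def two_prop_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def interim_decision(look_fraction, conv_a, n_a, conv_b, n_b):
    """Apply the pre-registered stopping rule at one interim look."""
    threshold = 0.05 if look_fraction >= 1.0 else 0.01
    p = two_prop_p(conv_a, n_a, conv_b, n_b)
    if p < threshold:
        return f"STOP at {look_fraction:.0%} look: p={p:.4f} < {threshold}"
    return f"CONTINUE at {look_fraction:.0%} look: p={p:.4f} >= {threshold}"

# Hypothetical 50% interim look of a 10,000-visitor test.
print(interim_decision(0.50, conv_a=210, n_a=2500, conv_b=272, n_b=2500))
```

The key property is that the thresholds are fixed in `interim_decision` before any data arrive; the dashboard merely reports which side of the pre-registered boundary each look falls on.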