Ever launched what you thought was a game-changing feature, only to see your A/B test results come back "inconclusive"? You're not alone. The culprit might not be your feature - it could be that you set up your test to miss the very improvement you were looking for.
This is where minimum detectable effect (MDE) comes in. Think of it as the sensitivity setting on your experiment's microscope. Set it right, and you'll spot the changes that matter. Set it wrong, and you'll either waste resources chasing tiny improvements or miss real wins entirely.
Here's the thing about minimum detectable effect - it's not just some arbitrary number you pick out of thin air. It's the smallest true effect your experiment can reliably detect, given your statistical power and significance level. And yes, there's a difference between "smallest detectable change" and "smallest true effect": a single observed difference can sneak past your significance threshold by luck, while the MDE is about the underlying effect size you'd catch consistently at your chosen power (trust me, this distinction matters when you're designing tests).
The relationship between MDE and sample size is pretty straightforward: higher MDE = smaller sample size needed. But here's the catch - set it too high and you'll miss those smaller wins that could add up over time. Set it too low? You'll be running tests until the heat death of the universe.
So how do you find that sweet spot? Start with what actually matters to your business. If you're running a high-traffic e-commerce site, maybe a 2% lift in conversion rate moves the needle. But if you're working with enterprise software where each sale is worth millions, even a 0.5% improvement could be huge. The folks at Statsig built a power analysis calculator specifically for this - plug in your numbers and it'll tell you if your test plan is realistic or fantasy.
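If you're curious what a calculator like that is doing under the hood, here's a minimal sketch using the standard two-proportion z-test approximation - a two-sided test with equal-sized arms, and a made-up 5% baseline purely for illustration (the function name is mine, not Statsig's):

```python
from scipy.stats import norm

def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Rough per-arm sample size for a two-sided, two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)        # treatment rate if the true lift equals the MDE
    z_alpha = norm.ppf(1 - alpha / 2)         # ~1.96 for alpha = 0.05
    z_power = norm.ppf(power)                 # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)  # sum of the two binomial variances
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2

# Halving the MDE roughly quadruples the sample you need
for mde in (0.10, 0.05, 0.025):
    print(f"{mde:.1%} relative MDE -> ~{sample_size_per_arm(0.05, mde):,.0f} users per arm")
```

Notice how fast the numbers climb as the MDE shrinks - that's the whole tension in a nutshell.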
Here's what trips people up: MDE isn't universal. A 5% MDE might work great for your checkout flow optimization, but be completely wrong for testing email subject lines. Context is everything. I've seen teams burn through weeks of testing because they copied MDE settings from a completely different type of experiment.
Getting your MDE right isn't just about statistical correctness - it's about not wasting everyone's time. Set it too high and you'll declare "no significant difference" on changes that actually would've helped your business. The product managers who've been through this will tell you: nothing kills experimentation culture faster than a string of "failed" tests that were actually just poorly configured.
The tradeoffs are real:
Lower MDE: More sensitive to small changes, but needs massive sample sizes
Higher MDE: Quick results, but might miss subtle improvements
Just right: Detects changes that actually impact your bottom line
I've watched teams set impossibly low MDEs because they wanted to catch every tiny improvement. What happened? Their tests ran for months, product development ground to a halt, and by the time results came in, the market had already moved on. On the flip side, I've also seen the "let's just test for 20% lifts" crowd miss consistent 5-10% wins that would've compounded into serious growth.
The key is aligning MDE with what actually matters to your business. If a 3% improvement wouldn't change any decisions, why design your test to detect it? Tools like Statsig's calculator help here by showing you the time-cost of different MDE choices before you commit.
Your baseline conversion rate is probably the biggest factor people overlook when setting MDE. Here's why it matters: detecting a 2% relative lift when your baseline is 50% is way easier than when it's 2%. The math just works that way. For the same relative MDE, higher baselines mean smaller required sample sizes.
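To put rough numbers on that, here's a quick sketch comparing those two baselines at the same 2% relative MDE, using the same two-proportion approximation as before (95% confidence, 80% power; treat the figures as ballpark, not gospel):

```python
from scipy.stats import norm

def per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

# Same 2% relative MDE, wildly different sample sizes
print(f"50% baseline: ~{per_arm(0.50, 0.02):,.0f} users per arm")  # ~39,000
print(f" 2% baseline: ~{per_arm(0.02, 0.02):,.0f} users per arm")  # ~1.9 million
```

Roughly 39,000 users per arm versus nearly two million - same relative MDE, very different reality.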
Then there's the eternal triangle of MDE, statistical power, and significance level. You can't maximize all three - something's gotta give (the sketch after this list puts rough numbers on it):
Want high power (catching real effects)? Need bigger samples
Want low significance level (avoiding false positives)? Also need bigger samples
Want to detect tiny effects? You guessed it - bigger samples
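Here's a rough illustration of that triangle with the same two-proportion approximation - the 5% baseline and 5% relative MDE below are placeholder numbers, not a recommendation:

```python
from scipy.stats import norm

def per_arm(baseline, relative_mde, alpha, power):
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

# Raising power or tightening alpha both inflate the requirement
for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.80), (0.01, 0.90)]:
    n = per_arm(baseline=0.05, relative_mde=0.05, alpha=alpha, power=power)
    print(f"alpha={alpha}, power={power:.0%}: ~{n:,.0f} users per arm")
```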
Smart teams use MDE to prioritize their experimentation roadmap. Got a redesign that you think will lift conversions by 15%? Test that before the button color change you hope might squeeze out 1%. This isn't just about statistics - it's about focusing on changes that'll actually move your business forward. Some teams even use what they call the Minimum Caring Effect - basically asking "what's the smallest change we'd actually act on?"
The duration calculation discussions on Reddit nail this point: your minimum acceptable improvement often dictates how long you'll be waiting for results. Want to detect a 1% lift? Better clear your calendar for the next few weeks.
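The back-of-the-envelope duration math looks something like this - assuming traffic splits evenly across two arms, and with every input below invented for illustration:

```python
import math

def test_duration_days(n_per_arm, daily_eligible_visitors, arms=2):
    """Days needed to fill every arm, assuming traffic splits evenly across arms."""
    return math.ceil(arms * n_per_arm / daily_eligible_visitors)

# Hypothetical: a small relative lift pushed the requirement to ~300k users per arm
print(test_duration_days(n_per_arm=300_000, daily_eligible_visitors=40_000))  # 15 days
```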
Let's get practical. Start with business impact, not statistical purity. Ask yourself: what's the smallest improvement that would make this test worth running? If you're testing a new checkout flow, maybe that's a 5% conversion lift. If you're tweaking email copy, perhaps 10% more opens would justify the effort.
Here's my framework for setting MDE (there's a rough sketch in code after this list):
Calculate the actual dollar impact of different effect sizes
Factor in how long you're willing to wait for results
Check if you have enough traffic to detect that effect
Adjust until you find a realistic balance
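Here's a sketch of steps 1 through 3. Every input is a placeholder - order volume, order value, baseline conversion, and traffic are all made up, so swap in your own numbers:

```python
from scipy.stats import norm

MONTHLY_ORDERS = 100_000       # hypothetical current order volume
AVG_ORDER_VALUE = 60.0         # hypothetical, in dollars
BASELINE_CONVERSION = 0.04     # hypothetical checkout conversion rate
DAILY_ELIGIBLE_USERS = 30_000  # hypothetical traffic entering the test

def per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

for mde in (0.02, 0.05, 0.10):
    annual_dollars = MONTHLY_ORDERS * 12 * AVG_ORDER_VALUE * mde  # step 1: dollar impact
    n = per_arm(BASELINE_CONVERSION, mde)                         # step 3: traffic needed
    days = 2 * n / DAILY_ELIGIBLE_USERS                           # step 2: how long you'd wait
    print(f"{mde:.0%} lift: ~${annual_dollars:,.0f}/yr  ~{n:,.0f}/arm  ~{days:.0f} days")
```

The exact numbers don't matter - the point is seeing the revenue upside and the waiting time side by side before you commit.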
For most teams, these guidelines work well:
High-traffic consumer sites: 3-5% MDE is usually reasonable
B2B with lower volume: 10-15% MDE might be more realistic
Email/messaging tests: 8-12% MDE for open rates
Enterprise software: Focus on leading indicators with higher MDE
Remember that MDE represents the smallest true effect you can reliably detect - not just any random fluctuation. This distinction matters because it helps you avoid getting excited about noise in your data.
The biggest mistake? Setting MDE in isolation. Talk to your stakeholders. If product leadership wouldn't change strategy based on a 2% improvement, don't design tests to detect it. Your MDE should reflect the minimum effect that would actually influence decisions.
Setting the right MDE is half art, half science. The math gives you the framework, but knowing your business context makes it actually useful. Start with what would meaningfully impact your metrics, work backwards to sample size requirements, and adjust until you find something realistic.
Next time you're planning an A/B test, spend an extra 10 minutes thinking through your MDE. It might save you weeks of waiting for inconclusive results. And if you're looking for more depth on power analysis and sample size calculations, the statistics community on Reddit has some great technical discussions, while Statsig's blog covers the practical implementation side.
Hope you find this useful! Now go forth and run better experiments.