MDE in A/B testing: Setting realistic expectations for your experiments

Mon Jun 23 2025

Ever launched what you thought was a game-changing feature, only to see your A/B test results come back "inconclusive"? You're not alone. The culprit might not be your feature - it could be that you set up your test to miss the very improvement you were looking for.

This is where minimum detectable effect (MDE) comes in. Think of it as the sensitivity setting on your experiment's microscope. Set it right, and you'll spot the changes that matter. Set it wrong, and you'll either waste resources chasing tiny improvements or miss real wins entirely.

Understanding minimum detectable effect (MDE) in A/B testing

Here's the thing about minimum detectable effect - it's not just some arbitrary number you pick out of thin air. It's the smallest true effect your experiment can reliably detect, given your statistical power and significance level. And yes, there's a difference between "smallest detectable change" and "smallest true effect": any single test can flag a smaller observed difference as significant just by luck, but the MDE is about the underlying effect you'd catch consistently - at 80% power, roughly 8 times out of 10. Trust me, this distinction matters when you're designing tests.

The relationship between MDE and sample size is pretty straightforward: higher MDE = smaller sample size needed. But here's the catch - set it too high and you'll miss those smaller wins that could add up over time. Set it too low? You'll be running tests until the heat death of the universe.

So how do you find that sweet spot? Start with what actually matters to your business. If you're running a high-traffic e-commerce site, maybe a 2% lift in conversion rate moves the needle. But if you're working with enterprise software where each sale is worth millions, even a 0.5% improvement could be huge. The folks at Statsig built a power analysis calculator specifically for this - plug in your numbers and it'll tell you if your test plan is realistic or fantasy.
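If you'd rather see the math than take my word for it, here's roughly what a power calculator does under the hood: the standard two-proportion sample size formula. The baseline and lift below are made-up, illustrative numbers - a sketch of the calculation, not a replacement for a proper tool.

```python
from statistics import NormalDist
import math

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per variant for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)        # conversion rate if the lift is real
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z**2 * variance / (p2 - p1)**2)

# A 5% relative lift on a 4% baseline conversion rate:
print(sample_size_per_variant(0.04, 0.05))    # ~154,000 users per variant
```

Change that 0.05 to 0.02 and the requirement balloons to nearly a million users per variant. That's the MDE-versus-sample-size tradeoff in one line.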

Here's what trips people up: MDE isn't universal. A 5% MDE might work great for your checkout flow optimization, but be completely wrong for testing email subject lines. Context is everything. I've seen teams burn through weeks of testing because they copied MDE settings from a completely different type of experiment.

The importance of setting the right MDE

Getting your MDE right isn't just about statistical correctness - it's about not wasting everyone's time. Set it too high and you'll declare "no significant difference" on changes that actually would've helped your business. The product managers who've been through this will tell you: nothing kills experimentation culture faster than a string of "failed" tests that were actually just poorly configured.

The tradeoffs are real:

  • Lower MDE: More sensitive to small changes, but needs massive sample sizes

  • Higher MDE: Quick results, but might miss subtle improvements

  • Just right: Detects changes that actually impact your bottom line

I've watched teams set impossibly low MDEs because they wanted to catch every tiny improvement. What happened? Their tests ran for months, product development ground to a halt, and by the time results came in, the market had already moved on. On the flip side, I've also seen the "let's just test for 20% lifts" crowd miss consistent 5-10% wins that would've compounded into serious growth.

The key is aligning MDE with what actually matters to your business. If a 3% improvement wouldn't change any decisions, why design your test to detect it? Tools like Statsig's calculator help here by showing you the time-cost of different MDE choices before you commit.

Factors influencing MDE and its impact on experiments

Your baseline conversion rate is probably the biggest factor people overlook when setting MDE. Here's why it matters: detecting a 2% relative lift when your baseline is 50% is way easier than when it's 2%. The same relative lift is a much bigger absolute gap at a high baseline, so it stands out from the noise with far fewer users. Higher baselines = smaller required sample sizes for the same relative MDE.
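To put numbers on that, here's the same two-proportion formula as the earlier sketch, applied to both baselines (purely illustrative, with the usual 0.05 significance level and 80% power):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)   # alpha = 0.05, power = 0.8

def n_needed(p1, p2):
    """Per-variant sample size to separate two conversion rates."""
    return round(z**2 * (p1*(1 - p1) + p2*(1 - p2)) / (p2 - p1)**2)

print(n_needed(0.50, 0.51))     # 2% relative lift on a 50% baseline: ~39,000 per variant
print(n_needed(0.02, 0.0204))   # 2% relative lift on a 2% baseline: ~1.9 million per variant
```

Same relative MDE, roughly a 50x difference in required traffic. That's why copying an MDE from a page with a very different baseline gets teams into trouble.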

Then there's the eternal triangle of MDE, statistical power, and significance level. You can't maximize all three - something's gotta give (the sketch after this list puts numbers on it):

  • Want high power (catching real effects)? Need bigger samples

  • Want low significance level (avoiding false positives)? Also need bigger samples

  • Want to detect tiny effects? You guessed it - bigger samples
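Here's one way to see the triangle in numbers: sweep power and significance level for a fixed 10% relative MDE on a 10% baseline. This sketch assumes you have statsmodels installed; any power calculator will show the same pattern.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.11, 0.10)   # 10% relative MDE on a 10% baseline
for power in (0.80, 0.90):
    for alpha in (0.05, 0.01):
        n = NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                         power=power, alternative='two-sided')
        print(f"power={power}, alpha={alpha}: ~{n:,.0f} users per variant")
```

Across that grid the requirement climbs from roughly 15,000 to 28,000 users per variant - same MDE, same baseline, just stricter demands on power and false-positive rate.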

Smart teams use MDE to prioritize their experimentation roadmap. Got a redesign that you think will lift conversions by 15%? Test that before the button color change you hope might squeeze out 1%. This isn't just about statistics - it's about focusing on changes that'll actually move your business forward. Some teams even use what they call the Minimum Caring Effect - basically asking "what's the smallest change we'd actually act on?"

The duration calculation discussions on Reddit nail this point: your minimum acceptable improvement often dictates how long you'll be waiting for results. Want to detect a 1% lift? Better clear your calendar for the next few weeks.
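The arithmetic behind that waiting is simple: take the sample size your MDE demands and divide by the traffic that actually reaches the experiment. The numbers below are placeholders you'd swap for your own.

```python
n_per_variant = 315_000        # e.g. from a power calculation like the ones above
variants = 2
daily_eligible_users = 40_000  # assumption: users who hit this surface each day

days = n_per_variant * variants / daily_eligible_users
print(f"~{days:.0f} days to reach the required sample size")   # ~16 days
```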

Best practices for setting realistic MDE in A/B testing

Let's get practical. Start with business impact, not statistical purity. Ask yourself: what's the smallest improvement that would make this test worth running? If you're testing a new checkout flow, maybe that's a 5% conversion lift. If you're tweaking email copy, perhaps 10% more opens would justify the effort.

Here's my framework for setting MDE (there's a rough code sketch of it right after the list):

  1. Calculate the actual dollar impact of different effect sizes

  2. Factor in how long you're willing to wait for results

  3. Check if you have enough traffic to detect that effect

  4. Adjust until you find a realistic balance
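Here's what that framework can look like as a quick script. Every input - baseline, traffic, revenue per conversion - is an assumption you'd replace with your own numbers; the point is to see sample size, wait time, and dollar impact side by side before you commit.

```python
from statistics import NormalDist
import math

def evaluate_mde(baseline, relative_mde, daily_users, monthly_conversions,
                 revenue_per_conversion, alpha=0.05, power=0.8):
    """Steps 1-3: dollar impact, wait time, and traffic check for one candidate MDE."""
    p2 = baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = math.ceil(z**2 * (baseline*(1 - baseline) + p2*(1 - p2)) / (p2 - baseline)**2)
    days = math.ceil(2 * n / daily_users)              # two variants splitting the traffic
    monthly_upside = monthly_conversions * relative_mde * revenue_per_conversion
    return n, days, monthly_upside

for mde in (0.02, 0.05, 0.10):
    n, days, upside = evaluate_mde(baseline=0.04, relative_mde=mde,
                                   daily_users=50_000, monthly_conversions=60_000,
                                   revenue_per_conversion=60)
    print(f"{mde:.0%} MDE: {n:,} per variant, ~{days} days, ~${upside:,.0f}/month if it hits")
```

Step 4 is the judgment call: if the 2% row means a five-to-six-week test for a lift nobody would act on, that's your answer.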

For most teams, these guidelines work well:

  • High-traffic consumer sites: 3-5% MDE is usually reasonable

  • B2B with lower volume: 10-15% MDE might be more realistic

  • Email/messaging tests: 8-12% MDE for open rates

  • Enterprise software: Focus on leading indicators with higher MDE

Remember that MDE represents the smallest true effect you can reliably detect - not just any random fluctuation. This distinction matters because it helps you avoid getting excited about noise in your data.

The biggest mistake? Setting MDE in isolation. Talk to your stakeholders. If product leadership wouldn't change strategy based on a 2% improvement, don't design tests to detect it. Your MDE should reflect the minimum effect that would actually influence decisions.

Closing thoughts

Setting the right MDE is half art, half science. The math gives you the framework, but knowing your business context makes it actually useful. Start with what would meaningfully impact your metrics, work backwards to sample size requirements, and adjust until you find something realistic.

Next time you're planning an A/B test, spend an extra 10 minutes thinking through your MDE. It might save you weeks of waiting for inconclusive results. And if you're looking for more depth on power analysis and sample size calculations, the statistics community on Reddit has some great technical discussions, while Statsig's blog covers the practical implementation side.

Hope you find this useful! Now go forth and run better experiments.
