Ever stumbled upon a natural experiment hiding in your data? That's essentially what regression discontinuity design (RDD) is all about - finding those magical thresholds where you can actually measure cause and effect without running a traditional A/B test. Think scholarship cutoffs, age requirements, or population thresholds that determine policy eligibility.
The beauty of RDD is that it lets you play scientist with real-world data when controlled experiments aren't feasible. Maybe you can't randomly assign students to receive scholarships, but you can compare kids who scored 89.9 on a test with those who scored 90.1 when 90 is the cutoff.
RDD is basically causal inference for the opportunistic analyst. You're exploiting arbitrary cutoffs in data to figure out what actually causes what. According to discussions in the statistics community, the core insight is brilliant: people just above and below a threshold are basically identical except for whether they got the treatment.
This matters because randomized experiments aren't always possible. You can't ethically randomize who gets health insurance or which cities get funding. But if there's a cutoff - say, Medicaid eligibility at 138% of the poverty line - you've got yourself a natural experiment. The people at 137% and 139% of poverty are practically twins, except one gets coverage and one doesn't.
The approach gives you what statisticians call a local average treatment effect (LATE). It's "local" because you're only learning about people near the threshold, not everyone. That's both a limitation and a strength - your results are super credible for that specific group.
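If you like symbols: with Y as the outcome, X as the running variable, and c as the cutoff, the sharp-RDD version of that effect is just the jump in average outcomes right at the threshold (this is the standard textbook way to write it, not anything specific to one paper):

```latex
\tau_{\text{SRD}}
  = \mathbb{E}\big[Y(1) - Y(0) \mid X = c\big]
  = \lim_{x \downarrow c} \mathbb{E}[Y \mid X = x]
  - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x]
```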
Take politics, for example. Eggers and colleagues looked at politicians who barely won versus barely lost elections. Same vote share practically, but vastly different outcomes. That's RDD gold. Other researchers have used it for everything from college enrollment (SAT score cutoffs) to criminal justice (sentencing guidelines based on offense scores).
The implementation can get technical - you'll choose between parametric and non-parametric methods, worry about bandwidth selection, and run robustness checks. But the core idea remains simple: find a threshold, compare people on either side, and you've got causation without randomization.
At the heart of RDD, you've got two key players: the running variable and the cutoff point. The running variable is whatever determines treatment - test scores, income, age, you name it. The cutoff is where the magic happens, creating that discontinuity that lets you estimate causal effects.
RDD comes in two flavors. Sharp RDD is the clean version - score above 90, you're in; score below, you're out. No exceptions. It's deterministic and straightforward. The relationship is crystal clear, which makes analysis simpler.
Fuzzy RDD is messier but more realistic. Maybe scoring above 90 makes you eligible for the scholarship, but not everyone takes it. Or perhaps there are appeals and exceptions. The treatment probability jumps at the cutoff, but it doesn't go from 0% to 100%. You're dealing with encouragement rather than mandate.
Picking your running variable matters more than you'd think. It needs to be continuous (no big gaps or heaping at round values) and something people can't precisely manipulate. Test scores work great; self-reported income, not so much. If people can game the system to land just above the cutoff, your whole design falls apart.
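Here's a minimal sketch of the sharp-versus-fuzzy difference using simulated data in Python (all numbers and variable names are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n, cutoff = 5_000, 90

# Running variable: test scores scattered around the cutoff
score = rng.normal(loc=85, scale=10, size=n)

# Sharp RDD: treatment is a deterministic function of the score
treated_sharp = (score >= cutoff).astype(int)

# Fuzzy RDD: crossing the cutoff only raises the *probability* of treatment
# (say, from 20% below the threshold to 80% above it)
p_treat = np.where(score >= cutoff, 0.8, 0.2)
treated_fuzzy = rng.binomial(1, p_treat)

df = pd.DataFrame({"score": score,
                   "treated_sharp": treated_sharp,
                   "treated_fuzzy": treated_fuzzy})

# The jump in treatment probability at the cutoff is 100 percentage points
# in the sharp case, but only ~60 in the fuzzy case
print(df.groupby(df.score >= cutoff)[["treated_sharp", "treated_fuzzy"]].mean())
```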
The absence of a traditional control group throws some people off. You're not comparing treatment and control groups like in an A/B test. Instead, you're comparing people just above and below the threshold. It's a different mindset, but when done right, it's just as valid for causal inference.
So you want to actually run an RDD analysis? Let's talk about making it work in practice. First things first: your data needs to cooperate. The assignment variable (what determines treatment) should be continuous without weird jumps or clusters. The statistics community emphasizes that your cutoff needs to be crystal clear - no fuzzy "around 90ish" thresholds.
The big assumption is continuity. Basically, if the treatment didn't exist, outcomes would change smoothly across the threshold. No sudden jumps for other reasons. You can't test that assumption directly, but you can probe its implications with the balance and density checks covered below - and you should.
When it comes to estimation, you've got choices:
Parametric approaches: Fit regression lines on each side of the cutoff. Simple, but you're assuming a specific functional form
Non-parametric methods: Use local linear regression or kernels. More flexible but requires more data near the cutoff
According to methodological research, the choice depends on your data and how much you trust your model specification. When in doubt, try both and see if results hold up.
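To make that concrete, here's a rough sketch of both approaches on simulated data with a true jump of 5 points at the cutoff. In real work you'd likely reach for a dedicated package (rdrobust, for instance, handles bandwidth selection for you); the hand-rolled version below just shows the moving parts:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, cutoff = 5_000, 90

score = rng.normal(85, 10, size=n)
treated = (score >= cutoff).astype(int)
# Outcome: smooth in the score, plus a true jump of 5 at the cutoff
outcome = 10 + 0.3 * score + 5 * treated + rng.normal(0, 3, size=n)

df = pd.DataFrame({"y": outcome, "x": score - cutoff, "d": treated})

# Parametric: one global regression with separate slopes on each side
parametric = smf.ols("y ~ d + x + d:x", data=df).fit()

# Non-parametric flavor: local linear regression, keeping only observations
# within a bandwidth h of the cutoff (h is eyeballed here; real implementations
# pick it data-driven, e.g. Imbens-Kalyanaraman or Calonico-Cattaneo-Titiunik)
h = 5
local = df[df.x.abs() <= h]
nonparametric = smf.ols("y ~ d + x + d:x", data=local).fit()

print("parametric estimate:  ", round(parametric.params["d"], 2))
print("local linear estimate:", round(nonparametric.params["d"], 2))
```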
Here's where the rubber meets the road - robustness checks. You can't just run one regression and call it a day. Smart analysts run several tests:
Start with density tests to check if there's bunching at the cutoff. If you see a suspicious spike in observations just above the threshold, someone's gaming the system. Next, run balance tests on pre-treatment characteristics. People on both sides of the cutoff should look similar on things that happened before treatment.
My favorite check? Placebo tests. Apply your RDD to outcomes that shouldn't be affected. If barely winning an election affects politician height, something's wrong with your design. These sanity checks separate credible RDD from wishful thinking.
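Here's a toy sketch of all three checks on simulated data (the covariate, bandwidth, and placebo cutoff are arbitrary choices for illustration; a package like rddensity gives you a formal McCrary-style density test):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n, cutoff, h = 5_000, 90, 5

score = rng.normal(85, 10, size=n)
age = rng.normal(20, 2, size=n)            # pre-treatment covariate
treated = (score >= cutoff).astype(int)
outcome = 10 + 0.3 * score + 5 * treated + rng.normal(0, 3, size=n)
df = pd.DataFrame({"y": outcome, "x": score - cutoff, "d": treated, "age": age})

# 1. Density check: roughly equal counts just below vs. just above the cutoff?
just_below = ((df.x >= -1) & (df.x < 0)).sum()
just_above = ((df.x >= 0) & (df.x < 1)).sum()
print("counts near cutoff:", just_below, "below vs.", just_above, "above")

# 2. Balance check: a pre-treatment covariate shouldn't jump at the cutoff
near = df[df.x.abs() <= h]
balance = smf.ols("age ~ d + x + d:x", data=near).fit()
print("jump in age at cutoff:", round(balance.params["d"], 2))   # should be ~0

# 3. Placebo check: re-run the design at a fake cutoff where nothing happens
placebo_cut = -5                  # i.e. a score of 85, away from the real cutoff
df["d_fake"] = (df.x >= placebo_cut).astype(int)
df["x_fake"] = df.x - placebo_cut
fake = df[df.x_fake.abs() <= h]
placebo = smf.ols("y ~ d_fake + x_fake + d_fake:x_fake", data=fake).fit()
print("placebo 'effect':", round(placebo.params["d_fake"], 2))   # should be ~0
```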
Once you've mastered basic RDD, things get interesting. Multiple thresholds are where the real world gets messy. Maybe you've got different scholarship cutoffs for different states, or multiple population thresholds triggering different policies.
The naive approach - just normalize all cutoffs to zero and pool everything - misses important nuance. Different thresholds might have different effects. New estimators let you handle this heterogeneity properly, testing whether effects vary across cutoffs. It's the difference between "scholarships help" and "scholarships help more in states with higher cutoffs."
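A quick sketch of that difference, assuming a hypothetical dataset with a state column and a different cutoff per state:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 9_000

# Hypothetical setup: three states, three different scholarship cutoffs,
# and a treatment effect that genuinely differs by state (3, 5, and 7 points)
cutoffs = {"A": 85, "B": 90, "C": 95}
effects = {"A": 3.0, "B": 5.0, "C": 7.0}
state = rng.choice(list(cutoffs), size=n)
score = rng.normal(88, 8, size=n)
cutoff = np.array([cutoffs[s] for s in state])
treated = (score >= cutoff).astype(int)
outcome = (10 + 0.3 * score
           + np.array([effects[s] for s in state]) * treated
           + rng.normal(0, 3, size=n))

df = pd.DataFrame({"y": outcome, "x": score - cutoff, "d": treated, "state": state})
near = df[df.x.abs() <= 5]

# Naive pooled estimate: normalize every cutoff to zero, get one averaged effect
pooled = smf.ols("y ~ d + x + d:x", data=near).fit()
print("pooled effect:", round(pooled.params["d"], 2))

# Cutoff-by-cutoff estimates: the heterogeneity the pooled number hides
for s, grp in near.groupby("state"):
    fit = smf.ols("y ~ d + x + d:x", data=grp).fit()
    print(f"state {s}: effect ~ {fit.params['d']:.2f}")
```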
Then there's the optimization game. What if you could choose your thresholds? As experimentation platforms like Statsig show us, companies are getting smarter about threshold selection. Instead of arbitrary cutoffs, you can optimize for maximum impact.
Here's how threshold optimization typically works:
Run RDD at your current threshold
Estimate effects at alternative thresholds
Find the sweet spot balancing treatment effect size and treated population
Consider constraints (budget, fairness, implementation)
The key is avoiding interaction effects that could mislead your optimization. If the threshold effect varies by user segment, optimizing on the average might hurt specific groups.
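Here's a rough sketch of that loop, with one big caveat: in a simulation you can simply regenerate outcomes under each candidate cutoff, but with real data you'd need multiple observed cutoffs (or a model of how the effect varies along the running variable) to compare candidates credibly. Everything below, including the objective, is illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 20_000
score = rng.normal(85, 10, size=n)

def simulate_outcome(score, cutoff, jump=5.0):
    """Hypothetical data generator: a smooth trend plus a jump at the cutoff."""
    treated = (score >= cutoff).astype(int)
    y = 10 + 0.3 * score + jump * treated + rng.normal(0, 3, size=len(score))
    return y, treated

def rdd_effect(score, y, treated, cutoff, h=5):
    """Local linear estimate of the jump at a given cutoff."""
    df = pd.DataFrame({"y": y, "x": score - cutoff, "d": treated})
    near = df[df.x.abs() <= h]
    return smf.ols("y ~ d + x + d:x", data=near).fit().params["d"]

for candidate in [80, 85, 90, 95]:
    y, treated = simulate_outcome(score, candidate)
    effect = rdd_effect(score, y, treated, candidate)
    share = treated.mean()
    # Toy objective: bigger effects are better, but so is reaching more people
    objective = effect * share
    print(f"cutoff {candidate}: effect ~ {effect:.2f}, "
          f"treated share ~ {share:.2f}, objective ~ {objective:.2f}")
```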
Some folks are bringing Bayesian methods into RDD, though David Robinson warns these aren't magic bullets for the early stopping problem. The core challenge remains: you need enough data near the threshold for credible estimates, Bayesian or not.
The cutting edge? Machine learning for heterogeneous effects, geographic RDD using spatial boundaries, and dynamic RDD where thresholds change over time. As platforms like Statsig demonstrate, the gap between academic RDD and practical implementation is shrinking fast.
RDD sits in this sweet spot between observational studies and true experiments. When you can't randomize but need causal answers, those arbitrary thresholds in your data become your best friend. The technique has evolved from an econometric curiosity to a practical tool for data scientists and analysts.
The key is recognizing opportunities. Every time you see a hard cutoff determining who gets what, ask yourself: could this be an RDD? Just remember to check your assumptions, run those robustness tests, and be honest about what you're learning (effects at the threshold, not universal truths).
Want to dive deeper? Check out:
Imbens and Lemieux's canonical RDD guide for the mathematical foundations
The RDD packages in R and Python for hands-on practice
Real-world case studies from economics and policy journals
Hope you find this useful! Next time you're staring at a threshold in your data, you'll know exactly what to do with it.