5 features to 10x experiment velocity

Thu Sep 15 2022

Vineeth Madhusudanan

Product Manager, Statsig

Big tech techniques for running experiments at industrial scale

Many companies want to 10x their experimentation velocity. Here are 5 techniques from sophisticated experimenters that help you do this:

  1. Feature Rollouts: auto-measure new feature impact with an A/B test

  2. Parameters: remove experiment variants in code to iterate faster

  3. Layers: remove hardcoded experiment references from code

  4. CUPED: use statistical techniques to get results faster

  5. Holdouts: measure cumulative impact/progress without grunt work

Jeff Bezos quote: double your experiments to double your inventiveness

1. Feature Rollouts automatically turned into A/B tests

Most experimentation tooling is painful and error-prone. Teams end up spending countless hours on each experiment, which limits what gets tested. Companies in this position understand the value of experimenting but capture only a fraction of the value they should.

an iceberg diagram symbolizing product building

Many of the largest and most successful tech companies have figured out how to run experiments at an industrial scale. They make it easy for individual teams to measure the impact of each new feature on the company or organization's KPIs. This superpower brings data into the decision-making process, preventing endless debates and meetings.

Modern products ship new features behind a feature gate so teams can control who sees each feature.

statsig console snippet

When there’s a partial rollout to a set of equivalent users, that is enough for Statsig to turn the rollout into an A/B test. In this example, Statsig compares metrics for users passing the gate (10% rollout, Test) with those failing it (90% not yet rolled out, Control).
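The idea can be sketched in a few lines, assuming a deterministic hash-based bucketing scheme. This is an illustration only: the function names (`passes_gate`, `split_users`, `relative_lift`) and the hashing details are hypothetical and are not Statsig's API or implementation.

```python
import hashlib
from statistics import mean


def passes_gate(user_id: str, gate_name: str, rollout_pct: float) -> bool:
    """Deterministically decide whether a user is inside the rollout.

    Hashing gate name + user ID gives each user a stable bucket in 0..9999,
    so the same user always gets the same answer for the same gate.
    """
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < rollout_pct * 100  # rollout_pct=10 -> buckets 0..999


def split_users(user_ids, gate_name: str, rollout_pct: float):
    """Partition users into Test (passing) and Control (failing)."""
    test = [u for u in user_ids if passes_gate(u, gate_name, rollout_pct)]
    control = [u for u in user_ids if not passes_gate(u, gate_name, rollout_pct)]
    return test, control


def relative_lift(test_values, control_values) -> float:
    """Relative difference in the mean metric between Test and Control."""
    control_mean = mean(control_values)
    return (mean(test_values) - control_mean) / control_mean
```

Because assignment depends only on the hash, the 10% who pass and the 90% who fail are comparable groups, which is what makes the metric comparison meaningful. A real system would also report confidence intervals rather than a bare lift.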

rollout to users in statsig

The image below shows an example Pulse Report with a lift in metrics between Control and Test.

a statsig pulse report demonstrating a lift in metrics

Using Statsig feature gates to roll out new features removes the cognitive load of turning every rollout into an experiment, while still giving you observability into the impact of the rollout.

2. Parameters, not experiment variants in code

The legacy way to implement experiments is to have a bunch of if-then-else blocks in your code to handle each variant.

legacy experiment implementations

A more agile way to implement an experiment is to simply retrieve the button color from the experiment in Statsig.

modern experiments with statsig

You can now restart the experiment with a new set of colors to test, without touching shipped code. You can even increase the number of variants (test three colors instead of two) just by changing the config in Statsig.
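The contrast between the two approaches can be sketched as follows. The `ExperimentConfig` class and both functions are hypothetical stand-ins for illustration, not the Statsig SDK; the point is only that the parameterized version never mentions a variant name.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:
    """Hypothetical remote config: each group maps parameter names to values."""
    groups: dict  # group name -> {parameter name: value}
    defaults: dict = field(default_factory=dict)

    def get(self, group: str, param: str):
        # Fall back to the default when the group does not set the parameter.
        return self.groups.get(group, {}).get(param, self.defaults.get(param))


def button_color_legacy(group: str) -> str:
    """Legacy style: variant names and values are baked into shipped code."""
    if group == "control":
        return "blue"
    elif group == "test":
        return "green"
    return "blue"  # any new variant needs a code change and a new release


def button_color(config: ExperimentConfig, group: str) -> str:
    """Parameter style: shipped code only knows the parameter name."""
    return config.get(group, "button_color")
```

With the parameter style, adding a third group such as `{"button_color": "red"}` to the config is enough for `button_color` to return it, while `button_color_legacy` keeps returning the fallback until new code ships.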

When you’re working with mobile apps, the difference between the two approaches is night and day. You can rerun experiments even on older app versions without waiting for new code with a new if-then-else statement to hit the app stores. No more waiting for users to upgrade to the latest version of the app before you start to collect data!

experiment setup in the backend

The best in-house, next-gen experimentation systems use similar approaches. Read how Uber does something similar to unlock agility in its experimentation (see the Architecture section).

3. Layers

Experiment Parameters help you move faster. When you want to move even faster, hard-coded Experiment names become a bottleneck. What if you could ship another experiment without updating your code?

Layers enable this. Layers are typically used to run mutually exclusive experiments. They are also used to remove direct references to experiment names in code.

In the example below, elements on the app’s home screen are set up as parameters on the “Home Screen” layer—button_color, button_text and button_icon. The app simply retrieves parameters from this layer, without any awareness of experiments on the home screen.

If there are no experiments active in the layer, the default layer parameters apply. In the example below, there are three experiments active, with users split between them (mutual isolation). These experiments can control all or a subset of the layer parameters.

You can complete old experiments and start new experiments without touching the client app at all.
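A minimal sketch of how a layer might resolve parameters, assuming hash-based bucketing and non-overlapping bucket ranges per experiment. The `Layer` and `Experiment` classes are hypothetical and do not reflect Statsig's actual data model; two experiments are used in the usage note for brevity.

```python
import hashlib
from dataclasses import dataclass, field


def _hash(user_id: str, salt: str) -> int:
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16)


@dataclass
class Experiment:
    name: str
    bucket_range: range  # slice of the layer's 0..999 buckets owned by this experiment
    groups: list         # one dict of parameter overrides per group


@dataclass
class Layer:
    name: str
    defaults: dict
    experiments: list = field(default_factory=list)

    def get(self, user_id: str, param: str):
        """Return the parameter value for this user; callers never name an experiment."""
        bucket = _hash(user_id, self.name) % 1000
        for exp in self.experiments:
            # Ranges do not overlap, so a user lands in at most one experiment.
            if bucket in exp.bucket_range:
                group = exp.groups[_hash(user_id, exp.name) % len(exp.groups)]
                return group.get(param, self.defaults[param])
        return self.defaults[param]
```

The app would call something like `layer.get(user_id, "button_color")` and nothing else. Starting or finishing an experiment only changes the `experiments` list on the server side, so the client code stays the same.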

isolating experiments

4. CUPED

Controlled-experiment Using Pre-Experiment Data (CUPED) is a technique to reduce variance and bias in results. Think of it as noise reduction: we look at noise in metrics before the experiment started to reduce noise in results.
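A minimal sketch of the standard CUPED adjustment, assuming one pre-experiment value per user as the covariate. It is not a description of Statsig's implementation; in practice the coefficient is usually estimated on data pooled across Test and Control, and `cuped_adjust` is a hypothetical name.

```python
from statistics import mean, variance


def cuped_adjust(metric, pre_metric):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    metric:     per-user metric measured during the experiment (y)
    pre_metric: the same users' metric measured before the experiment (x)
    theta = cov(x, y) / var(x) is the coefficient that minimizes the
    variance of the adjusted metric.
    """
    x_bar, y_bar = mean(pre_metric), mean(metric)
    n = len(metric)
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(pre_metric, metric)
    ) / (n - 1)
    theta = cov_xy / variance(pre_metric)
    return [y - theta * (x - x_bar) for x, y in zip(pre_metric, metric)]
```

The adjustment leaves the mean of the metric unchanged and shrinks its variance by a factor of roughly one minus the squared correlation between the pre-experiment and in-experiment values. The more predictable a user's behavior is from their history, the bigger the reduction.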

statsig cuped applied

Looking across hundreds of customers, CUPED reduces the required sample sizes and durations for over half the key metrics measured in experiments. Learn more about our CUPED implementation. There are other statistical techniques, including winsorization (limiting outlier values), that are also applied, but they typically don’t have as big an impact.
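For reference, winsorization can be sketched as clamping values to chosen percentile boundaries. The percentile thresholds and the nearest-rank method below are assumptions made for illustration, not the settings any particular platform uses.

```python
import math


def winsorize(values, lower_pct=0.0, upper_pct=0.99):
    """Clamp values outside the given percentile range to the boundary values.

    Percentile boundaries are found with the nearest-rank method on the
    sorted data; values are clamped, not dropped, so the sample size is kept.
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[max(0, math.ceil(lower_pct * n) - 1)]
    high = ordered[max(0, math.ceil(upper_pct * n) - 1)]
    return [min(max(v, low), high) for v in values]
```

A single user with an extreme value can dominate a mean. Clamping that value keeps the user in the analysis while limiting how much noise they add.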

experimentation reduction chart

5. Automatic Holdouts

Team or product-level holdouts are powerful tools to measure the cumulative impact of features and experiments you’ve shipped over a holdout period (often ~6 months). You can tease apart the impact of external factors (e.g. your competitor going out of business) and seasonality (atypical events including holidays, unusual news cycles or weather) from the impact driven by your feature launches. You can also measure long-term effects and quantify subtle ecosystem changes.

Mature product teams use long-term holdouts. These can be expensive for engineers to set up, because everyone creating a feature or an experiment needs to be aware of and respect the holdout.

On Statsig, creating a global holdout automatically applies it to new feature gates and experiments. People creating them don’t have to do any manual work to respect the holdout.
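A minimal sketch of what "automatically applied" could look like, assuming the holdout check lives inside the gate evaluation itself. The `Project` class, its methods, and the hashing scheme are hypothetical and are not Statsig's API.

```python
import hashlib


def _bucket(user_id: str, salt: str, buckets: int = 10_000) -> int:
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % buckets


class Project:
    """Hypothetical project with one global holdout applied to every gate."""

    def __init__(self, holdout_pct: float):
        self.holdout_pct = holdout_pct
        self.gates = {}  # gate name -> rollout percentage

    def in_holdout(self, user_id: str) -> bool:
        return _bucket(user_id, "global_holdout") < self.holdout_pct * 100

    def create_gate(self, name: str, rollout_pct: float) -> None:
        # Gate authors do nothing special; the holdout is enforced in check_gate.
        self.gates[name] = rollout_pct

    def check_gate(self, user_id: str, name: str) -> bool:
        if self.in_holdout(user_id):
            return False  # held-out users never see new features
        return _bucket(user_id, name) < self.gates[name] * 100
```

Because the same users are held out of every gate, comparing their metrics against everyone else at the end of the holdout period gives the cumulative effect of everything shipped in that period.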

Learn more about how the feature works and best practices around managing holdouts.

Questions?

This isn’t an exhaustive list. A few more examples:

  6. Want to run hundreds of multi-armed bandits where you trust the system to pick a winner based on an optimization function? There’s Autotune.

  7. Want to look at key metrics in near real-time? There’s Event Explorer.

  8. Want to spin up a quick new metric the same day for a new feature you’re building? We’ve got you.

  9. Want to reuse your company’s data team-approved canonical metrics from your warehouse? You can do that.

  10. Want feature teams to self-serve slicing data by OS, country, free vs. paid, or another dimension you choose, so they’re not blocked behind a data team crafting manual queries? Yes.

There are many more of these…

We created Statsig to close the experimentation gap between sophisticated experimenters and others. Feel free to reach out to talk about other ideas that accelerate experimentation!

the experimentation gap

Thanks to Tore
