Multi-Armed Bandit

The term "multi-armed bandit" comes from probability theory and statistics, and it is often used in the context of reinforcement learning and A/B testing.

The name is derived from a hypothetical scenario involving a gambler at a row of slot machines (also known as "one-armed bandits"), who must decide which machines to play, how many times to play each machine, and in which order to play them, to maximize their total reward.

In the context of online experiments, a multi-armed bandit algorithm dynamically adjusts the traffic allocation towards different variations based on their performance. This is different from traditional A/B testing where traffic is split evenly and the allocation does not change during the test.

How it Works

A multi-armed bandit algorithm works by continuously adjusting the traffic towards the best-performing variations until it can confidently pick the best variation. The winning variation will then receive 100% of the traffic.

For example, if a given variant has a 60% probability of being the best, a multi-armed bandit algorithm like Autotune will send it 60% of the traffic. At a high level, the algorithm works by adding more users to a treatment as soon as it recognizes that the treatment is clearly better at maximizing the reward (the target metric).
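One common way to turn observed results into a "probability of being the best" is to sample from a Beta posterior for each variant and count how often each one comes out on top. The sketch below assumes a binary target metric (converted or not) and a uniform Beta(1, 1) prior; the function name `prob_best` and the conversion counts are hypothetical, and this is not a description of how Autotune itself is implemented.

```python
import random

def prob_best(successes, failures, draws=20000, seed=0):
    """Estimate each variant's probability of being the best.

    Assumes a Bernoulli metric with a Beta(1, 1) prior, so each variant's
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # Draw one plausible conversion rate per variant.
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        # Credit the variant with the highest sampled rate.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Hypothetical data: 30/1000 conversions for variant 0, 45/1000 for variant 1.
allocation = prob_best(successes=[30, 45], failures=[970, 955])
print(allocation)  # traffic shares that sum to 1
```

The returned shares can be used directly as traffic weights, which matches the "60% probability, 60% of the traffic" rule described above.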

Throughout the process, higher-performing treatments are allocated more traffic, whereas underperforming treatments are allocated less. When the winning treatment beats the second-best treatment by a sufficient margin, the process terminates.
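The whole loop, from reallocation to termination, can be sketched as a small simulation. Everything here is an assumption made for illustration: the "sufficient margin" is modeled as the leader reaching a 95% probability of being the best, traffic arrives in batches of 200 users, and the true conversion rates are made up. Real systems may use a different stopping rule.

```python
import random

def prob_best(wins, losses, rng, draws=1000):
    """Monte Carlo estimate of each arm's probability of being the best."""
    counts = [0] * len(wins)
    for _ in range(draws):
        samples = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        counts[samples.index(max(samples))] += 1
    return [c / draws for c in counts]

def run_bandit(true_rates, threshold=0.95, batch=200, max_batches=200, seed=1):
    """Simulate a bandit test; return (winner index or None, users per arm)."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins, losses = [0] * k, [0] * k
    for _ in range(max_batches):
        shares = prob_best(wins, losses, rng)
        leader = shares.index(max(shares))
        # Assumed stopping rule: the leader is confidently the best.
        if shares[leader] >= threshold:
            return leader, [w + l for w, l in zip(wins, losses)]
        # Route the next batch of users in proportion to the shares.
        for _ in range(batch):
            arm = rng.choices(range(k), weights=shares)[0]
            if rng.random() < true_rates[arm]:
                wins[arm] += 1
            else:
                losses[arm] += 1
    return None, [w + l for w, l in zip(wins, losses)]

# Hypothetical true conversion rates for three treatments.
winner, traffic = run_bandit([0.04, 0.05, 0.08])
print(winner, traffic)
```

If no treatment reaches the threshold within `max_batches`, the function returns `None` as the winner, which corresponds to a test that never becomes conclusive.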


As an example, let's say you're running an online store and you want to test three different designs for your checkout page to see which one leads to the highest conversion rate.

Instead of splitting your traffic evenly between the three designs (as you would in a traditional A/B test), you could use a multi-armed bandit algorithm to dynamically adjust the traffic allocation based on the performance of each design.

If design A is performing well and has a 70% chance of being the best, the algorithm will allocate 70% of your traffic to design A. If design B has a 20% chance of being the best, it will receive 20% of the traffic, and design C, with a 10% chance of being the best, will receive the remaining 10% of the traffic.

This way, you're maximizing your conversions while the test is still running, instead of waiting until the end of the test to make changes based on the results.
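To make the benefit concrete, the short calculation below compares expected conversions under the 70/20/10 allocation with an even three-way split. The true conversion rates (12%, 10%, and 9%) and the visitor count are hypothetical values chosen for illustration; they do not come from a real test.

```python
# Hypothetical true conversion rates for the three checkout designs.
rates = {"A": 0.12, "B": 0.10, "C": 0.09}

# Allocation from the example above, and a traditional even split.
bandit_split = {"A": 0.70, "B": 0.20, "C": 0.10}
even_split = {name: 1 / len(rates) for name in rates}

def expected_conversions(split, rates, visitors):
    """Expected conversions when traffic is divided according to `split`."""
    return visitors * sum(split[name] * rates[name] for name in rates)

visitors = 10_000  # assumed traffic during the test window
bandit_total = expected_conversions(bandit_split, rates, visitors)
even_total = expected_conversions(even_split, rates, visitors)
print(round(bandit_total), round(even_total))
```

Under these assumed rates, the bandit allocation yields about 1,130 expected conversions versus about 1,033 for the even split, so the gain comes entirely from sending less traffic to the weaker designs.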

When to Use Multi-Armed Bandit

Multi-armed bandit algorithms are particularly useful when you want to minimize the opportunity cost of showing sub-optimal variations to your users.

They're also useful when you want to run continuous experiments without a fixed end date, or when the performance of variations changes over time. However, they're not as useful when you want to measure the impact of a variation on multiple metrics, as they can only optimize for a single metric.
