Picking Metrics 101

Platform

Developers

Resources

Pricing

Platform

Developers

Resources

OVERVIEW

ROLES

Build fast with Be Significant
Our exclusive startup program

GETTING STARTED

Documentation

Documentation to get you started with implementation

Walkthrough Guides

Guides to get you started with Statsig in no time

SDKs and APIs

Explore REST API and SDKs in more than 20 frameworks

Integrations

Learn more about connecting Statsig to your existing tools

A/B Testing Calculator

Looking for a place to start your A/B Test? Try out our calculator

How Statsig Works

Get under the hood and check out how Statsig scales with you

Open Source Code

All our SDKs and supporting libraries are Open Source and regularly updated

Product Updates

We ship fast, to help you ship faster. Check out all our product updates

System Status

Want to understand how reliable Statsig is as a service? Take a look at our system status

LEARN & CONNECT

Blog

Peak Velocity is our blog where we cover the latest in data

Support

Need help getting set up or have questions about our product?

Customer Stories

Find out how leading companies are using Statsig to grow

Events

Find out about our online and offline events and RSVP to them

Build vs Buy

Compare building an in-house platform vs. buying

Contact Sales

Want to connect with someone from the Sales team?

FEATURED BLOGS

Feature Flags Liberated

Gating features is a core part of the development process. And with Statsig, it's free.

How AI Companies Use Statsig

The best AI companies use Statsig to accelerate growth. Learn how you can do the same.

What is Product Observability?

Product observability means being able to monitor, control, and gain insight into all of your features.

Platform

Developers

Resources

Pricing

OVERVIEW

Statsig Blog

Peak Velocity is our blog where we cover the latest in experimentation and more

Feature Management

Ship faster and more confidently

Experimentation

Run 100s of randomized, multivariate experiments

Data Warehouse

Run experiments natively, in your warehouse

Analytics

Actionable intelligence at your fingertips

ROLES

Build fast with Be Significant
Our exclusive startup program

GETTING STARTED

Documentation

Documentation to get you started with implementation

Walkthrough Guides

Guides to get you started with Statsig in no time

SDKs and APIs

Explore REST API and SDKs in more than 20 frameworks

Integrations

Learn more about connecting Statsig to your existing tools

A/B Testing Calculator

Looking for a place to start your A/B Test? Try out our calculator

How Statsig Works

Get under the hood and check out how Statsig scales with you

Open Source Code

All our SDKs and supporting libraries are Open Source and regularly updated

Product Updates

We ship fast, to help you ship faster. Check out all our product updates

System Status

Want to understand how reliable Statsig is as a service? Take a look at our system status

LEARN & CONNECT

Blog

Peak Velocity is our blog where we cover the latest in data

Support

Need help getting set up or have questions about our product?

Customer Stories

Find out how leading companies are using Statsig to grow

Events

Find out about our online and offline events and RSVP to them

Build vs Buy

Compare building an in-house platform vs. buying

Contact Sales

Want to connect with someone from the Sales team?

FEATURED BLOGS

Experiments with Generative AI

We built a generative AI app in reactJS using OpenAI’s API and Statsig. Here’s what we learned:

Experimentation Platforms

The decision to build versus buy an experimentation and feature flagging platform is not an easy one.

CUPED Explained

CUPED is an implementation that uses pre-experiment data to explain the variance in the result data.

Picking Metrics 101

Fri Jun 03 2022

Today we’re going to walk through some thoughts on how to approach picking metrics for an experiment.

This isn’t exhaustive, but it should lay solid groundwork for setting yourself up for success with experimentation.

Let’s start with a short video, and discuss!

In the video above, we gave advice on how to approach picking the right metrics for an experiment (or conversely, how to avoid picking bad metrics!)

This will always require some thought, and understanding of the product changes you’re trying to achieve, but with practice it becomes second nature to have a strong, measurable hypothesis and a portfolio of secondary metrics to monitor.

Here’s my mental card when I get started:

Getting Started Picking Metrics Checklist

Let’s walk through this together!

Step 1: Stating a Hypothesis

Your primary evaluation should be on 1–2 metrics that tie directly into a specific hypothesis. Our suggested approach is to start by breaking what you’re testing into two pieces:

What is a direct, granular action that you think you will change?

This should be an immediate outcome of your change, and generally should be a necessary (but not sufficient) outcome for your experiment to succeed. We would call this a “mechanical” or “behavioral” metric. Answer the question “what is the first thing I would see if my experiment worked?

”Examples: clicks, 60-second page views, conversions.

What business outcome should this ladder up to?

Generally, you’re making a change in order to help drive a high-level business outcome; this metric represents that outcome and answers “why” you’re making this change. Sometimes this can’t be measured directly (“Satisfaction”) but can be measured through a proxy metric (“Retention”).

Pick an appropriate scope for this metric. A UI tweak in a menu likely won’t affect revenue enough to be observable, but you might try to measure a low-level strategic outcome like reducing “time to menu click”.

Why do we suggest two steps?

Breaking our hypothesis down this way allows us to clearly state it in terms of what we’re driving, and what we expect that to lead to.

I think by making the cart button bigger, we can drive more clicks on it. I think this might drive more revenue through more checkouts.

Your business outcome doesn’t need to follow immediately from the behavioral outcome, but you should be able to explain how you’d get from the behavioral metric to a change in the business metric. This is a good way to validate that your hypothesis makes sense in the first place.

This is also useful for framing your analysis. If you don’t see your business outcome move, you already have the first step for diagnosing this built-in:

If your mechanical metric moved, the relationship between it and the business outcome might be weaker than you thought, or there’s additional work you need to do to actualize the impact of that behavior
If your mechanical metric didn’t move, your change probably didn’t drive much of a change in user behavior — so you might instead focus your efforts on making your treatment effective

Step 2: Develop Secondary Metrics

Once you’ve done the above, congrats! You have a clear and measurable hypothesis. However, products are complex. It’s usually a mistake to see a positive result and assume the result is an unmitigated victory. You can proceed with much more confidence after thinking through what your secondary metrics should be.

Consider Potential Costs/Downsides

Consider the ways that your change could lead to a worse user experience. How could you try to measure if that’s happening? This is how you develop good counter-metrics, which you check to make sure you’re not accidentally causing regressions.

Common examples:

Making a feature more prominent reduces usage on other parts of the product (“cannibalization”)
Sending more notifications drives engagement, but leads to lower click-through rates and more users opting out of notifications. (You might even want to set up a holdout to track the long-term impact of notifications!)
Conversion rate goes up, but total conversions go down because less users enter the funnel — with ratio metrics, you will generally want to track the numerator and denominator in addition to the ratio itself

Sometimes, you can help yourself by picking a business metric that addresses these counter-metrics. Let’s say you’re a subscription service and want to increase conversion, but you’re worried that you’ll increase refunds as well. Tracking “Revenue Minus Refunds at Day 7”, instead of “Gross Revenue” means your business metric will be much more robust to movements in early refunds.

Consider Adding Explanatory Metrics

In addition to your mechanical metric above, you might want to add color by filling in the mathematical steps between “point A” and “point B”.

For example, if your business metric is “Revenue”, and your mechanical metric is “Checkouts”, you’d want to also track “Average Purchase Size in USD” to connect the formula ofRevenue = [Checkouts * $/Checkout]

Think Through Potential Learnings

Experiments can provide valuable information outside of the scope of your hypothesis. While you need to treat incidental observations with some skepticism due to the inherent noise in experimentation, it’s a good practice to use experiment results to form new hypotheses.

For example, let’s say you’ve heard that users in country A like dropdown menus and users in country B like type-aheads. However, you’re not sure if this is true, or causal.

If you happen to have run an experiment that switches a button to a typeahead for all users, you can split the results by country to get some causal signal on if users in those countries truly have different preferences. It’s not your main hypothesis, but it’s an important learning! Looking for these free insights is a great way to get more value out of your data.

Sanity Check:

Once you’ve done this, You should be able to state the results of this exercise in plain language:

Lastly — Some Dos and Don’ts!

Have one clear behavioral metric you’re aiming to change, and an appropriately scoped business metric it should ladder up to.
Think about “what I’ll think if I see this metric move”. Argue with yourself before you see any data — it’ll help you avoid pitfalls!
Consider negative consequences your changes might cause, and proactively measure them
Try to have background/context metrics that can help generate hypothesis around what drove your primary metric’s movements (or lack of movement)
Consider what you can learn from this experiment for follow-ups, and make that a clear part of your proposed analysis
Use a power calculator to make sure you’re getting enough power and to know when to check your results

Don’t

Blindly stick to the same business metric. Consider what your change should drive, and what an appropriate target is.
Generate too many behavioral metrics to check. This can make your analysis confusing and lead to false positives in either direction. In the example above, we’d want to pick just one of “clicks” or “checkouts” as our hypothesized behavioral metric.
Over-index on the results of metrics which aren’t your primary hypothesis, especially if you’re looking at subpopulations. With experimental error, it’s really easy to find accidental “wins” or “losses” if you slice things up too much. Use learnings/explanatory metrics for
directional, inconclusive
insight on where to look next

Featured

Statsig for startups

Statsig offers a generous program for early-stage startups who are scaling fast and need a sophisticated experimentation platform.

Stay ahead of the curve

Get experimentation insights in your inbox!

Permalink: https://www.statsig.com/blog/picking-metrics-101

Try Statsig Today

Get started for free. Add your whole team!

Try for Free

Platform

Developers

Resources

Statsig Blog

See All Features

Feature Management

Experimentation

Data Warehouse

Analytics

Engineering

Dev Ops

Data Science

Product Management

Artificial Intelligence

Gaming

B2B Saas

E-Commerce

Build fast with Be Significant Our exclusive startup program

Documentation

Walkthrough Guides

SDKs and APIs

Integrations

A/B Testing Calculator

How Statsig Works

Open Source Code

Product Updates

System Status

Blog

Support

Customer Stories

Events

Build vs Buy

Contact Sales

Feature Flags Liberated

How AI Companies Use Statsig

What is Product Observability?

Platform

Developers

Resources

Pricing

Statsig Blog

See All Features

Feature Management

Experimentation

Data Warehouse

Analytics

Engineering

Dev Ops

Data Science

Product Management

Artificial Intelligence

Gaming

B2B Saas

E-Commerce

Build fast with Be Significant Our exclusive startup program

Documentation

Walkthrough Guides

SDKs and APIs

Integrations

A/B Testing Calculator

How Statsig Works

Open Source Code

Product Updates

System Status

Blog

Support

Customer Stories

Events

Build vs Buy

Contact Sales

Experiments with Generative AI

Experimentation Platforms

CUPED Explained

Back to blog home

Picking Metrics 101

Craig Sexauer

Today we’re going to walk through some thoughts on how to approach picking metrics for an experiment.

Step 1: Stating a Hypothesis

What is a direct, granular action that you think you will change?

What business outcome should this ladder up to?

Build fast with Be Significant
Our exclusive startup program

Build fast with Be Significant
Our exclusive startup program