Using Pulse Time Series for Deeper Insights

Mon May 16 2022

The Statsig Pulse results page offers a snapshot of all the metric movements driven by an experiment.

Sometimes, a brief scan of the color-coded scorecard is enough to validate that all metrics behave as expected, and we quickly proceed with the launch. Other times, however, a more detailed understanding is required before deciding on next steps.

Aggregated experiment metrics are easy to interpret, but obscure possible time-dependent effects

Time series charts can reveal insights otherwise hidden in fully aggregated results, such as seasonality and novelty effects. Different types of time series are available, and which one we use depends on the question we want to answer. Here we share an insider's guide to this Pulse feature: how to use it, and why.

Days Since Exposure Time Series

This time series shows the metric impact broken down by the number of days a user has been in the experiment. It’s the best way to answer questions like:

Does my experiment have a novelty effect? Do users try out the new feature once and never again?
Is there pre-experiment bias in this metric? Was that lift there even before we launched the feature?

Day 0 is the day a user becomes part of the experiment, which is often the first time they see the new feature. Metric deltas that are significant early on and turn neutral with increasing tenure indicate a novelty effect: users engage with a feature because it’s new and they’re curious. Once they try it out, they lose interest, and the impact is not sustained in the long run.

In the example below, moving a button to a more prominent location increased the number of clicks by 2,000%, but only on Day 0. After that, the effect is neutral. If we were hoping for a sustained lift, we should think twice before shipping this change.

Days since exposure time series highlights novelty effect
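
To make the idea concrete, here is a minimal sketch of how a days-since-exposure breakdown could be computed outside of Pulse. It is not Statsig's implementation. The function name, the tuple layout of `events`, and the sample values are all hypothetical, chosen only to mirror the Day 0 spike described above.

```python
from collections import defaultdict
from statistics import mean


def delta_by_days_since_exposure(events):
    """Relative test-vs-control delta, bucketed by days since first exposure.

    `events` is a hypothetical iterable of
    (group, first_exposure_day, event_day, value) tuples,
    where days are integers and group is "test" or "control".
    """
    buckets = defaultdict(lambda: {"test": [], "control": []})
    for group, exposure_day, event_day, value in events:
        buckets[event_day - exposure_day][group].append(value)

    deltas = {}
    for day, groups in sorted(buckets.items()):
        if not groups["test"] or not groups["control"]:
            continue  # need both groups to compare
        control_mean = mean(groups["control"])
        if control_mean == 0:
            continue  # relative delta is undefined
        deltas[day] = (mean(groups["test"]) - control_mean) / control_mean
    return deltas


# Made-up data: users join on different calendar days,
# but the lift only shows up on each user's Day 0.
events = [
    ("control", 0, 0, 1.0), ("test", 0, 0, 21.0),
    ("control", 0, 1, 1.0), ("test", 0, 1, 1.0),
    ("control", 3, 3, 1.0), ("test", 3, 3, 21.0),
    ("control", 3, 4, 1.0), ("test", 3, 4, 1.0),
]
deltas = delta_by_days_since_exposure(events)
print(deltas)
```

Because the bucketing key is tenure rather than calendar date, users who joined on different days line up on the same Day 0, which is what makes a novelty effect visible.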

Pre-experiment Metrics

Setting Key Metrics for an experiment unlocks an additional benefit of the days since exposure chart. For this set of metrics, we also show the impact during the 7 days prior to a user joining the experiment. This is a convenient way to check whether there was a difference between the test and control groups even before the experiment started.

Imagine we’re dealing with a metric that shows a significant regression. Naturally, we wonder whether this is truly caused by our experiment, or perhaps we got unlucky in our group allocation. The chart below shows that the difference between test and control is neutral before the experiment starts, suddenly drops on Day 0, and remains negative on subsequent days. With this, we can rule out pre-experiment bias as the root cause.

This metric doesn’t have pre-experiment bias
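
As a rough illustration of the same check, the sketch below compares test and control on pre-experiment values and asks whether a 95% confidence interval for the difference contains zero. This is an assumption-laden simplification (a normal approximation with unpooled variances, hypothetical function names, made-up values), not a description of how Pulse computes its intervals.

```python
from math import sqrt
from statistics import mean, variance


def diff_with_ci(test_values, control_values, z=1.96):
    """Difference in means with an approximate 95% confidence interval."""
    diff = mean(test_values) - mean(control_values)
    standard_error = sqrt(
        variance(test_values) / len(test_values)
        + variance(control_values) / len(control_values)
    )
    return diff, diff - z * standard_error, diff + z * standard_error


def has_pre_experiment_bias(test_pre, control_pre):
    """True if the pre-period interval excludes zero."""
    _, low, high = diff_with_ci(test_pre, control_pre)
    return not (low <= 0.0 <= high)


# Made-up pre-period metric values for illustration only.
balanced_test = [10, 12, 11, 9, 10, 11, 12]
balanced_control = [11, 10, 12, 10, 9, 11, 12]
biased_test = [20, 21, 19, 20, 22, 21, 20]

print(has_pre_experiment_bias(balanced_test, balanced_control))
print(has_pre_experiment_bias(biased_test, balanced_control))
```

If the pre-period interval excludes zero, the groups already differed before exposure, and a post-launch delta of similar size should not be attributed to the feature.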

Daily Time Series

This view shows the metric impact on each calendar day without aggregating days together. It’s a good one to check if we have concerns such as:

Does the feature have a different impact on weekends vs. weekdays?
Did yesterday’s server crash impact our experiment?

The daily time series also provides some insight into the variability of the effect day over day. When a metric has a statistically significant effect that we can’t explain, it reveals whether this effect is consistent, or primarily driven by one or two outlier days. In the latter scenario, we may choose to run the experiment for an additional week or investigate what happened on those days.

Below is an example of a metric that, unexpectedly, showed a statistically significant lift. The daily time series shows that the metric is quite noisy and neutral on most days, but April 27 is a significant outlier. We take this lift with a grain of salt, knowing that it’s likely a false positive caused by random noise.
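
One way to make "driven by one or two outlier days" less subjective is to flag days whose delta sits far from the typical day. The sketch below uses a modified z-score based on the median absolute deviation, which is our own choice of technique here and not something Pulse is stated to do. The function name, the 3.5 threshold, and the daily values are assumptions for illustration.

```python
from statistics import median


def outlier_days(daily_deltas, threshold=3.5):
    """Return days whose delta is far from the median, using a modified z-score."""
    values = list(daily_deltas.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return [
        day
        for day, value in daily_deltas.items()
        if 0.6745 * abs(value - center) / mad > threshold
    ]


# Made-up daily deltas (in percent) with a single spike.
daily_deltas = {
    "Apr 24": 0.1, "Apr 25": -0.2, "Apr 26": 0.0, "Apr 27": 4.0,
    "Apr 28": 0.2, "Apr 29": -0.1, "Apr 30": 0.1,
}
print(outlier_days(daily_deltas))
```

A flagged day is a prompt to investigate what happened on that date, not a license to drop it from the analysis.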

Holdouts

Another valuable use case for daily time series is monitoring and evaluating holdouts, which are used to measure the impact of many features typically released over the course of several months.

Daily time series of a holdout shows the impact of two feature releases

Cumulative Time Series

While the daily time series often looks noisy and can have large confidence intervals, a cumulative view reveals how the aggregated metric lift and confidence intervals evolve over time as the experiment progresses. This comes in handy when wondering:

Do we expect confidence intervals to shrink if we run the experiment longer?

The behavior of confidence intervals over time depends on several factors: Influx of new users into the experiment, variance of the metric, sensitivity to user tenure, etc. The cumulative time series helps inform whether waiting longer could help gain higher confidence in the results.

The chart below shows how the confidence intervals for this metric are reduced by half during the first week of the experiment. It’s also evident that both the effect and the confidence intervals have been stable for the past few weeks, and we’re unlikely to gain new insights by running the experiment longer.
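
The shrinking intervals can be reproduced with a small sketch that pools observations day by day and recomputes the difference and its interval half-width. It assumes a simple normal approximation and uses hypothetical names and deliberately artificial data, so it only illustrates the mechanics, not Pulse's actual calculation.

```python
from math import sqrt
from statistics import mean, variance


def cumulative_ci(daily_test, daily_control, z=1.96):
    """Cumulative difference in means and CI half-width after each day."""
    test_pool, control_pool, series = [], [], []
    for test_day, control_day in zip(daily_test, daily_control):
        test_pool.extend(test_day)
        control_pool.extend(control_day)
        diff = mean(test_pool) - mean(control_pool)
        half_width = z * sqrt(
            variance(test_pool) / len(test_pool)
            + variance(control_pool) / len(control_pool)
        )
        series.append((diff, half_width))
    return series


# Artificial data: the same two observations arrive in each group every day.
daily_test = [[1.0, 3.0]] * 4
daily_control = [[0.0, 2.0]] * 4

series = cumulative_ci(daily_test, daily_control)
for day, (diff, half_width) in enumerate(series, start=1):
    print(day, round(diff, 3), round(half_width, 3))
```

With roughly constant variance, the half-width scales with one over the square root of the sample size, so halving it takes about four times as many observations. Once new users stop arriving at a meaningful rate, the curve flattens and running longer buys little extra precision.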

Avoiding the Time Series Rabbit Hole

Diving into time series, we may be concerned about information overload. The metric lifts in Pulse are straightforward to interpret, but slicing and dicing by days introduces gray areas and opens the door to p-hacking. Keep in mind that this tool exists to help check your assumptions, not to scavenge for impact or to make every decision bulletproof.

In online experimentation we want to move fast without overlooking key data points that might lead us in a different direction. How deep we go in the analysis depends on the scope of the decision and how much weight we place on specific results. Pulse time series are readily available to ease the burden of these deep dives. Be sure to check them out as needed, keeping in mind some dos and don’ts.

Do:

Use days since exposure to check for novelty effects and pre-experiment bias.
Check for random daily noise that may significantly sway a result, especially when looking at unexpected, unexplained metric movements.
Use the cumulative time series to gauge whether your experiment has stabilized.

Don’t:

Use the cumulative view to find the optimal end date to maximize impact or make a mostly negative guardrail neutral.
Deep dive only regressions you want to explain away, while accepting gains at face value. This introduces bias in your decisions.

How to Find Time Series in Pulse

Here’s how to get to the time series views in Pulse:

Go into metric details by hovering over a metric of interest or using the link at the top of the metrics section.
Click on the Time Series tab.
Use the drop-down to select the desired time series type.

Permalink: https://www.statsig.com/blog/using-pulse-time-series-for-deeper-insights

Maggie Stewart