We're adding the ability to quickly copy metrics used in one experiment to a new experiment.
It's easy to select metrics for an experiment in Statsig, and tags or templates are powerful tools for managing collections. Sometimes, though, you've already set up the perfect measurement suite on another experiment and just want to copy that - and now you can!
This is especially powerful for customers using local metrics - experiment-scoped custom metric definitions - since you can copy those between experiments without needing to add them to your permanent metric catalog.
Generally, experimentalists make decisions by comparing means and using standard deviation to assess spread. There are exceptions, like percentile metrics, but the vast majority of comparisons are done this way.
It's effective, but it's also well known that means mask a lot of information. To help experimentalists on Statsig understand what's going on behind the scenes, we're adding an easy interface to dig into the distributions behind results.
Here, we can see a Pulse result showing a statistically significant lift in revenue for both of our experimental variants.
By opening the histogram view (found in the statistics details), we can easily see that this lift is mostly driven by more users moving from the lowest-spend bucket into higher buckets.
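To make that concrete, here's a small hypothetical sketch (plain NumPy, not Statsig code, with made-up spend numbers and buckets) of the kind of pattern the histogram view surfaces: the treatment's mean revenue is higher, and the bucket shares show the lift comes from low-spend users moving up rather than from the biggest spenders spending more.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical per-user revenue: in the treatment, a slice of the zero/low-spend
# users starts spending a little, which lifts the mean without touching big spenders.
control = np.concatenate([np.zeros(60_000), rng.gamma(2.0, 10.0, 40_000)])
treatment = np.concatenate([
    np.zeros(45_000),              # fewer non-spenders than control
    rng.gamma(2.0, 10.0, 40_000),  # existing spenders, unchanged
    rng.gamma(2.0, 3.0, 15_000),   # former non-spenders now spending a little
])

buckets = [0, 0.01, 5, 20, 50, np.inf]  # spend buckets, like the histogram view
for name, spend in [("control", control), ("treatment", treatment)]:
    shares, _ = np.histogram(spend, bins=buckets)
    print(f"{name}: mean={spend.mean():.2f}, "
          f"bucket shares={np.round(shares / shares.sum(), 2)}")
```

The mean comparison alone just says "treatment is up"; the bucket shares show where the lift actually comes from.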
This is available today on Warehouse Native - and we're scoping out Statsig Cloud.
We're providing more control over when experiment results load in Warehouse Native - in addition to schedules and API-based triggers, customers can now specify which days of the week results load, either for a given experiment or as an organizational default.
In addition, org-level presets for turbo mode and other load settings will help people keep their warehouse bill and load times slim! Read more at https://docs.statsig.com/statsig-warehouse-native/guides/costs
Many customers on Statsig run hundreds of concurrent experiments. On Warehouse Native, this means that interactive queries from the Statsig console can run slowly during peak hours for daily compute.
Now, users on Snowflake, Databricks, and BigQuery can specify separate compute resources for 'interactive' console queries vs. scheduled 'job' queries - meaning interactive queries will always be snappy. This also means a large compute resource used for large-scale experiment analysis won't get spun up for small interactive queries like loading data samples.
For those warehouses, we've also added the ability to specify different service accounts for different roles within the Statsig roles system. This means the scorecard service account can have the access to user data it needs to calculate experiment results, while customers apply privacy rules like masking sensitive fields to prevent that data from being exposed through interactive queries in the Statsig console.
Three exciting new improvements to our recently launched Topline Alerts product:
Embed variables in your alert message: You can now insert the event name, the alert's value, the warn threshold, the alert threshold, and (soon) the value you've grouped your events by directly into your notification text body, providing more context when viewing alert notifications.
Test your notification manually: You can now trigger each state of your alert (Raise, Warn, Resolve, No Data) to confirm your alert is configured as desired at setup time.
View Evaluated vs. Source data: In the "Diagnostics" tab of your alert, you can now toggle between Evaluated mode (the aggregated data used for the final alert evaluation) and Source mode (the underlying event data before aggregation, used as input to your alert calculation). While Evaluated mode is still restricted to a 24-hour event window, you can look back further in Source mode to get a sense of how the event you're alerting on has trended over a longer window.
Surrogate metrics are now available as a type of "latest value" metric in Warehouse Native.
Surrogate metrics (also called proxy or predictive metrics) enable measurement of a long-term outcome that can be impractical to measure during an experiment. However, if they're used without the correct statistical adjustment, the false-positive rate will be inflated.
While Statsig won't create surrogate metrics for you, once you've created one you can input the mean squared error (MSE) of your model so that we can accurately calculate p-values and confidence intervals that account for the inherent error in the predictive model.
Surrogate metrics will have wider confidence intervals and larger p-values than the same metric without any MSE specified.
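As a rough sketch of the intuition only (this is a simplified assumption for illustration, not Statsig's exact adjustment, and the numbers are made up), the model's MSE can be folded into the per-user variance used for the test statistic, which widens the resulting interval:

```python
from math import sqrt
from statistics import NormalDist

def surrogate_ci(lift, var_test, var_control, n_test, n_control,
                 model_mse=0.0, alpha=0.05):
    """Illustrative two-sample CI where the surrogate model's MSE is added to
    the per-user variance in each group (a simplification, not Statsig's
    exact formula)."""
    se = sqrt((var_test + model_mse) / n_test + (var_control + model_mse) / n_control)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return lift - z * se, lift + z * se

# Same observed lift in the predicted long-term outcome, with and without
# accounting for the prediction model's error.
print(surrogate_ci(0.8, var_test=25.0, var_control=24.0, n_test=50_000, n_control=50_000))
print(surrogate_ci(0.8, var_test=25.0, var_control=24.0, n_test=50_000, n_control=50_000,
                   model_mse=40.0))
```

The second interval is wider, which is the point of entering MSE: the uncertainty from the predictive model shows up in the result instead of producing overconfident p-values.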
Learn more here!
You can now compare up to 15 groups in funnel charts when using a group by, up from the previous limit of 5.
Select and compare up to 15 groups in a funnel analysis
Use a new selector to control exactly how many groups to display
Once you apply a group by (e.g., browser, country, experiment variant), a group count selector appears. Use it to choose how many top groups to include based on event volume.
This gives you more flexibility to analyze performance across more segments—especially helpful for large experiments, multi-region launches, or platform-specific funnels.
Let us know how this works for your use case—we’re always looking to improve.
You can now filter or break down any Product Analytics chart by holdout group, making it easier to measure the combined impact of multiple features.
Filter any chart to only include users in a specific holdout group
Break down metrics by holdout status to compare behavior between held-out users and exposed users
Holdouts are used to evaluate the aggregate effect of multiple features—not just individual experiments. A holdout group is a set of users who are intentionally excluded from a group of features or experiments to serve as a baseline. Now, you can use that same grouping to filter or break down any Product Analytics chart.
To apply, use the filter or group-by menu on a chart and select the relevant holdout.
Holdout analysis helps you answer questions like:
What’s the total impact of all features launched in the last quarter?
Are users in the holdout group retaining or converting differently than exposed users?
It gives you a high-level view of product changes—beyond individual experiments—using the same familiar Product Analytics workflows.
A new dedicated chart settings panel gives you more control over how charts are displayed, making it easier to fine-tune what data your analysis includes and how that data is visualized.
From the gear icon in the top-right of any chart, you can now:
Start Y-Axis at 0 for more consistent visual baselines
Filter Out Bots to clean up automated or test traffic
Include Non-Prod Data when needed for QA or staging checks
Show Table/Legend Only to highlight key values without showing the full plot
Split Charts by Metric (in Metric Drilldown only) to display each metric on its own chart—ideal for comparing metrics with different units or scales
Click the gear icon to open the chart options panel. These settings are chart-specific and persist as part of the chart configuration. When using Drilldown, splitting by metric creates a stacked view—turning one chart into a mini dashboard.
These controls help tailor each chart to its purpose—whether you’re cleaning up noisy data, presenting key takeaways, or exploring metrics with vastly different scales.
You can now measure how frequently users (or other unit IDs) perform a specific event with the new Count per User aggregation option.
Analyze the average, median, or percentile distribution of how often an event is performed per user (or per company, account, etc.)
Select from: average (default), median, min, max, 75th, 90th, 95th, or 99th percentile
Choose the unit ID to aggregate on—user ID, company ID, or any custom unique identifier
When you select Count per User in Metric Drilldown charts, Statsig calculates how many times each unit ID performed the chosen event during the time window. You can then apply summary statistics like median or 95th percentile to understand the distribution across those users.
This aggregation only includes unit IDs that performed the event at least once in the time range—it doesn’t factor in users who did not perform the event.
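Conceptually, this is a two-step aggregation: count the chosen event per unit ID, then summarize the distribution of those counts. Here's a minimal sketch in Python (illustrative only, not how Statsig computes it; the event records and field names are made up):

```python
from collections import Counter
from statistics import mean, median

# Hypothetical event log: one record per (unit_id, event) occurrence.
events = [
    {"unit_id": "u1", "event": "checkout"},
    {"unit_id": "u1", "event": "checkout"},
    {"unit_id": "u1", "event": "checkout"},
    {"unit_id": "u2", "event": "checkout"},
    {"unit_id": "u3", "event": "page_view"},  # u3 never checks out, so it's excluded below
]

# Step 1: count the chosen event per unit ID. Units with zero occurrences
# never appear in the Counter, matching the note above.
checkouts_per_unit = Counter(e["unit_id"] for e in events if e["event"] == "checkout")
counts = sorted(checkouts_per_unit.values())  # [1, 3]

# Step 2: summarize the distribution of those per-unit counts.
print("average:", mean(counts))   # 2.0
print("median:", median(counts))  # 2.0
# min/max and the 75th/90th/95th/99th percentiles are just other summaries
# of the same per-unit counts.
```

The same pattern applies when you aggregate on company ID or another custom identifier; only the grouping key in step 1 changes.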
This gives you a more nuanced view of engagement patterns, helping you answer questions like:
What’s the median number of times a user triggers a key action?
How often do your most active users complete a workflow?
How concentrated or spread out is usage of a particular feature?
Ideal for understanding usage depth, not just reach.