Frequently Asked Questions

A curated summary of the top questions asked on our Slack community, often relating to implementation, functionality, and building better products generally.

Can a significant result in an A/A test be attributed to random chance?

In an A/A test, where both groups receive the same experience, you would generally expect to see no significant difference in metrics results. However, statistical noise can sometimes lead to significant results purely due to random chance. For example, if you're using a 95% confidence interval (5% significance level), you can expect to see one statistically significant metric out of twenty purely due to random chance. This number goes up if you start to include borderline metrics.

It's also important to note that the results can be influenced by factors such as within-week seasonality, novelty effects, or differences between early adopters and slower adopters. If you're seeing a significant result, it's crucial to interpret it in the context of your hypothesis and avoid cherry-picking results. If the result doesn't align with your hypothesis or doesn't have a plausible explanation, it could be a false positive.

If you're unsure, it might be helpful to run the experiment again to see if you get similar results. If the same pattern continues to appear, it might be worth investigating further.

In the early days of an experiment, the confidence intervals are so wide that these results can look extreme. There are two solutions to this:

1. Decisions should be made at the end of a fixed-duration experiment. This ensures you get full experimental power on your metrics. Peeking at results on a daily basis is a known challenge with experimentation, and it's strongly suggested that you take premature results with a grain of salt.

2. You can use sequential testing. Sequential testing is a solution to the peeking problem. It inflates the confidence intervals during the early stages of the experiment, which dramatically cuts down the false-positive rate from peeking, while still providing a statistical framework for identifying notable results. More information on this feature can be found here.

It's important to keep in mind that experimentation is an imprecise science that's dealing with a lot of noise in the data. There's always a possibility of getting unexpected results by sheer random chance. If you're doing experiments strictly, you would make a decision based on the fixed-duration data. However, pragmatically, the newer data is always better (more data, more power) and it's okay to use as long as you're not cherry-picking and waiting for a borderline result to turn green.
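The one-in-twenty intuition above is easy to check with a quick simulation (a sketch for illustration, not Statsig code): run many A/A comparisons on identically distributed groups and count how often a two-sample z-test crosses the 5% significance threshold.

```js
// Simulate repeated A/A tests and count how often a two-sample z-test
// flags a "significant" difference at alpha = 0.05, even though both
// groups are drawn from the same distribution.
function mulberry32(seed) {
  // Small seeded PRNG so the simulation is reproducible.
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function normalSample(rng) {
  const u = Math.max(rng(), 1e-12);
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rng());
}

// True when an A/A comparison of two n-user groups is "significant"
// at the 5% level (|z| > 1.96, two-sided).
function aaTestIsSignificant(rng, n) {
  let sumA = 0, sumB = 0, sqA = 0, sqB = 0;
  for (let i = 0; i < n; i++) {
    const a = normalSample(rng);
    const b = normalSample(rng);
    sumA += a; sqA += a * a;
    sumB += b; sqB += b * b;
  }
  const meanA = sumA / n, meanB = sumB / n;
  const varA = (sqA - n * meanA * meanA) / (n - 1);
  const varB = (sqB - n * meanB * meanB) / (n - 1);
  const z = (meanA - meanB) / Math.sqrt(varA / n + varB / n);
  return Math.abs(z) > 1.96;
}

function falsePositiveRate(trials, n, seed) {
  const rng = mulberry32(seed);
  let hits = 0;
  for (let t = 0; t < trials; t++) {
    if (aaTestIsSignificant(rng, n)) hits += 1;
  }
  return hits / trials;
}
```

Across 1,000 trials of 200 users per group, the measured rate should land near 0.05: roughly one "significant" A/A result in twenty, purely by chance.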

Can a user be consistently placed in the same experiment group when transitioning from a free to a paid user?

Can I force run updating metrics on an ongoing experiment in Statsig?

When updating log events for a feature gate in Statsig, there is no need to restart the feature gate to see the revised metrics data, as changes should take effect immediately. However, if the updated metrics are not reflecting as expected, it is advisable to verify that the events are being logged correctly.

To obtain cleaner data for metrics lifts after modifying the code for a log event, you can adjust the distribution of users between the control and experiment groups or 'resalt' the feature to reshuffle users without changing the percentage distribution.

For experiments with delayed events, setting the experiment allocation to 0% after the desired period ensures that delayed events still count towards the analytics. It is important to note that the sizes of variants cannot be adjusted during an ongoing experiment to maintain the integrity of the results.

To increase the exposure of a variant, the current experiment must be stopped, and a new one with the desired percentage split should be started.

In the context of managing experiments with Terraform, the status field can be updated to reflect one of four possible values: setup, active, decision_made, and abandoned, aiding in the management of the experiment's lifecycle.

For those utilizing Statsig Warehouse Native, creating an Assignment Source and a new experiment allows for the definition of the experiment timeline and subsequent calculation of results.

Statsig pipelines typically run in PST, with data landing by 9am PST, although enterprise companies have the option to switch this to UTC.

Statsig Cloud calculates new or changed metrics going forward, and Statsig Warehouse Native offers the flexibility to create metrics after experiments have started or finished and to reanalyze data retrospectively.

Can Statsig analyze the impact of an experiment on CLV after a longer period, like 6 months?

In a subscription-based business model, understanding the long-term impact of experiments on Customer Lifetime Value (CLV) is crucial.

Statsig provides the capability to analyze the impact of an experiment on CLV over extended periods, such as 6 months. To facilitate this, Statsig allows for the setup of an experiment to run for a specific duration, such as 1 month, and then decrease the allocation to 0%, effectively stopping new user allocation while continuing to track the analytics for the users who were part of the experiment.

This tracking can continue for an additional 5 months or more, depending on the requirements. It is important to note that the experience delivered to users during the experiment will not continue after the allocation is set to 0%. However, there are strategies to address this, which can be discussed based on specific concerns or requirements.

Additionally, Statsig experiments by default expire after 90 days, but there is an option to extend the experiment duration multiple times for additional 30-day periods. Users will receive notifications or emails as the expiration date approaches, prompting them to extend the experiment if needed.

This functionality is available on the Pro plan, so businesses can measure the long-term impact of their experiments on CLV without a direct data warehouse integration, for example by updating CLV through integrations such as Hightouch.

How are p-values of experiments calculated and is it always assumed that the underlying distribution is a normal distribution?

In the context of hypothesis testing, the p-value is the probability of observing an effect equal to or larger than the measured metric delta, assuming that the null hypothesis is true. A p-value lower than a pre-defined threshold is considered evidence of a true effect.

The calculation of the p-value depends on the number of degrees of freedom (ν). For most experiments, a two-sample z-test is appropriate. However, for smaller experiments with ν < 100, Welch's t-test is used. In both cases, the p-value is dependent on the metric mean and variance computed for the test and control groups.

The z-statistic of a two-sample z-test is calculated as Z = (Xt - Xc) / sqrt(var(Xt) + var(Xc)), where Xt and Xc are the sample means of the test and control groups and the variances are the variances of those means. The two-sided p-value is then obtained from the standard normal cumulative distribution function.
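As a sketch of that calculation (standard textbook formulas, not Statsig's internal implementation; the normal CDF uses the Abramowitz-Stegun erf approximation), the two-sided p-value can be computed from the group means and the variances of those means:

```js
// Two-sided p-value for a two-sample z-test.
// meanT/meanC are group means; varMeanT/varMeanC are the variances
// OF THE MEANS (i.e., sample variance divided by sample size).
function twoSampleZTest(meanT, meanC, varMeanT, varMeanC) {
  const z = (meanT - meanC) / Math.sqrt(varMeanT + varMeanC);
  return { z, pValue: 2 * (1 - standardNormalCdf(Math.abs(z))) };
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation
// (formula 7.1.26; max error ~1.5e-7).
function standardNormalCdf(x) {
  const u = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * u);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-u * u);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}
```

With a zero metric delta the p-value is 1; as the delta grows relative to the combined variance, the p-value shrinks toward 0.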

For smaller sample sizes, Welch's t-test is the preferred statistical test due to its lower false positive rates in cases of unequal sizes and variances. The t-statistic is computed in the same way as the two-sample z-test, and the degrees of freedom ν are computed using a specific formula.
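For reference, the degrees of freedom in Welch's t-test are usually computed with the Welch-Satterthwaite approximation (a standard formula; the source does not spell it out):

```latex
\nu \approx \frac{\left( s_t^2 / n_t + s_c^2 / n_c \right)^2}
{\dfrac{(s_t^2 / n_t)^2}{n_t - 1} + \dfrac{(s_c^2 / n_c)^2}{n_c - 1}}
```

where s_t^2 and s_c^2 are the sample variances and n_t and n_c the sizes of the test and control groups.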

While the normal distribution is often used in these calculations due to the central limit theorem, the specific distribution used can depend on the nature of the experiment and the data. For instance, in Bayesian experiments, the posterior probability distribution is calculated, which can involve different distributions depending on the prior beliefs and the likelihood.

It's important to note that it's typically assumed that the sample means are normally distributed. This is generally true for most metrics thanks to the central limit theorem, even if the distribution of the metric values themselves is not normal.

How can we conduct QA for an experiment if another experiment is active on the same page with an identical layer ID?

To conduct Quality Assurance (QA) for your experiment while another experiment is active on the same page with an identical layer ID, you can use two methods:

1. Creating a New Layer: You can create a new layer for the new experiment. Layers allow you to run multiple landing page experiments without needing to update the code on the website for each experiment. When you run experiments as part of a layer, you should update the script to specify the layerid instead of expid. Here's an example of how to do this:

```html
<script src="[API_KEY]&layerid=[LAYER_NAME]"></script>
```

By creating a new layer for your new experiment, you can ensure that the two experiments do not interfere with each other. This way, you can conduct QA for your new experiment without affecting the currently active experiment.

2. Using Overrides: For pure QA, you can use overrides to get users into the experiences of your new experiment in that layer. Overrides take total precedence over what experiment a user would have been allocated to, what group the user would have received, or if the user would get no experiment experience because it is not started yet. You can override either individual user IDs or a larger group of users. The only caveat is a given userID will only be overridden into one experiment group per layer. For more information, refer to the Statsig Overrides Documentation.

When you actually want to run the experiment on real users, you will need to find some way to get allocation for it. This could involve concluding the other experiment or lowering its allocation.

How does Statsig differentiate data from different environments and can non-production data be used in experiments?

Statsig differentiates data from different environments by the environment tier you specify during the SDK initialization. You can set the environment tier to "staging", "development", or "production". By default, all checks and event logs are considered "production" data if the environment tier is unset.

Experiments and metrics primarily factor in production data. Non-production events are visible in diagnostics, but they are not included in Pulse results. This is because most companies do not want non-production test data being included. If you want to include these, you can log them as regular events. However, non-production data is filtered out of the warehouse and there is no other way to include it.

When initializing, if you add { environment: { tier: "production" } }, this would set your environment to "production", not "staging" or "development". If you want to set your environment to "staging" or "development", you should replace "production" with the desired environment tier.
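A minimal sketch of those initialization options (the option shape matches the snippet above; the helper name is ours, not part of the SDK):

```js
// Build Statsig init options for a given environment tier.
// Defaults to "production", matching Statsig's behavior when the
// tier is left unset.
function statsigOptionsFor(tier = "production") {
  const allowed = ["development", "staging", "production"];
  if (!allowed.includes(tier)) {
    throw new Error("Unknown environment tier: " + tier);
  }
  return { environment: { tier } };
}

// Usage (sketch): statsig.initialize(sdkKey, user, statsigOptionsFor("staging"));
```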

Pulse results are only computed for "Production" tier events. To see Pulse results, you need:

- An experiment that is "Started" (aka, enabled in production)
- Exposures and events in the production tier
- Exposures for users in all test groups
- Events/metrics associated with users in test groups

As long as you have initialized using { environment: { tier: "production" } }, your Pulse will compute. This means that even if your code is deployed to staging, as long as you initialize with the production tier, you will be able to see Pulse results.

How to ensure consistent experiment results across CSR and SSR pages with Statsig

Ensuring consistent experiment results across Client-Side Rendered (CSR) and Server-Side Rendered (SSR) pages when using Statsig involves establishing a single source of truth for user assignments. The challenge arises when independent network requests on different pages result in varying initialization times, potentially leading to different experiment variants being presented to the user.

To address this, it is recommended to have one consistent source of truth for the user's assignment. This can be achieved by using the assignments from the first page (CSR) and carrying them over to the second page (SSR), rather than initializing a new request on the second page.

This approach ensures that the user experiences a consistent variant throughout their session, regardless of the page they navigate to. It is important to note that the user's decision for any getExperiment call is determined at the time of initialization, not when getExperiment is called. The getExperiment call simply returns the value fetched during initialization and logs the user's participation in the experiment and their assigned group.

To implement this, developers should ensure that the assignments are fetched from the network request on the first page and then passed to the second page, avoiding the need for another initialization request and the management of an ever-growing list of parameters.
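The hand-off can be sketched like this (helper names are hypothetical; the payload would be the initialize response your first page fetched): serialize the assignments once, store them in a cookie or session store, and rehydrate on the next page instead of re-initializing.

```js
// Serialize experiment assignments for hand-off between pages.
// Base64-encoding keeps the cookie value free of reserved characters.
function packAssignments(initializeValues) {
  return Buffer.from(JSON.stringify(initializeValues)).toString("base64");
}

// On the second page, decode the stored value and use it to
// initialize synchronously instead of making a new network request.
function unpackAssignments(cookieValue) {
  return JSON.parse(Buffer.from(cookieValue, "base64").toString("utf8"));
}
```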

How to incorporate an app version check into Statsig experiment variants?

Incorporating an app version check into Statsig experiment variants can be achieved using Feature Gates or Segments. Here are the steps you need to follow:

1. Capture the Initial App Version: The first step is to capture and store the initial app version that a user starts using. This information is crucial as it will be used to determine whether a user is new and started using the app from a specific version onwards.

2. Use Custom Fields for Targeting: The next step is to use Custom Fields for targeting. This will require some code on the client-side that passes in user upgrade/create timestamps as a custom field.

3. Pass the Initial App Version as a Custom Field Key: You need to pass the value for users who started using the app as new users from a specific version as the Custom Field Key.

4. Configure the Custom Field Key: Once you create the key, configure it using the "Greater than or equal to version" operator. This operator checks the current version of the user's app.

For your specific case, you can create two separate experiments or feature gates. One for users on app version 1, where the variants are v1 and v2, and another for users on app version 2, where the variants are v1 to v4. You can then use the app version as a custom field for targeting.
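The version check behind that operator can be pictured with a small helper (our illustration of dotted-version ordering, not Statsig's implementation), alongside a user object that passes the initial app version as a custom field:

```js
// A user object carrying the initial app version as a custom field
// (field name `initialAppVersion` is illustrative).
const user = {
  userID: "user-123",
  custom: { initialAppVersion: "2.1.0" },
};

// Compare dotted version strings: true when a >= b.
function versionGte(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0;
    const y = pb[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // versions are equal
}
```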

Please note that the Key set in the Custom Field will be included in the events called by Feature Gates. However, it's not a chargeable event. It's just attributes that will be in the payload for events you're already tracking.

How to interpret pre-experiment results in experimentation data

When reviewing experimentation results, it is crucial to understand the significance of pre-experiment data. This data serves to highlight any potential pre-existing differences between the groups involved in the experiment. Such differences, if not accounted for, could lead to skewed results by attributing these inherent discrepancies to the experimental intervention.

To mitigate this issue, a technique known as CUPED (Controlled-experiment Using Pre-Experiment Data) is employed.

CUPED is instrumental in reducing variance and pre-exposure bias, thereby enhancing the accuracy of the experiment results. It is important to recognize, however, that CUPED has its limitations and cannot completely eliminate bias. Certain metrics, particularly those like retention, do not lend themselves well to CUPED adjustments.

In instances where bias is detected, users are promptly notified, and a warning is issued on the relevant Pulse results. The use of pre-experiment data is thus integral to the process of identifying and adjusting for pre-existing group differences, ensuring the integrity of the experimental outcomes.

How to optimize landing page loading without waiting for experiment configurations

When optimizing landing page loading without waiting for experiment configurations, it is recommended to use a custom script approach if you need to pass in specific user identifiers, such as Segment's anonymous ID.

This is because the standard landing page script provided by Statsig does not allow for the initialization with a different user ID. Instead, it automatically generates a stableID for the user, which is used for traffic splitting and consistent user experience upon revisits. This stableID is stored in a cookie and is used by Statsig to identify users in experiments.

However, if you need to synchronize with metrics from Segment using the Segment anonymous ID, you may need to deconstruct the landing page tool and implement a custom version that allows you to set the userObject with your Segment anonymous ID.

Additionally, you can enrich the Segment User with an additional identifier, such as statsigStableID, which can be obtained in JavaScript using statsig.getStableID(). This ID can then be mapped from the Segment event payload to Statsig's stableID. If performance is a concern, you can bootstrap your SDK with values so you don't have to wait for the network request to complete before rendering.

This can help mitigate performance issues related to waiting for experiment configurations. For more information on bootstrapping the SDK, you can refer to the official documentation on bootstrapping.
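The enrichment step described above can be sketched as follows (`segment` stands in for Segment's analytics client; only `getStableID()` is from the source):

```js
// Sketch: attach Statsig's stableID to Segment traits so the two
// systems can be joined on it later. `statsigClient` and `segment`
// are stand-ins for the real SDK objects.
function enrichSegmentTraits(statsigClient, traits = {}) {
  return { ...traits, statsigStableID: statsigClient.getStableID() };
}

// Usage (sketch): segment.identify(userId, enrichSegmentTraits(statsig));
```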

What is the recommended method for performing A/B tests for static pages in Next.js using Statsig?

There are two main methods for performing A/B tests for static pages in Next.js using Statsig.

The first method involves using getClientInitializeResponse and storing the initializeValues in a cookie. This approach is suitable if you want to avoid generating separate static pages for each variant. However, the cookie size is limited to 4KB, so this method might not be suitable if the initializeValues are large.
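Because browsers cap a single cookie near 4KB, it's worth guarding the serialized response before storing it (a sketch; 4096 bytes is the usual per-cookie ceiling):

```js
const COOKIE_LIMIT_BYTES = 4096;

// True when the serialized initializeValues fit in one cookie.
// encodeURIComponent approximates the escaping a cookie value needs.
function fitsInCookie(initializeValues) {
  const serialized = encodeURIComponent(JSON.stringify(initializeValues));
  return serialized.length <= COOKIE_LIMIT_BYTES;
}
```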

The second method involves generating a separate static page for each experiment's variant. This approach is suitable if you have a small number of variants and want to avoid the cookie size limitation. However, this method might require more setup and maintenance if you have a large number of variants.

If you're unsure which method to use, you can start with the method that seems easier to implement and switch to the other method if you encounter issues.

If you're concerned about the size of initializeValues, there are a couple of ways to bring down the response size. One way is to use target apps to limit the gates/experiments/etc included in the response. Another way is to add an option to getClientInitializeResponse to specify which gates/experiments/etc to include in the response.

If you plan on stitching together multiple cookies, a simple string splice might be easier. An alternative that doesn't involve stitching together multiple initializeValues is to use multiple client SDK instances. This isn't supported in React, but using the JS SDK you could have multiple Statsig instances, each with its own set of configs. You would have to keep track of which instance to use for which experiment, but this may be a "cleaner" approach.

The JS SDK can be synchronously loaded using initializeValues similarly to how the StatsigSynchronousProvider works. So you should be able to just call statsig.initialize(..., {initializeValues}) without needing to worry about awaiting.

Finally, you can also use the local evaluation SDK to fetch the whole config before the page becomes interactive and then pass it to the synchronous SDK. This is a client SDK, but it solves the "flickering" issues because you don't need to wait for the experiment(s) data to be fetched on the fly.

What is the status of the experiment if start is not clicked and what actions can I take?

When an experiment is in the "Unstarted" state, the code falls back to the default values, i.e., the parameters you pass to our get calls as documented here.

You have the option to enable an Experiment in lower environments such as staging or development, by toggling it on in those environments prior to starting it in Production. This allows you to test and adjust the experiment as needed before it goes live.

Remember, the status of the experiment is determined by whether the "Start" button has been clicked. If it hasn't, the experiment remains in the "Unstarted" state, allowing you to review and modify the experiment's configuration as needed.

Why are entries not showing up in the "Exposure Stream" tab of the experiment?

If you're not seeing data in the "Exposure Stream" tab, there could be a few reasons for this. Here are some steps you can take to troubleshoot:

1. Check Data Flow: Ensure that the id_type is set correctly and that your ids match the format of ids logged from SDKs. You can check this on the Metrics page of your project.

2. Query History: If your data is still not showing up in the console, check your query history for the user to understand which data is being pulled, and if queries are not executing or are failing.

3. Data Processing Time: If data is slow to appear, it's likely still being processed. For new metrics, give it a day to catch up. If data isn't loaded after a day or two, please check in with us. The most common reason for metrics catalog failures is id_type mismatches.

4. Initialization Status: Each hook has an isLoading check, which you can use, or the StatsigProvider provides the StatsigContext which has initialization status as a field as well. This can be used to prevent a gate check unless you know the account has already been passed in and reinitialized.

If you've checked all these and are still experiencing issues, please let us know. We're here to help!

In some cases, the issue might be due to the way you are accessing the layer config and getting the value of the config. When you manually reach into the layer config and get the value of the config, we won’t know when to generate a layer exposure. For example, if you have a line of code like this:

```js
const config = statsig?.getLayer('landing_page_gg')?.value;
```

While this will give you the value of the config, it won’t generate exposures and your experiment won’t work. In order to generate the correct exposures, you will need to get the Layer object and ask for the values using the “get” method. Like this:

```js
const layer = statsig?.getLayer('landing_page_gg');
const block_1 = layer.get('block_1', '');
```

And similarly for other parameters you have defined in your layer.

If you are pulling multiple values at once and they control the same experience, it is recommended to pack them into a single parameter. However, if the parameters affect experiences in multiple locations, keep them separate. If you are pulling everything in one shot, treat them as a single object value. Use the json type. You could also just add a new parameter to the current experiment that has all the other values. There's no need to create a new experiment if you don’t want to.
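Treating the whole experience as one JSON-typed parameter looks like this (a sketch with a stubbed layer; parameter names are illustrative — the real object comes from statsig.getLayer() and logs exposures when .get() is called):

```js
// Stub standing in for a Statsig Layer object.
const layer = {
  _values: { landing_content: { headline: "Hello", cta: "Sign up" } },
  get(name, fallback) {
    // The real Layer.get() also logs a layer exposure here.
    return name in this._values ? this._values[name] : fallback;
  },
};

// One JSON-typed parameter carries the whole experience.
const content = layer.get("landing_content", { headline: "", cta: "" });
```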

Why is my experiment only showing overridden values and not running as expected?

If you're only seeing overridden values in the Exposure Stream for your experiment, there could be several reasons for this. Here are some steps you can take to troubleshoot:

1. Check the Initialization Status: Each hook has an isLoading check, which you can use, or the StatsigProvider provides the StatsigContext which has initialization status as a field as well. This can be used to prevent a gate check unless you know the account has already been passed in and reinitialized.

2. Check the Data Flow: Ensure that the id_type is set correctly and that your ids match the format of ids logged from SDKs. You can check this on the Metrics page of your project.

3. Check the Query History: If your data is still not showing up in the console, check your query history for the user to understand which data is being pulled, and if queries are not executing or are failing.

4. Check the Exposure Counts: If you're seeing lower-than-expected exposure counts, it could be due to initializing with a StatsigUser object that does not have the userID set and then providing it on a subsequent render. This causes the SDK to refetch values, but it logs an exposure for the "empty" userID first. To prevent this, ensure the userID is set on the StatsigUser object before initializing.

If you've checked all these and the issue persists, it might be best to reach out for further assistance.

In some cases, users may still qualify for the overrides based on the attributes you're sending on the user object. This may be due to caching, or to the user qualifying for other segments that control overrides.

For example, if users have "first_utm_campaign": "34-kaiser-db" being sent on the user object, they would qualify for a segment that’s being used in the overrides.

It's also important to note that overridden users will see the assigned variant but will be excluded from experiment results. We have a way to include users in results for experiments not in layers, but it seems we don’t have that option for experiments in layers.

Lastly, consider why you are using overrides in this scenario instead of a targeting gate. Overrides can be used to test the Test variant in a staging environment before starting the experiment on prod. However, if some of your customers have opted out of being experimented on, a targeting gate might be a more suitable option.

Why is there a discrepancy between experiment allocation counts and server side pageview metric counts?

The discrepancy between the experiment allocation counts and the ssr_search_results_page_view (dau) counts could be due to several reasons:

1. **User Activity**: Not all users who are allocated to an experiment will trigger the ssr_search_results_page_view event. Some users might not reach the page that triggers this event, leading to a lower count for the event compared to the allocation.

2. **Event Logging**: There might be issues with the event logging. Ensure that the statsig.logEvent() function is being called correctly and that there are no errors preventing the event from being logged.

3. **Timing of Allocation and Event Logging**: If the event is logged before the user is allocated to the experiment, the event might not be associated with the experiment. Ensure that the allocation happens before the event logging.

4. **Multiple Page Views**: If a user visits the page multiple times in a day, they will be counted once in the allocation but multiple times in the ssr_search_results_page_view event.

If you've checked these potential issues and the discrepancy still exists, it might be a good idea to reach out to the Statsig team for further assistance.

Another possible reason for the discrepancy could be latency. If there is a significant delay between the experiment allocation and the event logging, users might abandon the page before the event is logged. This could lead to a lower count for the ssr_search_results_page_view event compared to the allocation.

Why is there cross-contamination in experiment groups and discrepancies in funnel vs summary data?

When conducting experiments, it is crucial to ensure that there is no cross-contamination between control and treatment groups and that data is accurately reflected in both funnel and summary views.

Cross-contamination can occur due to implementation issues, such as a race condition with tracking. This happens when users, particularly those with slower network connections, land on a control page and a page-view event is tracked before the redirect occurs.

To mitigate this, it is recommended to adjust the placement of tracking scripts. The Statsig redirect script should be positioned high in the head of the page, ensuring that it executes as early as possible. Meanwhile, page tracking calls should be made later in the page load lifecycle to reduce the likelihood of premature event tracking. This adjustment is expected to decrease discrepancies in tracking and improve the accuracy of experiment results.

Additionally, it is important to confirm that there are no other entry points to the control URL that could inadvertently affect the experiment's integrity. Ensuring that the experiment originates from the correct page and that redirects are functioning as intended is essential for maintaining the validity of the test.

Lastly, it is necessary to have specific calls in the code to track page views accurately. These measures will help ensure that the experiment data is reliable and that the funnel and summary views are consistent.
