Anna Yoon
Software Engineer, Statsig

Automating Safe AI Config Rollouts with Custom Benchmarks and Statsig

Tue Jan 06 2026

Motivation

As teams increasingly rely on dynamic configurations for AI systems — from model hyperparameters to prompt templates — controlling and validating these changes before rollout becomes critical.

At Statsig, we’ve heard from customers who wanted to automate this process:

“We’d like to run our custom benchmark tests automatically whenever a config changes, and block rollout if it fails.”

This is a perfect example of why we built Release Pipelines and recently expanded our Webhook + Console API capabilities. Together, they allow teams to integrate Statsig directly into their CI/CD and validation workflows — ensuring only safe, validated configurations reach production.

How It Works with Statsig

Here’s how you can automate benchmark validation with Statsig’s new webhook event:

“Release Pipeline Waiting for Review.”

[Image: Release Pipeline Waiting for Review]

1. Configure a Release Pipeline

Start by creating a Release Pipeline in Statsig for your AI Config (for example, your prompt or model configuration).

Define your rollout phases — such as:

  • Phase 1: Dev (10%)

  • Phase 2: Staging (50%)

  • Phase 3: Production (100%)

You can require manual approval before advancing between phases — which we’ll automate next.

📘 Learn more about Release Pipelines → Release Pipeline Overview

2. Set Up a Webhook for Config Changes

Go to Project Settings → Integrations → Webhook → Event Filtering.

Under Configuration Changes → Action Types, enable:

“Release Pipeline Waiting For Review”

This webhook will fire whenever a rollout phase is awaiting approval.

The payload includes metadata like:

{
  "event": "ReleasePipelineWaitingForReview",
  "releasePipelineMetadata": {
    "releasePipelineID": "rp_123",
    "phaseID": "phase_2",
    "gateID": "g_456",
    "triggerID": "t_789"
  }
}

You can use this metadata to trigger your CI/CD workflow — for example, running a custom benchmark test suite on your updated model config.

3. Run Your Custom Benchmark

When your webhook fires, your CI/CD system (e.g., GitHub Actions, Jenkins, or an internal testing service) can automatically:

  1. Pull the latest config from Statsig.

  2. Run your internal benchmarks — such as prompt quality evaluation, latency checks, or regression testing.

  3. Evaluate results against your acceptance criteria.

If benchmarks pass, proceed to the next phase.

If benchmarks fail, block rollout or trigger a rollback.
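The acceptance check in step 3 might look like the sketch below. The metric names and thresholds (`prompt_quality_score`, `p95_latency_ms`, `regression_failures`) are illustrative placeholders; substitute your own suite's outputs and criteria.

```python
# Each criterion maps a metric name to a pass/fail predicate.
ACCEPTANCE_CRITERIA = {
    "prompt_quality_score": lambda v: v >= 0.85,  # minimum eval score
    "p95_latency_ms": lambda v: v <= 1200,        # maximum p95 latency
    "regression_failures": lambda v: v == 0,      # no regression-test failures
}


def evaluate(results: dict) -> tuple[bool, list[str]]:
    """Check benchmark results against acceptance criteria.

    Returns (passed, failed_metric_names). A missing metric counts as a
    failure so a partial benchmark run can never approve a rollout.
    """
    failures = []
    for name, passes in ACCEPTANCE_CRITERIA.items():
        value = results.get(name)
        if value is None or not passes(value):
            failures.append(name)
    return (len(failures) == 0, failures)
```

Treating a missing metric as a failure is the important design choice here: the gate should fail closed, not open.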

4. Approve or Kill the Rollout Automatically

Statsig’s Console API (CAPI) lets you programmatically approve or halt rollout phases based on your benchmark results.

  • ✅ To advance rollout: call the Approve Phase API endpoint using the releasePipelineMetadata payload.

  • ❌ To block rollout or roll back: use the Kill Switch API to stop rollout for a specific region (e.g., SouthAmerica) or environment (e.g., Staging).
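Both calls can be sketched with Python's standard library. The Console API base URL and the STATSIG-API-KEY header are standard Console API conventions, but the approve and kill endpoint paths below are illustrative placeholders; check the Console API reference for the exact routes.

```python
import json
import urllib.request

CONSOLE_API = "https://statsigapi.net/console/v1"  # Console API base URL
API_KEY = "console-xxxx"  # your Console API key


def build_approve_request(pipeline_id: str, phase_id: str) -> urllib.request.Request:
    # Illustrative path -- see the Console API docs for the real approve route.
    url = f"{CONSOLE_API}/release_pipelines/{pipeline_id}/phases/{phase_id}/approve"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"STATSIG-API-KEY": API_KEY, "Content-Type": "application/json"},
    )


def build_kill_request(pipeline_id: str, environment: str) -> urllib.request.Request:
    # Illustrative path -- see the Console API docs for the real kill-switch route.
    url = f"{CONSOLE_API}/release_pipelines/{pipeline_id}/kill"
    body = json.dumps({"environment": environment}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"STATSIG-API-KEY": API_KEY, "Content-Type": "application/json"},
    )


# Sending a built request is a one-liner:
#   with urllib.request.urlopen(build_approve_request("rp_123", "phase_2")) as resp:
#       print(resp.status)
```

Your CI job would call one builder or the other depending on the result of the benchmark evaluation, using the IDs from the webhook payload.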

This closes the loop — enabling an end-to-end automated validation process governed by your own benchmark logic, but powered by Statsig’s feature and config rollout engine.

Example Flow

  1. You push a new prompt config to Statsig.

  2. Statsig triggers a webhook: “Release Pipeline Waiting for Review.”

  3. Your CI workflow starts automatically:

    • Runs benchmark tests.

    • Sends results back to Statsig via the Console API.

  4. If results pass → rollout advances to the next phase.

    If results fail → rollout halts or rolls back.

The result: fully automated, test-gated configuration rollouts.

[Image: Example Config Change Workflow]

Why This Matters

This integration allows you to:

  • Automate quality gates for AI and ML configurations.

  • Enforce CI validation before any rollout proceeds.

  • Protect production environments from bad configs or regressions.

  • Accelerate deployment velocity while maintaining control and trust.

By combining Statsig’s Release Pipelines, Webhooks, and Console API, you can treat configuration changes like code — continuously tested, validated, and safely deployed.

Looking Ahead

We’re continuing to expand automation and CI/CD integrations for config and feature rollouts.

Future enhancements include:

  • Direct integrations with popular CI tools (e.g., GitHub Actions, CircleCI)

  • More granular approval APIs (targeting specific phases)

  • Enhanced observability for automated approvals and test results

If you’re exploring automated gating for your AI Configs or CI/CD workflows, we’d love to hear from you — reach out in Statsig Community.

Try it yourself:

Enable Release Pipeline Waiting for Review webhooks in your Statsig project today, and see how easily you can add automated benchmark validation to your rollouts.


