Test Runs allow you to validate a configuration or model choice against a Test Set so you can track improvements (and regressions) over time. If you want the full reference docs for Test Runs, start here:

Creating a Test Run

  1. Navigate to your Test Set Overview page
  2. Click New Test Run in the upper right corner of the Test Runs table
  3. Configure your Test Run:
    • Add a descriptive name to easily identify this run later
    • Select the configuration that will be used for all requests in this Test Set (model + parameters)
  4. Review your settings and click Create Test Run
  5. You’ll be redirected back to the Test Set Overview, where your new Test Run will appear at the top of the table
Test Runs may take a few minutes to complete. While the status is PENDING, you won’t be able to access the Test Run details.

Understanding Test Run Results

Once your Test Run completes, you can analyze the results in detail:

Overview

The Test Set Overview page displays the Pass Rate for each Test Run: the percentage of requests that scored “satisfactory” or higher (3/5 or better).
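To make the Pass Rate definition concrete, here is a minimal sketch of the arithmetic. It assumes scores are integers on a 1 to 5 scale with 3 as the "satisfactory" threshold, as described above. The function name, the sample scores, and the rounding are illustrative assumptions, not the product's actual implementation.

```python
from typing import Sequence

# Assumption: 3 out of 5 is the "satisfactory" threshold described above.
PASS_THRESHOLD = 3


def pass_rate(scores: Sequence[int], threshold: int = PASS_THRESHOLD) -> float:
    """Return the percentage of scores at or above the threshold."""
    if not scores:
        raise ValueError("cannot compute a pass rate for an empty Test Run")
    passed = sum(1 for score in scores if score >= threshold)
    return 100.0 * passed / len(scores)


# Hypothetical per-request scores for one Test Run.
example_scores = [5, 4, 3, 2, 1, 3, 4, 2]
print(f"{pass_rate(example_scores):.1f}%")  # prints 62.5%
```

Five of the eight sample requests score 3 or higher, so the Pass Rate is 62.5%.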

Detailed Analysis

Click on any completed Test Run to see:
  1. Details Section
    • Configuration used for the Test Run
    • Description and execution timestamps
    • Overall performance metrics
  2. Results Section
    • Macro-level metrics across all requests
    • Score distribution visualization
    • Hover over distribution segments for detailed breakdowns
  3. Requests Table
    • Individual scores for each request
    • Hover over scores to see evaluation criteria
    • Compare original vs Test Run responses side-by-side
Use the “Compare” feature to understand exactly how configuration changes affected specific responses.

Comparing runs (baseline vs candidate)

On a Test Set, you can select multiple runs and compare them side-by-side. This is the fastest way to answer:
  • “Did this change improve the overall pass rate?”
  • “Which specific requests got better or worse?”
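The two questions above can be sketched offline as follows. This is a hypothetical illustration that assumes you already have per-request scores for each run as plain dictionaries keyed by request ID. The names `compare_runs` and `RequestDelta`, the request IDs, and the scores are all invented for the example and do not correspond to a documented API.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

# Assumption: 3 out of 5 counts as a pass, matching the Pass Rate definition.
PASS_THRESHOLD = 3


@dataclass(frozen=True)
class RequestDelta:
    """Score of one request in the baseline run and in the candidate run."""
    request_id: str
    baseline: int
    candidate: int

    @property
    def change(self) -> int:
        return self.candidate - self.baseline


def _pass_rate(scores: Iterable[int]) -> float:
    scores = list(scores)
    if not scores:
        return 0.0
    return 100.0 * sum(s >= PASS_THRESHOLD for s in scores) / len(scores)


def compare_runs(baseline: Mapping[str, int], candidate: Mapping[str, int]) -> dict:
    """Compare two runs on the requests they have in common."""
    shared = sorted(baseline.keys() & candidate.keys())
    deltas = [RequestDelta(rid, baseline[rid], candidate[rid]) for rid in shared]
    return {
        "baseline_pass_rate": _pass_rate(d.baseline for d in deltas),
        "candidate_pass_rate": _pass_rate(d.candidate for d in deltas),
        "improved": [d for d in deltas if d.change > 0],
        "regressed": [d for d in deltas if d.change < 0],
    }


# Hypothetical scores for the same four requests under two configurations.
baseline_scores = {"req-1": 2, "req-2": 4, "req-3": 3, "req-4": 5}
candidate_scores = {"req-1": 4, "req-2": 4, "req-3": 2, "req-4": 5}

report = compare_runs(baseline_scores, candidate_scores)
print(report["baseline_pass_rate"], report["candidate_pass_rate"])  # 75.0 75.0
print([d.request_id for d in report["improved"]])   # ['req-1']
print([d.request_id for d in report["regressed"]])  # ['req-3']
```

In this sample the overall pass rate is unchanged at 75%, yet one request improved and another regressed. That is why it is worth checking the per-request view as well as the headline number.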
Want to learn more about building Test Sets? Start with Test Set Creation.