Test Runs allow you to validate model configurations against your Test Sets to track improvements at the Intent level. While Maitai automatically executes Test Runs whenever a new model is fine-tuned, you can also manually create Test Runs to evaluate different models or configuration changes.

Creating a Test Run

  1. Navigate to your Test Set Overview page
  2. Click New Test Run in the upper right corner of the Test Runs table
  3. Configure your Test Run:
    • Add a descriptive name to easily identify this run later
    • Modify the configuration that will be used for all requests in this Test Set (an illustrative example follows this list)
    • The default configuration matches what’s currently set for the Intent
  4. Review your settings and click Create Test Run
  5. You’ll be redirected back to the Test Set Overview, where your new Test Run will appear at the top of the table
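To make the configuration step concrete, here is a minimal sketch of the kind of settings a Test Run configuration typically covers. The field names and values below are illustrative assumptions, not Maitai's exact schema; the actual options appear in the New Test Run form for your Intent.

```python
# Illustrative only: hypothetical settings a Test Run configuration might cover.
# The actual fields and defaults are shown in the New Test Run form.
test_run_config = {
    "name": "gpt-4o-temp-0.2-baseline",  # descriptive name to identify this run later
    "model": "gpt-4o",                   # model to evaluate against the Test Set
    "temperature": 0.2,                  # sampling temperature applied to all requests in the run
    "max_tokens": 512,                   # response length cap applied to every request
}
```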

Test Runs may take a few minutes to complete. While the status is PENDING, you won’t be able to access the Test Run details.

Understanding Test Run Results

Once your Test Run completes, you can analyze the results in detail:

Overview

The Test Set Overview page displays the Pass Rate for each Test Run: the percentage of requests that scored “satisfactory” or higher (3/5 or better).
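As a quick illustration of how that number is derived, the sketch below computes a pass rate from a list of per-request scores on the 1–5 scale described above. The scores are made-up sample data; Maitai computes this for you automatically.

```python
# Minimal sketch: Pass Rate = percentage of requests scoring 3/5 or better.
# The scores below are made-up sample data for illustration.
scores = [5, 4, 2, 3, 5, 1, 4, 3]  # one evaluation score per request in the Test Set

passing = sum(1 for score in scores if score >= 3)  # "satisfactory" (3) or higher counts as a pass
pass_rate = 100 * passing / len(scores)

print(f"Pass Rate: {pass_rate:.1f}%")  # -> Pass Rate: 75.0%
```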

Detailed Analysis

Click on any completed Test Run to see:

  1. Details Section
    • Configuration used for the Test Run
    • Description and execution timestamps
    • Overall performance metrics
  2. Results Section
    • Macro-level metrics across all requests
    • Score distribution visualization (see the sketch after this list)
    • Hover over distribution segments for detailed breakdowns
  3. Requests Table
    • Individual scores for each request
    • Hover over scores to see evaluation criteria
    • Compare original vs Test Run responses side-by-side
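To illustrate what the score distribution summarizes, here is a minimal sketch that tallies how many requests landed on each score from 1 to 5. The scores are made-up sample data; the dashboard renders this breakdown visually rather than as printed text.

```python
from collections import Counter

# Minimal sketch: tally how many requests received each score (1-5).
# The scores below are made-up sample data for illustration.
scores = [5, 4, 2, 3, 5, 1, 4, 3]

distribution = Counter(scores)
for score in range(1, 6):
    count = distribution.get(score, 0)
    share = 100 * count / len(scores)
    print(f"score {score}: {count} request(s) ({share:.1f}%)")
```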

Use the “Compare” feature to understand exactly how configuration changes affected specific responses.

Automated Test Runs

Maitai automatically executes Test Runs in these scenarios:

  • When a new model is fine-tuned for your Intent
  • After significant model updates
  • During A/B testing of configurations

This ensures continuous monitoring of your model’s performance and helps identify any regressions quickly.

Want to learn more about creating Test Sets? Check out our Test Sets guide.