Execute Test Runs
Monitor model performance improvements with Test Runs
Test Runs allow you to validate model configurations against your Test Sets to track improvements at the Intent level. While Maitai automatically executes Test Runs whenever a new model is fine-tuned, you can also manually create Test Runs to evaluate different models or configuration changes.
Creating a Test Run
- Navigate to your Test Set Overview page
- Click New Test Run in the upper right corner of the Test Runs table
- Configure your Test Run:
  - Add a descriptive name to easily identify this run later
  - Modify the configuration that will be used for all requests in this Test Set (see the sketch after this list)
  - The default configuration matches what’s currently set for the Intent
- Review your settings and click Create Test Run
- You’ll be redirected back to the Test Set Overview where your new Test Run will appear at the top of the table
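The configuration you set in step 3 is applied to every request in the Test Set. The fields available in the form depend on your Intent's settings; the snippet below is only an illustrative sketch of the kind of override you might try (for example, a different model or sampling settings), and the field names are hypothetical rather than a documented Maitai schema.

```python
# Illustrative only: a hypothetical configuration override for a Test Run.
# The actual fields available depend on your Intent's configuration; these
# names are examples, not a documented Maitai schema.
test_run_config = {
    "name": "candidate model vs current fine-tune",  # descriptive name for the run
    "model": "gpt-4o-mini",                          # model to evaluate against the Test Set
    "temperature": 0.2,                              # sampling settings applied to every request
    "max_tokens": 512,
}
```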
Test Runs may take a few minutes to complete. While the status is PENDING, you won’t be able to access the Test Run details.
Understanding Test Run Results
Once your Test Run completes, you can analyze the results in detail:
Overview
The Test Set Overview page displays the Pass Rate for each Test Run - this represents the percentage of requests that scored “satisfactory” or higher (3/5 or better).
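For clarity, the sketch below shows how that percentage follows from the definition above, assuming each request receives an integer score from 1 (worst) to 5 (best); the helper function is illustrative, not part of any Maitai SDK.

```python
# Minimal sketch of the Pass Rate calculation described above, assuming each
# request is scored from 1 to 5 and scores of 3 or higher count as passing.
def pass_rate(scores: list[int], threshold: int = 3) -> float:
    """Return the percentage of requests scoring `threshold` or higher."""
    if not scores:
        return 0.0
    passed = sum(1 for s in scores if s >= threshold)
    return 100.0 * passed / len(scores)

print(pass_rate([5, 4, 3, 2, 5, 1]))  # ~66.7: four of the six requests passed
```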
Detailed Analysis
Click on any completed Test Run to see:
- Details Section
  - Configuration used for the Test Run
  - Description and execution timestamps
  - Overall performance metrics
- Results Section
  - Macro-level metrics across all requests
  - Score distribution visualization
  - Hover over distribution segments for detailed breakdowns
- Requests Table
  - Individual scores for each request
  - Hover over scores to see evaluation criteria
  - Compare original vs Test Run responses side-by-side
Use the “Compare” feature to understand exactly how configuration changes affected specific responses.
Automated Test Runs
Maitai automatically executes Test Runs in these scenarios:
- When a new model is fine-tuned for your Intent
- After significant model updates
- During A/B testing of configurations
This ensures continuous monitoring of your model’s performance and helps identify any regressions quickly.
Want to learn more about creating Test Sets? Check out our Test Sets guide.