Once a Test Run is complete, the Portal gives you three layers of information:
- Details: what configuration was used + timestamps
- Results: macro metrics and score distribution
- Requests: per-request scores, criteria breakdowns, and response comparisons
Run details (what was executed)
On the Test Run page, the Details section includes:
- Date created / completed
- Status (e.g. COMPLETED, ERROR)
- Description
- Configuration
  - The UI lets you open a read-only config panel to inspect the run’s config (including the model).
Results (macro view)
The Results section includes:
- # Requests: total requests in the run
- Completed % / Error %: the share of requests that completed successfully vs. errored
- Response time percentiles (see the sketch after this list)
  - The UI shows a headline percentile and exposes additional percentiles (e.g. p50/p90/p95) in a tooltip.
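As a rough illustration of how these macro metrics relate to per-request results, here is a minimal sketch. The `requests` list, its field names, and the nearest-rank percentile method are assumptions made for the example, not the Portal's actual data model or calculation:

```python
import math

# Hypothetical per-request results; field names are illustrative only.
requests = [
    {"status": "COMPLETED", "response_time_ms": 420},
    {"status": "COMPLETED", "response_time_ms": 610},
    {"status": "ERROR", "response_time_ms": None},
    {"status": "COMPLETED", "response_time_ms": 530},
]

total = len(requests)
completed = [r for r in requests if r["status"] == "COMPLETED"]
errored = [r for r in requests if r["status"] == "ERROR"]

completed_pct = 100 * len(completed) / total
error_pct = 100 * len(errored) / total

def percentile(sorted_values, p):
    """Nearest-rank percentile over an ascending-sorted list."""
    idx = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[idx]

# Response-time percentiles over completed requests only.
times = sorted(r["response_time_ms"] for r in completed)
p50, p90, p95 = (percentile(times, p) for p in (50, 90, 95))

print(f"# Requests: {total}")
print(f"Completed: {completed_pct:.0f}%  Errors: {error_pct:.0f}%")
print(f"p50={p50}ms  p90={p90}ms  p95={p95}ms")
```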
Score distribution (1–5)
The score distribution shows the run broken into five buckets:
- 1 — Poor
- 2 — Fair
- 3 — Good
- 4 — Great
- 5 — Perfect
You can click buckets to filter the requests table to those scores. Use this to quickly focus on the lowest-performing cases.
The Test Runs table on the Test Set page shows Pass Rate, computed as the percentage of completed requests with a score of 3 (Good) or better.
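As a hedged sketch of the arithmetic behind the distribution and Pass Rate, assuming a hypothetical list of 1–5 scores for a run's completed requests (the data structure is illustrative, not the Portal's):

```python
from collections import Counter

LABELS = {1: "Poor", 2: "Fair", 3: "Good", 4: "Great", 5: "Perfect"}

# Hypothetical 1-5 scores for a run's completed requests.
scores = [5, 4, 3, 3, 2, 5, 1, 4, 4, 3]

# Score distribution: number of completed requests in each bucket.
distribution = Counter(scores)
for bucket in range(1, 6):
    print(f"{bucket} ({LABELS[bucket]}): {distribution[bucket]}")

# Pass Rate: percentage of completed requests scoring 3 (Good) or better.
pass_rate = 100 * sum(1 for s in scores if s >= 3) / len(scores)
print(f"Pass Rate: {pass_rate:.0f}%")  # 8 of 10 requests -> 80%
```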
Requests table (debugging view)
The requests table is where you debug individual cases. You’ll see:
- Status for each request execution
- Tags associated with the underlying test request
- Score
  - Hover to see the criteria breakdown (criteria name + 1–5 star value + optional description)
- Time (per-request response time, when available)
- Actions
  - View: open the original request for deeper inspection
  - Responses: compare the baseline/original response against the test-run response side-by-side
  - Compare: open a comparison modal for the same request across multiple runs
Comparing runs
Maitai supports two comparison workflows in the Portal:
- Compare Runs (test-set level): select multiple completed runs from the Test Set page and open a table that shows request-by-request scores across runs (see the sketch after this list). You can also open a “Responses” comparison to view multiple run outputs side-by-side.
- Compare (single request across runs): from a Test Run’s requests table, open a modal that lists how that specific request performed across different runs.
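Conceptually, the Compare Runs table is a pivot of per-request scores by run. Here is a minimal sketch under the assumption that each run's scores are available keyed by request ID; the structure and names below are hypothetical, not the Portal's data model:

```python
# Hypothetical per-run results: {request_id: score} for each selected run.
runs = {
    "run-a": {"req-1": 3, "req-2": 5, "req-3": 2},
    "run-b": {"req-1": 4, "req-2": 5, "req-3": 4},
}

# Line up every request's score across the selected runs.
request_ids = sorted({rid for scores in runs.values() for rid in scores})

print("\t".join(["request", *runs]))
for rid in request_ids:
    row = [str(runs[run_id].get(rid, "-")) for run_id in runs]
    print("\t".join([rid, *row]))
```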