Once a Test Run is complete, the Portal gives you three layers of information:
- Details: what configuration was used + timestamps
- Results: macro metrics and score distribution
- Requests: per-request scores, criteria breakdowns, and response comparisons
Run details (what was executed)
On the Test Run page, the Details section includes:
- Date created / completed
- Status (e.g. COMPLETED, ERROR)
- Description
- Configuration
  - The UI lets you open a read-only config panel to inspect the run’s config (including the model).
Results (macro view)
The Results section includes:
- # Requests: total requests in the run
- Completed % / Error %: the share of requests that completed successfully vs. errored
- Response time percentiles (see the sketch after this list)
  - The UI shows a headline percentile and exposes additional percentiles (e.g. p50/p90/p95) in a tooltip.
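As a rough illustration of how these macro metrics relate to per-request results, here is a minimal sketch. The `requests` list, its field names, and the nearest-rank percentile method are assumptions made for the example, not the Portal's actual data model or calculation:

```python
import math

# Hypothetical per-request results; field names are illustrative only.
requests = [
    {"status": "COMPLETED", "response_time_ms": 420},
    {"status": "COMPLETED", "response_time_ms": 610},
    {"status": "ERROR", "response_time_ms": None},
    {"status": "COMPLETED", "response_time_ms": 530},
]

total = len(requests)
completed = [r for r in requests if r["status"] == "COMPLETED"]
errored = [r for r in requests if r["status"] == "ERROR"]

completed_pct = 100 * len(completed) / total
error_pct = 100 * len(errored) / total

def percentile(sorted_values, p):
    """Nearest-rank percentile over an ascending-sorted list."""
    idx = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[idx]

# Response-time percentiles over completed requests only.
times = sorted(r["response_time_ms"] for r in completed)
p50, p90, p95 = (percentile(times, p) for p in (50, 90, 95))

print(f"# Requests: {total}")
print(f"Completed: {completed_pct:.0f}%  Errors: {error_pct:.0f}%")
print(f"p50={p50}ms  p90={p90}ms  p95={p95}ms")
```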
Score distribution (1–5)
The score distribution shows the run broken into five buckets:
- 1 — Poor
- 2 — Fair
- 3 — Good
- 4 — Great
- 5 — Perfect
You can click buckets to filter the requests table to those scores. Use this to quickly focus on the lowest-performing cases.
The Test Runs table on the Test Set page shows Pass Rate, computed as the percentage of completed requests with a score of 3 (Good) or better.
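As a hedged sketch of the arithmetic behind the distribution and Pass Rate, assuming a hypothetical list of 1–5 scores for a run's completed requests (the data structure is illustrative, not the Portal's):

```python
from collections import Counter

LABELS = {1: "Poor", 2: "Fair", 3: "Good", 4: "Great", 5: "Perfect"}

# Hypothetical 1-5 scores for a run's completed requests.
scores = [5, 4, 3, 3, 2, 5, 1, 4, 4, 3]

# Score distribution: number of completed requests in each bucket.
distribution = Counter(scores)
for bucket in range(1, 6):
    print(f"{bucket} ({LABELS[bucket]}): {distribution[bucket]}")

# Pass Rate: percentage of completed requests scoring 3 (Good) or better.
pass_rate = 100 * sum(1 for s in scores if s >= 3) / len(scores)
print(f"Pass Rate: {pass_rate:.0f}%")  # 8 of 10 requests -> 80%
```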
Requests table (debugging view)
The requests table is where you debug individual cases. You’ll see:
- Status for each request execution
- Tags associated with the underlying test request
- Score
  - Hover to see the criteria breakdown (criteria name + 1–5 star value + optional description)
- Time (per-request response time, when available)
- Actions
  - View: open the original request for deeper inspection
  - Responses: compare the baseline/original response against the test-run response side-by-side
  - Compare: open a comparison modal for the same request across multiple runs
Comparing runs
Maitai supports two comparison workflows in the Portal:
- Compare Runs (test-set level): select multiple completed runs from the Test Set page and open a table that shows request-by-request scores across runs (see the sketch after this list). You can also open a “Responses” comparison to view multiple run outputs side-by-side.
- Compare (single request across runs): from a Test Run’s requests table, open a modal that lists how that specific request performed across different runs.
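Conceptually, the Compare Runs table is a pivot of per-request scores by run. Here is a minimal sketch under the assumption that each run's scores are available keyed by request ID; the structure and names below are hypothetical, not the Portal's data model:

```python
# Hypothetical per-run results: {request_id: score} for each selected run.
runs = {
    "run-a": {"req-1": 3, "req-2": 5, "req-3": 2},
    "run-b": {"req-1": 4, "req-2": 5, "req-3": 4},
}

# Line up every request's score across the selected runs.
request_ids = sorted({rid for scores in runs.values() for rid in scores})

print("\t".join(["request", *runs]))
for rid in request_ids:
    row = [str(runs[run_id].get(rid, "-")) for run_id in runs]
    print("\t".join([rid, *row]))
```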