The Request Overview page includes an Evaluation section that lets you inspect existing evaluations and run additional one-off evaluations using your Sentinels. See: Request Overview

When this is useful

Evaluating a single request is great for:
  • Debugging a specific failure mode
  • Verifying whether a proposed Sentinel would have flagged a request
  • Comparing evaluation outcomes between the original request and a rerun

What you’ll see on the Request page

The Evaluation section shows one or more evaluation blocks. Each block includes:
  • The set of selected Sentinels
  • A request-level summary (pass count / total)
  • Per-Sentinel results (status, description, eval time)
Blocks can come from different sources of evaluation (a sketch of the block structure follows this list):
  • Real-time: evaluations associated with the original request
  • Batch Eval: results produced by an Evaluation Run (the block links back to that run)
  • Manual: one-off evaluations you run from the Request page
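As a rough mental model of what one block contains, the TypeScript sketch below combines the pieces listed above. All of the names (EvaluationBlock, SentinelResult, evalTimeMs, and so on) are illustrative assumptions, not an official schema.

```typescript
// Illustrative sketch only: field and type names are assumptions, not an official schema.
type EvaluationSource = "real-time" | "batch-eval" | "manual";

interface SentinelResult {
  sentinel: string;            // name of the Sentinel that ran
  status: "PASS" | "FAULT";    // per-Sentinel outcome
  description: string;         // explanation shown with the result
  evalTimeMs: number;          // eval time
}

interface EvaluationBlock {
  source: EvaluationSource;                   // Real-time, Batch Eval, or Manual
  evaluationRunUrl?: string;                  // Batch Eval blocks link back to their Evaluation Run
  summary: { passed: number; total: number }; // request-level pass count / total
  results: SentinelResult[];                  // per-Sentinel results
}
```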

Run a manual (one-off) evaluation

  1. Open a request and scroll to the Evaluation section.
  2. Click Add Evaluation to create a new manual evaluation block.
  3. Select one or more Sentinels to run.
  4. Click Run Evaluation.
You can run an evaluation only after the request has a final assistant message to evaluate. If you want to try a different Sentinel set after running, add a new evaluation block.
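As a minimal sketch of the "final assistant message" rule, assuming a simple message list with role and content fields (hypothetical names, not taken from the product):

```typescript
// Hypothetical message shape for illustration; the real request format may differ.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// A request is ready for a one-off evaluation once its last message is a
// completed assistant message, mirroring the rule described above.
function canRunEvaluation(messages: Message[]): boolean {
  const last = messages[messages.length - 1];
  return last !== undefined && last.role === "assistant" && last.content.trim().length > 0;
}
```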

Evaluating different versions (original vs reruns)

If you rerun a request (for example, with a different model or prompt), select that version and add an evaluation block to evaluate the rerun's output with the same Sentinels. This lets you compare evaluation outcomes between the original request and the rerun.
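Conceptually, comparing versions amounts to diffing two sets of per-Sentinel results. The sketch below uses the same illustrative shapes as earlier; none of these names come from the product.

```typescript
// Illustrative shapes only; not an official schema.
type SentinelStatus = "PASS" | "FAULT";
interface SentinelResult {
  sentinel: string;
  status: SentinelStatus;
}

// Return the names of Sentinels whose status differs between the original
// request's evaluation and the rerun's evaluation.
function changedSentinels(original: SentinelResult[], rerun: SentinelResult[]): string[] {
  const before = new Map<string, SentinelStatus>();
  for (const r of original) before.set(r.sentinel, r.status);
  return rerun
    .filter((r) => before.has(r.sentinel) && before.get(r.sentinel) !== r.status)
    .map((r) => r.sentinel);
}
```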

Suggested corrections

If a Sentinel returns FAULT and the evaluation result includes a Suggested Correction, the UI shows it. Treat suggested corrections as guidance for debugging and refinement, not as an automatic change to the request.
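As a sketch of how you might collect these for review, assuming a result object with an optional suggestedCorrection field (a hypothetical name, not confirmed by these docs):

```typescript
// Hypothetical result shape; the suggestedCorrection field name is illustrative.
interface SentinelEvaluation {
  sentinel: string;
  status: "PASS" | "FAULT";
  description: string;
  suggestedCorrection?: string; // only present when the evaluator returns one
}

// Gather suggested corrections for manual review; nothing is applied to the
// request automatically.
function correctionsToReview(results: SentinelEvaluation[]): string[] {
  return results
    .filter((r) => r.status === "FAULT" && r.suggestedCorrection !== undefined)
    .map((r) => `${r.sentinel}: ${r.suggestedCorrection}`);
}
```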