The Request Overview page includes an Evaluation section that lets you inspect existing evaluations and run additional one-off evaluations using your Sentinels. See: Request Overview

When this is useful

Evaluating a single request is great for:
  • Debugging a specific failure mode
  • Verifying whether a proposed Sentinel would have flagged a request
  • Comparing evaluation outcomes between the original request and a rerun

What you’ll see on the Request page

The Evaluation section shows one or more evaluation blocks. Each block includes:
  • The set of selected Sentinels
  • A request-level summary (pass count / total)
  • Per-Sentinel results (status, description, eval time)
Blocks can come from different sources of evaluation (a sketch of the block structure follows this list):
  • Real-time: evaluations associated with the original request
  • Batch Eval: results produced by an Evaluation Run (the block links back to that run)
  • Manual: one-off evaluations you run from the Request page
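As a rough mental model of what one block contains, the TypeScript sketch below combines the pieces listed above. All of the names (EvaluationBlock, SentinelResult, evalTimeMs, and so on) are illustrative assumptions, not an official schema.

```typescript
// Illustrative sketch only: field and type names are assumptions, not an official schema.
type EvaluationSource = "real-time" | "batch-eval" | "manual";

interface SentinelResult {
  sentinel: string;            // name of the Sentinel that ran
  status: "PASS" | "FAULT";    // per-Sentinel outcome
  description: string;         // explanation shown with the result
  evalTimeMs: number;          // eval time
}

interface EvaluationBlock {
  source: EvaluationSource;                   // Real-time, Batch Eval, or Manual
  evaluationRunUrl?: string;                  // Batch Eval blocks link back to their Evaluation Run
  summary: { passed: number; total: number }; // request-level pass count / total
  results: SentinelResult[];                  // per-Sentinel results
}
```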

Run a manual (one-off) evaluation

  1. Open a request and scroll to the Evaluation section.
  2. Click Add Evaluation to create a new manual evaluation block.
  3. Select one or more Sentinels to run.
  4. Click Run Evaluation.
You can run an evaluation only after the request has a final assistant message to evaluate. If you want to try a different Sentinel set after running, add a new evaluation block.
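As a minimal sketch of the "final assistant message" rule, assuming a simple message list with role and content fields (hypothetical names, not taken from the product):

```typescript
// Hypothetical message shape for illustration; the real request format may differ.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// A request is ready for a one-off evaluation once its last message is a
// completed assistant message, mirroring the rule described above.
function canRunEvaluation(messages: Message[]): boolean {
  const last = messages[messages.length - 1];
  return last !== undefined && last.role === "assistant" && last.content.trim().length > 0;
}
```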

Evaluating different versions (original vs reruns)

If you rerun a request (for example, with a different model or prompt), select that version and add an evaluation block to evaluate the rerun's output with the same Sentinels. This lets you compare evaluation outcomes between the original request and the rerun.
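Conceptually, comparing versions amounts to diffing two sets of per-Sentinel results. The sketch below uses the same illustrative shapes as earlier; none of these names come from the product.

```typescript
// Illustrative shapes only; not an official schema.
type SentinelStatus = "PASS" | "FAULT";
interface SentinelResult {
  sentinel: string;
  status: SentinelStatus;
}

// Return the names of Sentinels whose status differs between the original
// request's evaluation and the rerun's evaluation.
function changedSentinels(original: SentinelResult[], rerun: SentinelResult[]): string[] {
  const before = new Map<string, SentinelStatus>();
  for (const r of original) before.set(r.sentinel, r.status);
  return rerun
    .filter((r) => before.has(r.sentinel) && before.get(r.sentinel) !== r.status)
    .map((r) => r.sentinel);
}
```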

Suggested corrections

If a Sentinel returns FAULT and the evaluation result includes a Suggested Correction, the UI shows it. Treat suggested corrections as guidance for debugging and refinement, not as an automatic change to the request.
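As a sketch of how you might collect these for review, assuming a result object with an optional suggestedCorrection field (a hypothetical name, not confirmed by these docs):

```typescript
// Hypothetical result shape; the suggestedCorrection field name is illustrative.
interface SentinelEvaluation {
  sentinel: string;
  status: "PASS" | "FAULT";
  description: string;
  suggestedCorrection?: string; // only present when the evaluator returns one
}

// Gather suggested corrections for manual review; nothing is applied to the
// request automatically.
function correctionsToReview(results: SentinelEvaluation[]): string[] {
  return results
    .filter((r) => r.status === "FAULT" && r.suggestedCorrection !== undefined)
    .map((r) => `${r.sentinel}: ${r.suggestedCorrection}`);
}
```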