This workflow shows how to:
- find failures with Evaluations
- inspect a Request
- promote the case into a Test Set
- verify changes with a Test Run
- refine a Sentinel
1) Find a real failure
In the Portal, go to Test > Evaluations, then:
- Add a filter for Status = FAULT
- Optionally add Intent Group (to narrow to one area)
- Expand a row to see which sentinel(s) faulted
- Open the request via the Request action
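If you triage failures regularly, the same filter can be scripted. The sketch below assumes a REST API and is purely illustrative: the base URL, the /evaluations endpoint, and the field names (status, intent_group, faulted_sentinels) are assumptions, not a documented Portal API.

```python
import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# Mirror the UI filter: Status = FAULT, optionally narrowed to one Intent Group.
resp = requests.get(
    f"{PORTAL}/evaluations",
    headers=HEADERS,
    params={"status": "FAULT", "intent_group": "billing"},  # illustrative names
)
resp.raise_for_status()

for row in resp.json()["items"]:
    # Each row shows which sentinel(s) faulted and links back to the request.
    print(row["request_id"], [s["name"] for s in row["faulted_sentinels"]])
```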
2) Understand the failure on the Request page
On the Request page:
- Review the request messages and response
- Scroll to Evaluation to see the per-sentinel results for this request
- If a failure includes a Suggested Correction, treat it as debugging guidance (not an automatic change)
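For orientation, the per-sentinel results might deserialize to something like the shape below. This is a hypothetical sketch: the route and the field names (sentinel_results, verdict, suggested_correction) are assumptions.

```python
import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

request_id = "req_123"  # placeholder request ID
resp = requests.get(f"{PORTAL}/requests/{request_id}/evaluation", headers=HEADERS)
resp.raise_for_status()

for result in resp.json()["sentinel_results"]:
    print(result["sentinel"], result["verdict"])
    # A suggested correction is debugging guidance, not an automatic change.
    if result.get("suggested_correction"):
        print("  hint:", result["suggested_correction"])
```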
3) Promote the request into a Test Set
On the Request page, click Add to Test Set:
- Choose an existing Test Set, or create one inline
- Add tags (this flow enforces a max of 5 tags)
- Submit
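The same promotion can be scripted. A minimal sketch, assuming a POST route for adding a request to a Test Set; the path, the Test Set ID, and the payload fields are hypothetical. The client-side guard mirrors the 5-tag limit the flow enforces.

```python
import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

tags = ["billing", "refunds", "regression"]
assert len(tags) <= 5, "the Add to Test Set flow enforces a max of 5 tags"

resp = requests.post(
    f"{PORTAL}/test-sets/ts_billing/requests",  # hypothetical route and Test Set ID
    headers=HEADERS,
    json={"request_id": "req_123", "tags": tags},
)
resp.raise_for_status()
```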
4) Correct the expected response (when production output was wrong)
If the original production response is wrong, update the Test Set’s expected response:
- Open the Test Set.
- Go to Requests.
- Use the request Edit action to update the final assistant message (and tool calls if applicable).
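Scripted, the edit might look like the sketch below; the PATCH route and the field names (expected_response, expected_tool_calls) are assumptions for illustration.

```python
import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

resp = requests.patch(
    f"{PORTAL}/test-sets/ts_billing/requests/req_123",  # hypothetical route
    headers=HEADERS,
    json={
        # Replace the final assistant message with the corrected expected output.
        "expected_response": "Corrected final assistant message goes here.",
        # Correct expected tool calls in the same edit, if applicable.
        "expected_tool_calls": [],
    },
)
resp.raise_for_status()
```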
5) Verify with a Test Run (baseline vs candidate)
From the Test Set:
- Go to Test Runs
- Click New Test Run
- Configure the candidate model/config and start the run
- When complete, use Compare Runs and per-request comparison to see what changed
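End to end, a baseline-vs-candidate run might be scripted like this. Everything here is a hypothetical sketch: the routes, the run status values, and the compare parameters (baseline, candidate) are assumptions.

```python
import time

import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# Start a run of the Test Set against the candidate model/config.
resp = requests.post(
    f"{PORTAL}/test-sets/ts_billing/test-runs",  # hypothetical route
    headers=HEADERS,
    json={"model": "candidate-model-v2", "temperature": 0.2},  # illustrative config
)
resp.raise_for_status()
run = resp.json()

# Poll until the run completes.
while requests.get(f"{PORTAL}/test-runs/{run['id']}", headers=HEADERS).json()["status"] != "COMPLETE":
    time.sleep(10)

# Compare against a known baseline run, as in the Compare Runs view.
diff = requests.get(
    f"{PORTAL}/test-runs/compare",
    headers=HEADERS,
    params={"baseline": "run_baseline", "candidate": run["id"]},
)
diff.raise_for_status()
print(diff.json()["summary"])
```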
6) Fix the root cause with a Sentinel (when applicable)
If the failure is something you want to enforce at the Intent Group level:
- Open the Intent Group.
- Go to Sentinels.
- Edit the sentinel (or create a new one) and iterate until it matches your expectation.
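Conceptually, iterating on a sentinel means tightening the rule it checks until it matches your expectation. The payload below is a hypothetical sketch; the route and the fields (name, instruction, severity) are assumptions, not the Portal’s actual schema.

```python
import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

sentinel = {
    "name": "no-unconfirmed-refund-promises",  # illustrative sentinel
    "instruction": (
        "The assistant must not promise a refund before it is "
        "confirmed in the billing tool."
    ),
    "severity": "FAULT",
}

resp = requests.put(
    f"{PORTAL}/intent-groups/ig_billing/sentinels/sent_42",  # hypothetical route
    headers=HEADERS,
    json=sentinel,
)
resp.raise_for_status()
```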
7) Close the loop
After updating Sentinels and/or configuration:
- Re-check Test > Evaluations (filter by sentinel + FAULT) to confirm the fix is behaving as expected on real traffic.
- Re-run the relevant Test Set to ensure you didn’t introduce regressions.
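Both checks can be folded into one script. As before, the routes and parameter names are illustrative assumptions, not a documented API.

```python
import requests

PORTAL = "https://portal.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# 1) Confirm the fix on real traffic: any remaining faults for this sentinel?
faults = requests.get(
    f"{PORTAL}/evaluations",
    headers=HEADERS,
    params={"status": "FAULT", "sentinel": "no-unconfirmed-refund-promises"},
).json()["items"]
print(f"{len(faults)} remaining faults for this sentinel")

# 2) Guard against regressions: re-run the relevant Test Set.
requests.post(
    f"{PORTAL}/test-sets/ts_billing/test-runs",  # hypothetical route
    headers=HEADERS,
    json={"model": "candidate-model-v2"},
).raise_for_status()
```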