This recipe connects the most common “day 2” Portal workflows:
  • find failures with Evaluations
  • inspect a Request
  • refine a Sentinel
  • promote the case into a Test Set
  • verify changes with a Test Run

1) Find a real failure

In the Portal, go to Test > Evaluations, then:
  • Add a filter for Status = FAULT
  • Optionally add Intent Group (to narrow to one area)
  • Expand a row to see which sentinel(s) faulted
  • Open the request via the Request action
See: Evaluation Results
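If you export evaluation results and want to reproduce this filter outside the Portal, the logic is straightforward. A minimal sketch, assuming a hypothetical list of evaluation records — the field names here are illustrative, not the Portal's actual export schema:

```python
# Hypothetical evaluation records; field names are illustrative only,
# not the Portal's actual schema.
evaluations = [
    {"request_id": "req-1", "status": "PASS",  "intent_group": "billing", "faulted_sentinels": []},
    {"request_id": "req-2", "status": "FAULT", "intent_group": "billing", "faulted_sentinels": ["no-refund-promises"]},
    {"request_id": "req-3", "status": "FAULT", "intent_group": "support", "faulted_sentinels": ["tone"]},
]

def find_failures(records, intent_group=None):
    """Mirror the Portal filter: Status = FAULT, optionally narrowed to one Intent Group."""
    return [
        r for r in records
        if r["status"] == "FAULT"
        and (intent_group is None or r["intent_group"] == intent_group)
    ]

for r in find_failures(evaluations, intent_group="billing"):
    print(r["request_id"], r["faulted_sentinels"])
```

The expanded-row view in the Portal corresponds to reading `faulted_sentinels` on each matching record.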

2) Understand the failure on the Request page

On the Request page:
  • Review the request messages and response
  • Scroll to Evaluation to see the per-sentinel results for this request
  • If a failure includes a Suggested Correction, treat it as debugging guidance (not an automatic change)
See: Evaluating a Request

3) Promote the request into a Test Set

On the Request page, click Add to Test Set:
  • Choose an existing Test Set, or create one inline
  • Add tags (this flow enforces a max of 5 tags)
  • Submit
See: Test Set Creation
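The 5-tag limit is easy to hit if you ever script bulk promotion. A minimal sketch of the same client-side validation, with hypothetical helper names — this is not a Portal API, just the rule the flow enforces:

```python
MAX_TAGS = 5  # the Add to Test Set flow enforces a max of 5 tags

def validate_tags(tags):
    """Drop duplicates and reject excess tags before submitting,
    mirroring the Portal's limit."""
    deduped = list(dict.fromkeys(tags))  # preserves order, removes duplicates
    if len(deduped) > MAX_TAGS:
        raise ValueError(f"Too many tags: {len(deduped)} > {MAX_TAGS}")
    return deduped

print(validate_tags(["billing", "refund", "billing"]))  # → ['billing', 'refund']
```

De-duplicating before counting means a repeated tag does not spuriously trip the limit.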

4) Correct the expected response (when production output was wrong)

If the original production response is wrong, update the Test Set’s expected response:
  1. Open the Test Set.
  2. Go to Requests.
  3. Use the request Edit action to update the final assistant message (and tool calls if applicable).
This makes the Test Set reflect “what should have happened”, which is what you want for regression testing.
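In data terms, "update the final assistant message" amounts to replacing the last assistant turn in the stored conversation. A minimal sketch over a hypothetical message list — the structure is illustrative, not the Test Set's actual storage format:

```python
def correct_expected_response(messages, corrected_text):
    """Replace the last assistant message so the Test Set reflects
    what *should* have happened, not what production actually returned."""
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "assistant":
            updated = list(messages)  # leave the original record untouched
            updated[i] = {**messages[i], "content": corrected_text}
            return updated
    raise ValueError("No assistant message to correct")

conversation = [
    {"role": "user", "content": "Can I get a refund?"},
    {"role": "assistant", "content": "Yes, guaranteed."},  # wrong production output
]
fixed = correct_expected_response(conversation, "Refunds depend on your plan; let me check.")
```

Note that earlier turns are left alone: the point of the correction is to change only the expected final answer.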

5) Verify with a Test Run (baseline vs candidate)

From the Test Set:
  1. Go to Test Runs
  2. Click New Test Run
  3. Configure the candidate model/config and start the run
  4. When complete, use Compare Runs and per-request comparison to see what changed
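Compare Runs is a Portal feature, but the per-request comparison it performs is simple to sketch. Assuming hypothetical per-request result maps keyed by request ID (illustrative, not the Portal's data model):

```python
def diff_runs(baseline, candidate):
    """Per-request comparison between a baseline and a candidate Test Run.
    Each argument maps request ID -> 'PASS' or 'FAULT'."""
    regressions = [rid for rid in baseline
                   if baseline[rid] == "PASS" and candidate.get(rid) == "FAULT"]
    fixes = [rid for rid in baseline
             if baseline[rid] == "FAULT" and candidate.get(rid) == "PASS"]
    return {"regressions": regressions, "fixes": fixes}

baseline = {"req-1": "PASS", "req-2": "FAULT", "req-3": "PASS"}
candidate = {"req-1": "PASS", "req-2": "PASS", "req-3": "FAULT"}
print(diff_runs(baseline, candidate))  # → {'regressions': ['req-3'], 'fixes': ['req-2']}
```

The two buckets are what you care about when reading a comparison: requests your candidate broke, and requests it repaired.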

6) Fix the root cause with a Sentinel (when applicable)

If the failure is something you want to enforce at the Intent Group level:
  1. Open the Intent Group.
  2. Go to Sentinels.
  3. Edit the sentinel (or create a new one) and iterate until it matches your expectation.

7) Close the loop

After updating Sentinels and/or configuration:
  • Re-check Test > Evaluations (filter by sentinel + FAULT) to confirm the fix is behaving as expected on real traffic.
  • Re-run the relevant Test Set to ensure you didn’t introduce regressions.
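If you automate this loop, the two re-checks above collapse into a single gate. A minimal sketch, assuming hypothetical inputs (a list of live faults for the targeted sentinel, and a run diff like the one from Compare Runs) — names are illustrative:

```python
def fix_is_verified(live_faults_for_sentinel, run_diff):
    """Both conditions from step 7: the targeted sentinel has stopped
    faulting on real traffic, and the Test Run introduced no regressions."""
    return len(live_faults_for_sentinel) == 0 and len(run_diff["regressions"]) == 0

# The fix only counts as done when both checks pass.
print(fix_is_verified([], {"regressions": [], "fixes": ["req-2"]}))  # → True
```

A fix that clears live traffic but introduces Test Run regressions (or vice versa) fails the gate and sends you back to step 6.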