
Create evaluations

Evaluations are created when you call the Chat endpoint with evaluations enabled, either through Portal configuration or through SDK parameters. The model's output is what gets evaluated, and the request is used as context. Where the evaluation is delivered depends on the call. If a callback is provided, the evaluation is passed to that function asynchronously. If no callback is provided and stream is true, the evaluation is available on the last chunk of the stream. If no callback is provided and stream is false, the evaluation is included in the completion response.
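For the streaming case, you only need to keep the final chunk. The sketch below is a generic helper, not part of the Maitai SDK. It assumes the object returned by `client.chat.completions.create(..., stream=True)` on `maitai.MaitaiAsync` can be consumed with `async for`, as OpenAI-style async streams can. The attribute that holds the evaluation on that chunk is not specified here, so the helper returns the whole chunk. `last_chunk` and `fake_stream` are hypothetical names used for illustration.

```python
import asyncio
from typing import Any, AsyncIterator, Optional


async def last_chunk(stream: AsyncIterator[Any]) -> Optional[Any]:
    """Consume an async stream and return its final chunk (None if the stream is empty)."""
    last = None
    async for chunk in stream:
        # Handle intermediate chunks here (e.g. render partial text as it arrives).
        last = chunk
    return last


# Hypothetical stand-in for an SDK stream; with the SDK you would pass the
# streaming response instead (assumed to be an async iterator of chunks).
async def fake_stream():
    for piece in ["1, 2, ", "3, 4", "<final chunk with evaluation>"]:
        yield piece


async def main():
    final = await last_chunk(fake_stream())
    print(final)


asyncio.run(main())
```

Draining the stream first and inspecting only the last chunk keeps evaluation handling separate from rendering the partial output.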

Example (callback)

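import asyncio

import maitai


async def on_eval(eval_response):
    # Called asynchronously with the evaluation result (an EvaluateResponse).
    print(eval_response)


messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Generate numbers 1-10"},
]


async def main():
    client = maitai.MaitaiAsync()

    # `await` must run inside a coroutine when this is a script (not a notebook).
    response = await client.chat.completions.create(
        messages=messages,
        model="gpt-4o",
        session_id="YOUR_SESSION_ID",
        intent="NUMBER_GENERATOR",
        application="demo_app",
        callback=on_eval,
    )
    print(response)
    # Note: the callback runs asynchronously, so it may fire after create()
    # returns; keep the event loop alive (e.g. in a long-running service)
    # if you need it to complete.


asyncio.run(main())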
Example evaluation response:

{
  "application_id": 16,
  "session_id": "xxx",
  "evaluation_results": [
    {
      "status": "FAULT",
      "description": "Test with random number",
      "confidence": 100.0,
      "correction": "Test with random number 42.",
      "sentinel_id": 41,
      "eval_time": 489,
      "date_created": 1716682520088,
      "usage": {
        "prompt_tokens": 380,
        "completion_tokens": 54,
        "total_tokens": 434
      }
    }
  ],
  "evaluation_request_id": "b84f0d55-7c7e-4af4-8d84-106385682250"
}
Response fields

application_id (integer): The Maitai identifier for the application.
session_id (string): A unique identifier for the session, passed in from the Chat endpoint.
request_id (string): The identifier of the evaluated chat completion request.
evaluation_results (array): A list of individual Sentinel results.
evaluation_request_id (string): The unique identifier for the evaluation request.
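If you store or log evaluations as JSON, a small typed wrapper makes faults and token usage easy to pull out. The sketch below parses the JSON shape of the example response above into hypothetical `ParsedEvaluation` and `SentinelResult` dataclasses. These are illustrative names, not the SDK's `EvaluateResponse` class, which may expose the same data differently. Assumptions: only the "FAULT" status appears in the example, so other status values are not enumerated. `request_id` is optional because it is documented but absent from the sample payload. Per-result `usage` is reduced to `total_tokens`.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SentinelResult:
    """One entry of evaluation_results (fields taken from the example payload)."""
    status: str                 # only "FAULT" appears in the example
    description: str
    confidence: float
    correction: Optional[str]
    sentinel_id: int
    eval_time: int
    date_created: int
    total_tokens: int           # flattened from the nested "usage" object


@dataclass
class ParsedEvaluation:
    """Hypothetical typed view of the evaluation response JSON."""
    application_id: int
    session_id: str
    evaluation_request_id: str
    request_id: Optional[str] = None  # documented, but absent from the example payload
    evaluation_results: List[SentinelResult] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ParsedEvaluation":
        results = [
            SentinelResult(
                status=r["status"],
                description=r["description"],
                confidence=float(r["confidence"]),
                correction=r.get("correction"),
                sentinel_id=r["sentinel_id"],
                eval_time=r["eval_time"],
                date_created=r["date_created"],
                total_tokens=r.get("usage", {}).get("total_tokens", 0),
            )
            for r in data.get("evaluation_results", [])
        ]
        return cls(
            application_id=data["application_id"],
            session_id=data["session_id"],
            evaluation_request_id=data["evaluation_request_id"],
            request_id=data.get("request_id"),
            evaluation_results=results,
        )

    def faults(self) -> List[SentinelResult]:
        """Return only the Sentinel results whose status is "FAULT"."""
        return [r for r in self.evaluation_results if r.status == "FAULT"]


# Usage with the example payload shown above.
payload = json.loads("""
{
  "application_id": 16,
  "session_id": "xxx",
  "evaluation_results": [
    {
      "status": "FAULT",
      "description": "Test with random number",
      "confidence": 100.0,
      "correction": "Test with random number 42.",
      "sentinel_id": 41,
      "eval_time": 489,
      "date_created": 1716682520088,
      "usage": {"prompt_tokens": 380, "completion_tokens": 54, "total_tokens": 434}
    }
  ],
  "evaluation_request_id": "b84f0d55-7c7e-4af4-8d84-106385682250"
}
""")

parsed = ParsedEvaluation.from_dict(payload)
for fault in parsed.faults():
    print(f"Sentinel {fault.sentinel_id}: {fault.description} -> {fault.correction}")
total = sum(r.total_tokens for r in parsed.evaluation_results)
print(f"Tokens used by Sentinels: {total}")
```

Using `.get()` for `correction`, `usage`, and `request_id` keeps the parser tolerant of payloads that omit those keys.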