Create evaluations
Evaluations are created when you call Chat with evaluations enabled (via Portal configuration or SDK parameters). The model output is what gets evaluated, with the request used as context.
- If a callback is provided, the evaluation is passed to that function asynchronously.
- If no callback is provided and `stream` is `true`, the evaluation is available on the last chunk.
- If no callback is provided and `stream` is `false`, the evaluation can be found on the completion response.
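The three delivery paths above can be sketched as plain control flow. The dispatcher below is purely illustrative: the dict shapes, keys, and the `deliver_evaluation` name are assumptions for the sketch, not the SDK's actual types or internals.

```python
import asyncio

async def deliver_evaluation(evaluation, callback=None, stream=False):
    """Illustrative only: mirrors the three delivery modes, not real SDK code."""
    if callback is not None:
        # Callback provided: pass the evaluation to that function asynchronously.
        await callback(evaluation)
        return {"response": {"content": "..."}}
    if stream:
        # No callback, stream=True: the evaluation arrives on the last chunk.
        return {"chunks": [{"delta": "..."}, {"delta": "", "evaluation": evaluation}]}
    # No callback, stream=False: the evaluation sits on the completion response.
    return {"response": {"content": "...", "evaluation": evaluation}}
```

Whichever path applies, the payload delivered is the same `EvaluateResponse`.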
Example (callback)
Python (async)
```python
import maitai

async def on_eval(eval_response):
    # Handle the evaluation response (EvaluateResponse)
    print(eval_response)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Generate numbers 1-10"},
]

client = maitai.MaitaiAsync()

response = await client.chat.completions.create(
    messages=messages,
    model="gpt-4o",
    session_id="YOUR_SESSION_ID",
    intent="NUMBER_GENERATOR",
    application="demo_app",
    callback=on_eval,
)
```
The `EvaluateResponse` delivered to the callback looks like:

```json
{
  "application_id": 16,
  "session_id": "xxx",
  "evaluation_results": [
    {
      "status": "FAULT",
      "description": "Test with random number",
      "confidence": 100.0,
      "correction": "Test with random number 42.",
      "sentinel_id": 41,
      "eval_time": 489,
      "date_created": 1716682520088,
      "usage": {
        "prompt_tokens": 380,
        "completion_tokens": 54,
        "total_tokens": 434
      }
    }
  ],
  "evaluation_request_id": "b84f0d55-7c7e-4af4-8d84-106385682250"
}
```
Response fields

- `application_id`: The Maitai identifier for the application.
- `session_id`: A unique identifier for the session, passed in from the Chat endpoint.
- The identifier of the evaluated chat completion request.
- `evaluation_results`: A list of individual Sentinel results. Each result includes:
  - A unique identifier for the evaluation result.
  - `status`: The status of the evaluation. `FAULT` means a fault was detected, `PASS` means the LLM output passed testing, and `NA` means an evaluation wasn't performed.
  - `description`: A detailed description of the evaluation outcome.
  - `confidence`: The confidence level of the evaluation result.
  - `correction`: Optional suggested correction text when a `FAULT` is detected.
  - Additional metadata associated with the evaluation result.
  - `sentinel_id`: An identifier for the Sentinel associated with this result.
  - `eval_time`: The evaluation time in milliseconds.
  - `date_created`: The Unix timestamp (in milliseconds) marking the creation of the evaluation result.
  - `usage`: Usage statistics for the Sentinel.
    - `prompt_tokens`: Number of tokens in the prompt.
    - `completion_tokens`: Number of tokens in the generated completion.
    - `total_tokens`: Total number of tokens used in the request (prompt + completion).
- `evaluation_request_id`: The unique identifier for the evaluation request.
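Putting the field reference to work, here is a hedged sketch of how a consumer might apply a Sentinel's `correction` only when a `FAULT` is reported with high `confidence`. The `apply_correction` helper and the 90.0 threshold are illustrative assumptions, not part of the Maitai API:

```python
def apply_correction(evaluate_response, original_text, min_confidence=90.0):
    """Return the first high-confidence correction, else the original text."""
    for result in evaluate_response["evaluation_results"]:
        if (
            result["status"] == "FAULT"
            and result["confidence"] >= min_confidence
            and result.get("correction")
        ):
            return result["correction"]
    return original_text
```

With the example payload above (a `FAULT` at confidence 100.0), this would swap in "Test with random number 42."; a `PASS` or low-confidence result leaves the model output untouched.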