Some teams already run inference with their own LLM client (OpenAI SDK, Groq SDK, etc.) and want Maitai to index and store that traffic for observability, debugging, and building Test Sets, without routing inference through Maitai's inference service. Maitai supports this via a storage endpoint:
  • PUT /chat/completions/response
This endpoint stores a request/response pair as PROD traffic and marks the inference location as CLIENT server-side. There are two ways to get traffic into it.

Option A: Run inference locally through the Maitai SDK

If you're okay with Maitai wrapping your provider client, you can run inference locally and have the SDK automatically store the request/response pair in Maitai. Two flags control this:

  • server_side_inference = false (run inference locally instead of routing it through Maitai)
  • evaluation_enabled = false (store-only; no evaluation)

Python example

import maitai

client = maitai.Maitai()  # uses MAITAI_API_KEY and provider keys (e.g., OPENAI_API_KEY) from env

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Summarize this text..."},
    ],
    application="demo_app",
    intent="SUMMARIZATION",
    session_id="YOUR_SESSION_ID",
    model="gpt-4o",
    server_side_inference=False,  # run inference locally; don't route it through Maitai
    evaluation_enabled=False,     # store-only; skip evaluation
    metadata={"source": "client_inference"},
)
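
Because the SDK wraps your provider client, the returned object follows the OpenAI chat.completion shape, so you can read it the way you would any OpenAI response. A minimal sketch, assuming the standard fields:

print(response.choices[0].message.content)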

Option B: Bring your own client and upload the request/response pair for storage only

If you already have an LLM response from your own client, you can upload the request/response pair directly to Maitai for indexing.

Required fields

  • chat_completion_request.application_ref_name
  • chat_completion_request.action_type (this is the “Intent” / “ApplicationAction”)
  • chat_completion_request.session_id (strongly recommended so the traffic threads into Sessions)
  • chat_completion_request.params (the model inputs; at minimum include messages and model)
  • chat_completion_response (OpenAI-style chat.completion response shape)

cURL example

curl --request PUT \
  --url https://api.trymaitai.ai/chat/completions/response \
  --header "Content-Type: application/json" \
  --header "x-api-key: $MAITAI_API_KEY" \
  --data '{
    "chat_completion_request": {
      "application_ref_name": "demo_app",
      "session_id": "YOUR_SESSION_ID",
      "action_type": "SUMMARIZATION",
      "params": {
        "model": "gpt-4o",
        "messages": [
          { "role": "user", "content": "Summarize this text..." }
        ]
      },
      "metadata": { "source": "byo_client" }
    },
    "chat_completion_response": {
      "id": "chatcmpl_example",
      "object": "chat.completion",
      "created": 1730000000,
      "model": "gpt-4o",
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Here is the summary..." },
          "finish_reason": "stop",
          "is_correction": false
        }
      ],
      "correction_applied": false,
      "first_token_time": 0,
      "response_time": 0
    }
  }'
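
Python example

If you'd rather make the same upload from Python, here is a minimal sketch using the requests library and the OpenAI SDK. This is an illustration, not an official Maitai client: the use of requests, model_dump(), and the extra Maitai fields copied from the cURL payload above (correction_applied, first_token_time, response_time) are assumptions; substitute your own client and timing as needed.

import os
import time

import requests
from openai import OpenAI

openai_client = OpenAI()  # uses OPENAI_API_KEY from env

messages = [{"role": "user", "content": "Summarize this text..."}]

# Run inference with your own client and time the call.
start = time.time()
completion = openai_client.chat.completions.create(model="gpt-4o", messages=messages)
elapsed_ms = int((time.time() - start) * 1000)

# Serialize the SDK's pydantic response into an OpenAI-style dict, then add the
# Maitai-specific fields shown in the cURL example above.
response_body = completion.model_dump()
response_body["correction_applied"] = False
response_body["first_token_time"] = 0  # unknown for non-streaming calls
response_body["response_time"] = elapsed_ms

payload = {
    "chat_completion_request": {
        "application_ref_name": "demo_app",
        "session_id": "YOUR_SESSION_ID",
        "action_type": "SUMMARIZATION",
        "params": {"model": "gpt-4o", "messages": messages},
        "metadata": {"source": "byo_client"},
    },
    "chat_completion_response": response_body,
}

resp = requests.put(
    "https://api.trymaitai.ai/chat/completions/response",
    headers={"x-api-key": os.environ["MAITAI_API_KEY"]},
    json=payload,  # requests sets Content-Type: application/json
)
resp.raise_for_status()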

Notes

  • This endpoint is storage/indexing only. It does not run inference, evaluations, or corrections.
  • Maitai stores uploaded pairs as PROD traffic and marks the inference location as CLIENT server-side.