Maitai allows you to change your LLM configurations (even in the middle of a conversation) without pushing code changes. You can modify your configuration in the Intent Overview page, or in the Chat Completion request.

We support all current models from OpenAI, Anthropic, Groq, Cerebras, and SambaNova.

For a full list of supported models with pricing, see the Billing page in the Portal.

Configuration via Portal

You can modify your configuration in the Intent Overview page.

Passing parameters into your Chat Completion request will override what you set in the Portal.

We recommend the following implementation for the most flexibility at runtime:
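
The sketch below shows one way to do this, assuming Maitai’s OpenAI-compatible Python chat completions interface. The client construction, the application and intent names, and the MODEL_OVERRIDE and TEMPERATURE_OVERRIDE environment variables are illustrative placeholders, not confirmed SDK names. Anything you leave out of the request falls back to the Portal configuration.

import os

import maitai  # Maitai Python SDK; the client construction below is illustrative

client = maitai.Maitai()

# Collect only the parameters you explicitly want to override at run time.
# Anything omitted here is governed by the intent's Portal configuration.
overrides = {}
if os.getenv("MODEL_OVERRIDE"):
    overrides["model"] = os.environ["MODEL_OVERRIDE"]  # e.g. "gpt-4o"
if os.getenv("TEMPERATURE_OVERRIDE"):
    overrides["temperature"] = float(os.environ["TEMPERATURE_OVERRIDE"])

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    application="my-app",    # illustrative application reference
    intent="CONVERSATION",   # illustrative intent name
    **overrides,
)
print(response.choices[0].message.content)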

Parameters Available in Portal

Inference Location
Client or Server

Setting this to Client enables provider calls client-side through the SDK, using your own provider keys. Client-side inference is only available for OpenAI models.

Setting this to Server tells Maitai to make the request to the provider from our servers.

Evaluations
boolean
default: true

Enable Evaluations to monitor your LLM output for faults.

Apply Corrections
boolean
default: false

Enable automatic corrections based on evaluations. When enabled, this forces Server inference.

Corrections are only available for Server inference and require Evaluations to be turned on.

Safe Mode
boolean
default: false

Safe mode prioritizes accuracy over latency. Only available when Apply Corrections is enabled.
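
To make the interplay between these three settings concrete, here is a small sketch of the dependency rules described above. It is plain validation logic written for illustration, not part of the SDK or the Portal.

def resolve_inference_location(inference_location: str, evaluations: bool,
                               apply_corrections: bool, safe_mode: bool) -> str:
    """Apply the dependency rules and return the effective inference location."""
    if apply_corrections and not evaluations:
        raise ValueError("Apply Corrections requires Evaluations to be enabled")
    if safe_mode and not apply_corrections:
        raise ValueError("Safe Mode requires Apply Corrections to be enabled")
    if apply_corrections:
        return "Server"  # Apply Corrections forces Server inference
    return inference_location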

Model
string

Primary AI model to be used for inference. Models are grouped by provider and sorted alphabetically, with Maitai models (recommended) appearing first.

Secondary Model
string

Optional fallback model to use if the primary model is not available or has degraded performance.

Fallback Strategy
string
default: "reactive"

Strategy to use when falling back to the secondary model. Options:

  • reactive: Falls back on primary model failure
  • timeout: Falls back after the specified timeout period
  • first_response: Uses whichever model responds first

Fallback Timeout
number

Timeout in seconds before falling back to the secondary model. Only applicable when using the “timeout” fallback strategy.
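
As a sketch of how these fallback settings fit together, the hypothetical request below pins a primary and a secondary model with a timeout fallback. The secondary_model, fallback_strategy, and fallback_timeout parameter names simply mirror the Portal fields above and are not a confirmed SDK signature; the models shown are illustrative.

import maitai  # client construction is illustrative, as in the earlier sketch

client = maitai.Maitai()

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4o",                   # primary model
    secondary_model="gpt-4o-mini",    # hypothetical: fallback model
    fallback_strategy="timeout",      # hypothetical: reactive | timeout | first_response
    fallback_timeout=2.5,             # hypothetical: seconds before falling back
)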

Temperature
number
default: 1

Controls randomness in the model’s output. Range 0 to 2, where:

  • Lower values (e.g., 0): More deterministic output
  • Higher values (e.g., 2): More random output

Stop
string / array

Sequences where the API will stop generating further tokens.

Log Probs
boolean
default: false

Include the log probabilities of the output tokens.

N
integer
default: 1

How many completions to generate for each prompt.

Max Tokens
integer

The maximum number of tokens to generate in the completion.

Presence Penalty
number
default: 0

Range -2 to 2. Positive values increase the model’s likelihood to talk about new topics.

Frequency Penalty
number
default: 0

Range -2 to 2. Positive values decrease the model’s likelihood to repeat the same line verbatim.
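
When you do want to pin generation behavior in code rather than in the Portal, the standard OpenAI-style parameters above can be passed directly on the request, overriding the Portal values as noted earlier. A sketch, again assuming the OpenAI-compatible interface with illustrative application and intent names:

import maitai  # client construction is illustrative, as in the earlier sketches

client = maitai.Maitai()

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize the latest ticket."}],
    application="my-app",     # illustrative
    intent="CONVERSATION",    # illustrative
    temperature=0.2,          # lower values give more deterministic output
    max_tokens=256,           # cap on generated tokens
    n=1,                      # number of completions to generate
    stop=["\n\n"],            # stop sequence(s)
    presence_penalty=0.0,
    frequency_penalty=0.3,    # discourage verbatim repetition
    logprobs=True,            # include token log probabilities
)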