Maitai allows you to change your LLM configurations (even in the middle of a conversation) without pushing code changes. You can modify your configuration in the Intent Overview page, or in the Chat Completion request.

We support all current models from OpenAI, Anthropic, Groq, Cerebras, and SambaNova.

For a full list of supported models with pricing, see the Billing page in the Portal.

Configuration via Portal

You can modify your configuration in the Intent Overview page.

Passing parameters into your Chat Completion request will override what you set in the Portal.

We recommend the following implementation for the most flexibility at runtime:
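
The sketch below shows one way to do this, assuming Maitai’s OpenAI-compatible Python chat completions interface. The client construction, the application and intent names, and the MODEL_OVERRIDE and TEMPERATURE_OVERRIDE environment variables are illustrative placeholders, not confirmed SDK names. Anything you leave out of the request falls back to the Portal configuration.

import os

import maitai  # Maitai Python SDK; the client construction below is illustrative

client = maitai.Maitai()

# Collect only the parameters you explicitly want to override at run time.
# Anything omitted here is governed by the intent's Portal configuration.
overrides = {}
if os.getenv("MODEL_OVERRIDE"):
    overrides["model"] = os.environ["MODEL_OVERRIDE"]  # e.g. "gpt-4o"
if os.getenv("TEMPERATURE_OVERRIDE"):
    overrides["temperature"] = float(os.environ["TEMPERATURE_OVERRIDE"])

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    application="my-app",    # illustrative application reference
    intent="CONVERSATION",   # illustrative intent name
    **overrides,
)
print(response.choices[0].message.content)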

Parameters Available in Portal

Inference Location
Client or Server

Setting this to Client enables provider calls client-side through the SDK, using your own provider keys. Client-side inference is only available for OpenAI models.

Setting this to Server tells Maitai to make the request to the provider from our servers.

Evaluations
boolean
default: true

Enable Evaluations to monitor your LLM output for faults.

Apply Corrections
boolean
default: false

Enable automatic corrections based on evaluations. When enabled, this forces Server inference.

Corrections are only available for Server inference and require Evaluations to be turned on.

Safe Mode
boolean
default: false

Safe mode prioritizes accuracy over latency. Only available when Apply Corrections is enabled.
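
To make the interplay between these three settings concrete, here is a small sketch of the dependency rules described above. It is plain validation logic written for illustration, not part of the SDK or the Portal.

def resolve_inference_location(inference_location: str, evaluations: bool,
                               apply_corrections: bool, safe_mode: bool) -> str:
    """Apply the dependency rules and return the effective inference location."""
    if apply_corrections and not evaluations:
        raise ValueError("Apply Corrections requires Evaluations to be enabled")
    if safe_mode and not apply_corrections:
        raise ValueError("Safe Mode requires Apply Corrections to be enabled")
    if apply_corrections:
        return "Server"  # Apply Corrections forces Server inference
    return inference_location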

Model
string

Primary AI model to be used for inference. Models are grouped by provider and sorted alphabetically, with Maitai models (recommended) appearing first.

Secondary Model
string

Optional fallback model to use if the primary model is not available or has degraded performance.

Fallback Strategy
string
default: "reactive"

Strategy to use when falling back to the secondary model. Options:

  • reactive: Falls back on primary model failure
  • timeout: Falls back after the specified timeout period
  • first_response: Uses whichever model responds first

Fallback Timeout
number

Timeout in seconds before falling back to the secondary model. Only applicable when using the “timeout” fallback strategy.
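
As a sketch of how these fallback settings fit together, the hypothetical request below pins a primary and a secondary model with a timeout fallback. The secondary_model, fallback_strategy, and fallback_timeout parameter names simply mirror the Portal fields above and are not a confirmed SDK signature; the models shown are illustrative.

import maitai  # client construction is illustrative, as in the earlier sketch

client = maitai.Maitai()

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4o",                   # primary model
    secondary_model="gpt-4o-mini",    # hypothetical: fallback model
    fallback_strategy="timeout",      # hypothetical: reactive | timeout | first_response
    fallback_timeout=2.5,             # hypothetical: seconds before falling back
)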

Temperature
number
default: 1

Controls randomness in the model’s output. Range 0 to 2, where:

  • Lower values (e.g., 0): More deterministic output
  • Higher values (e.g., 2): More random output

Stop
string / array

Sequences where the API will stop generating further tokens.

Log Probs
boolean
default: false

Include the log probabilities of the output tokens.

N
integer
default: 1

How many completions to generate for each prompt.

Max Tokens
integer

The maximum number of tokens to generate in the completion.

Presence Penalty
number
default: 0

Range -2 to 2. Positive values increase the model’s likelihood to talk about new topics.

Frequency Penalty
number
default: 0

Range -2 to 2. Positive values decrease the model’s likelihood to repeat the same line verbatim.
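
When you do want to pin generation behavior in code rather than in the Portal, the standard OpenAI-style parameters above can be passed directly on the request, overriding the Portal values as noted earlier. A sketch, again assuming the OpenAI-compatible interface with illustrative application and intent names:

import maitai  # client construction is illustrative, as in the earlier sketches

client = maitai.Maitai()

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize the latest ticket."}],
    application="my-app",     # illustrative
    intent="CONVERSATION",    # illustrative
    temperature=0.2,          # lower values give more deterministic output
    max_tokens=256,           # cap on generated tokens
    n=1,                      # number of completions to generate
    stop=["\n\n"],            # stop sequence(s)
    presence_penalty=0.0,
    frequency_penalty=0.3,    # discourage verbatim repetition
    logprobs=True,            # include token log probabilities
)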