Configure Models
Manage your LLM configurations through the Maitai Portal
Maitai allows you to change your LLM configurations (even in the middle of a conversation) without pushing code changes. You can modify your configuration in the Intent Overview page, or in the Chat Completion request.
We support all current models from OpenAI, Anthropic, Groq, Cerebras, and SambaNova.
For a full list of supported models with pricing, see the Billing page in the Portal.
Configuration via Portal
You can modify your configuration in the Intent Overview page.
We recommend the following code implementation for the most flexibility at run-time:
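As a minimal sketch (the `maitai` package name and the `application`, `intent`, and `session_id` fields are assumptions based on an OpenAI-style chat interface, not confirmed SDK surface), the idea is to keep model settings out of the request body so that whatever is configured for the intent in the Portal is applied at run time:

```python
import maitai  # assumed package name for the Maitai Python SDK

# No model, temperature, or other generation settings are passed here,
# so the Portal configuration for this intent governs the request and
# can be changed mid-conversation without a code push.
response = maitai.chat.completions.create(
    application="my_app",      # illustrative application reference
    intent="CONVERSATION",     # intent whose Portal config applies
    session_id="session_123",  # illustrative conversation identifier
    messages=[
        {"role": "user", "content": "Summarize our return policy."},
    ],
)

print(response.choices[0].message.content)
```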
Parameters Available in Portal
Inference Location: Setting this to Client enables provider calls client-side through the SDK, using your provider keys (only available for OpenAI models). Setting it to Server has Maitai make the request to the provider from our servers.
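With client-side inference, for example, the SDK calls OpenAI directly using your own key while Maitai supplies only the configuration. A sketch, assuming keys are supplied via environment variables (both variable names are illustrative):

```python
import os

# Assumed setup for Client inference: the SDK reads your provider key
# locally and sends requests straight to OpenAI. Variable names are
# illustrative, not confirmed SDK configuration.
os.environ["MAITAI_API_KEY"] = "<your-maitai-key>"
os.environ["OPENAI_API_KEY"] = "<your-openai-key>"  # needed for Client inference
```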
Evaluations: Enable to monitor your LLM output for faults.
Apply Corrections: Enable automatic corrections based on evaluation results. When enabled, inference is forced to Server.
Safe Mode: Prioritizes accuracy over latency. Only available when Apply Corrections is enabled.
Primary Model: The primary AI model used for inference. Models are grouped by provider and sorted alphabetically, with Maitai models (recommended) listed first.
Secondary Model: Optional fallback model used if the primary model is unavailable or its performance has degraded.
Fallback Strategy: Strategy to use when falling back to the secondary model. Options:
- reactive: falls back on primary model failure
- timeout: falls back after a specified timeout period
- first_response: uses whichever model responds first
Fallback Timeout: Timeout in seconds before falling back to the secondary model. Only applicable when using the “timeout” fallback strategy.
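If you prefer to pin fallback behavior in the request rather than the Portal, a sketch of a per-request override follows; the `fallback_model`, `fallback_strategy`, and `fallback_timeout` field names (and the model names) are assumptions, not confirmed SDK parameters:

```python
import maitai  # assumed package name for the Maitai Python SDK

# Illustrative per-request fallback override: fall back to the secondary
# model if the primary has not responded within 2 seconds.
response = maitai.chat.completions.create(
    application="my_app",
    intent="CONVERSATION",
    model="gpt-4o",                  # primary model (illustrative)
    fallback_model="llama-3.3-70b",  # secondary model (illustrative)
    fallback_strategy="timeout",     # reactive | timeout | first_response
    fallback_timeout=2.0,            # seconds; only used with "timeout"
    messages=[{"role": "user", "content": "Hello!"}],
)
```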
Temperature: Controls randomness in the model’s output. Range 0-2, where:
- Lower values (e.g., 0): more deterministic output
- Higher values (e.g., 2): more random output
Stop: Sequences where the API will stop generating further tokens.
Logprobs: Whether to include the log probabilities of the output tokens.
N: How many completions to generate for each prompt.
Max Tokens: The maximum number of tokens to generate in the completion.
Presence Penalty: Range -2 to 2. Positive values increase the model’s likelihood to talk about new topics.
Frequency Penalty: Range -2 to 2. Positive values decrease the model’s likelihood to repeat the same line verbatim.
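These generation settings mirror the standard OpenAI chat-completion parameters, so they can also be overridden per request. A sketch with illustrative values, again assuming an OpenAI-style create call:

```python
import maitai  # assumed package name for the Maitai Python SDK

# Per-request generation overrides using the OpenAI-style parameter
# names described above; all values are illustrative.
response = maitai.chat.completions.create(
    application="my_app",
    intent="CONVERSATION",
    temperature=0.2,        # mostly deterministic output
    max_tokens=256,         # cap the completion length
    stop=["\n\n"],          # stop at the first blank line
    n=1,                    # a single completion per prompt
    presence_penalty=0.3,   # nudge toward new topics
    frequency_penalty=0.5,  # discourage verbatim repetition
    messages=[{"role": "user", "content": "Draft a short status update."}],
)
```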