Configuration via Portal

Maitai lets you change your LLM configuration (even in the middle of a conversation) without pushing code changes. You can modify your configuration on the Intent Overview page.

Any parameter you pass directly in a Chat Completion request overrides the value set in the Portal.

For the most flexibility at run time, we recommend the following implementation:
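The sketch below illustrates that pattern, assuming the Python SDK's OpenAI-compatible chat-completions interface; the client construction, the application and intent identifiers, and the exact keyword names used for them are assumptions rather than confirmed SDK parameters, so check the SDK reference for your version.

```python
import maitai

# Assumes the client reads your Maitai API key from the environment.
client = maitai.Maitai()

# Leave model, temperature, and other tunables out of the call so the values
# configured in the Portal apply. Pass a parameter here only when you need a
# run-time override, since request parameters take precedence over the Portal.
response = client.chat.completions.create(
    application="my_application",  # placeholder application identifier
    intent="CONVERSATION",         # placeholder intent name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(response.choices[0].message.content)
```

With this shape, changing the model or sampling settings in the Portal takes effect on the next request without a code change.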

Parameters Available in Portal

Inference Location
Client or Server

Setting this to Client enables provider calls client-side through the SDK, using your own provider keys. Client-side inference is only available for OpenAI models.

Server-side inference will tell Maitai to make the request to the provider from our servers.
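As a rough sketch of what this means for key handling (the environment variable names below are assumptions, not confirmed SDK settings):

```python
import os

# Assumed variable names; check the SDK reference for the exact ones.
os.environ["MAITAI_API_KEY"] = "<your-maitai-key>"   # required in either mode
os.environ["OPENAI_API_KEY"] = "<your-openai-key>"   # needed only for Client inference

# With Inference Location set to Client, the SDK calls OpenAI directly from
# your process using your provider key. With Server, Maitai makes the provider
# call from its servers, so no local provider key is needed.
```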

Context Retrieval
boolean
default: true

Enable Context Retrieval to send only the context you need to the LLM.

Evaluations
boolean
default: true

Enable Evaluations to monitor your LLM output for faults.

Apply Corrections
boolean
default: false
Corrections are only available for Server inference and require Evaluations to be turned on.

Maitai generates corrections during the evaluation step. This configuration option tells us whether to apply those corrections to the Chat Completion response, or to stream corrected content when streaming. This behavior is also referred to as “autocorrect”.
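When autocorrect is on, corrected content is what arrives in the response (or in the stream), so your consuming code does not change. A minimal streaming sketch, under the same assumed interface and placeholder names as the example above:

```python
import maitai

client = maitai.Maitai()

# With Apply Corrections enabled in the Portal, any corrections Maitai makes
# during evaluation are already reflected in the streamed chunks.
stream = client.chat.completions.create(
    application="my_application",  # placeholder names, as above
    intent="CONVERSATION",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```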

Safe Mode
boolean
default: false

Safe Mode ensures that all LLM outputs are evaluated and corrected, with no consideration for latency.

Model
string
default: "gpt-4o"

ID of the model to use. Some models are only available for Server inference.

Fallback Model
string
default: ""

Optional. The model to use if the primary model fails or is experiencing an outage.

Temperature
number
default: 1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
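Because request parameters override the Portal, you can pin the temperature for a single call while leaving the Portal value alone. A small sketch, reusing the assumed client and placeholder names from the first example:

```python
# This call uses temperature=0.2 even if the Portal configures a different
# value for the intent, because request parameters take precedence.
response = client.chat.completions.create(
    application="my_application",  # placeholder names
    intent="EXTRACTION",
    messages=[{"role": "user", "content": "Summarize the ticket in one sentence."}],
    temperature=0.2,  # lower temperature for more deterministic output
)
```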

Stop
string / array / null
default: null

Up to 4 sequences where the API will stop generating further tokens.

Log Probs
boolean
default: false

Whether to return log probabilities of the output tokens. If true, the log probabilities of each output token are returned in the content of the message.

Frequency Penalty
number
default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Presence Penalty
number
default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Max Tokens
integer or null

The maximum number of tokens that can be generated in the chat completion.

The total length of input tokens and generated tokens is limited by the model’s context length.

N
integer or null
default: 1

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Timeout
integer
default: -1

How long a request should run before it times out. If timeout <= 0, the request runs until it is complete.