Configuration
Manage your LLM configurations through the Maitai Portal
Configuration via Portal
Maitai allows you to change your LLM configurations (even in the middle of a conversation) without pushing code changes. You can modify your configuration in the Intent Overview page.
We recommend the following code implementation for the most flexibility at run-time:
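Below is a minimal sketch of that pattern, assuming the Python SDK exposes an OpenAI-style chat.completions.create method and per-request application and intent identifiers (the names used here are placeholders; check the SDK reference for the exact client construction). The key point is to leave model, temperature, and similar settings out of the call so the values configured in the Portal apply at run-time.

```python
import maitai

# Assumes your Maitai API key (and, for client-side inference, your provider
# key) are available to the SDK, e.g. via environment variables.
client = maitai.Maitai()  # assumed client constructor; verify against the SDK

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are my delivery options?"},
]

# No model, temperature, or other generation settings are hard-coded here,
# so whatever is configured for this intent in the Maitai Portal is used
# and can be changed mid-conversation without a code push.
response = client.chat.completions.create(
    messages=messages,
    application="my_application",  # placeholder application reference
    intent="CONVERSATION",         # placeholder intent name
)

print(response.choices[0].message.content)
```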
Parameters Available in Portal
Setting this to Client will enable provider calls client-side through the SDK, using your provider keys. Only available for OpenAI models.
Setting this to Server tells Maitai to make the request to the provider from our servers.
Enable Context Retrieval to only send the context you need to the LLM.
Enable Evaluations to monitor your LLM output for faults.
Maitai generates corrections during the evaluation step. This configuration option tells us whether to apply those corrections to the Chat Completion response (or, when streaming, to stream corrected responses). This behavior is also referred to as “autocorrect”.
Safe Mode ensures that all LLM outputs are evaluated and corrected, with no consideration for latency.
ID of the model to use. Some models are only available for Server inference.
Optional. Model to be used if primary model fails or is experiencing an outage.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Up to 4 sequences where the API will stop generating further tokens.
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
The maximum number of tokens that can be generated in the chat completion.
The total length of input tokens and generated tokens is limited by the model’s context length.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
How long a request should run before it times out. If timeout <= 0, the request runs until it is complete.
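If you do need to pin any of these values in code rather than rely on the Portal, a sketch of passing them per request is below, reusing the client and messages from the earlier example. It again assumes an OpenAI-style create signature; treat the exact keyword names as assumptions to verify against the SDK reference, and note that hard-coding them forgoes the run-time flexibility described above.

```python
# Explicit request-time settings; each keyword mirrors a parameter described
# above. The names follow OpenAI's chat completions API, which is an
# assumption to verify against the Maitai SDK reference.
response = client.chat.completions.create(
    messages=messages,
    application="my_application",  # placeholder
    intent="CONVERSATION",         # placeholder
    temperature=0.2,               # 0 to 2; lower is more deterministic
    max_tokens=256,                # cap on generated tokens per completion
    n=1,                           # keep at 1 to minimize cost
    stop=["\nUser:"],              # up to 4 stop sequences
    logprobs=True,                 # return per-token log probabilities
    frequency_penalty=0.0,         # -2.0 to 2.0
    presence_penalty=0.0,          # -2.0 to 2.0
    timeout=30,                    # seconds; a value <= 0 runs until complete
)
```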