| Temperature | Controls the randomness of the generated text. Higher values produce more varied, creative results, while a temperature of 0 selects the most probable token at each step, yielding near-deterministic, repeatable outputs that are useful for testing and debugging (see the request sketch after this table). |
| Max Tokens | Defines the maximum number of tokens the model can generate (default: 4096). If the combined token count (prompt + output) exceeds the model’s context limit, the API automatically reduces the output to fit. |
| Top K | A sampling method that restricts generation to the k most probable tokens, renormalizing the probability mass among them. This constrains randomness by cutting off the long tail of unlikely tokens. |
| Top P (Nucleus Sampling) | An alternative to temperature sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. For example, top_p = 0.1 restricts generation to the tokens making up the top 10% of probability mass (see the sampling sketch after this table). |
| Frequency Penalty | Reduces repetition of words and phrases. A higher value penalizes tokens in proportion to how often they have already appeared in the output, helping maintain variety and avoid redundancy. |
| Presence Penalty | Encourages the introduction of new ideas or topics. A higher value penalizes any token that has already appeared at least once, regardless of how often, pushing the model toward novel concepts instead of reiterating existing ones. |
| Stream | Enables incremental, real-time output streaming — allowing responses to be processed and displayed as they are generated. |
| System Prompt | Provides a high-level instruction or context that guides the model’s tone, behavior, and responses throughout the interaction. |
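Temperature, Top K, and Top P all act as filters on the model's next-token probability distribution. The following is a minimal Python sketch of how such filtering could work over raw logits; the function name, defaults, and ordering of the steps are illustrative assumptions, not any provider's actual decoder.

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token id from raw logits after temperature/top-k/top-p filtering."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: scale the logits; lower values sharpen the distribution,
    # and 0 falls back to greedy decoding (always the most probable token).
    if temperature <= 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top K: keep only the k most probable tokens, then renormalize.
    if top_k > 0:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # Top P (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]            # token ids, most probable first
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    return int(rng.choice(len(probs), p=probs))

# Example over a toy 5-token vocabulary with made-up logits.
print(sample_token([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.8, top_k=3, top_p=0.9))
```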
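The request sketch below shows how these parameters might be set together in a single chat request. The endpoint URL, model name, JSON field names, and response shape are placeholders for illustration; consult your provider's API reference for the real ones, since some providers expose only a subset of these parameters (for example, top_k or the penalty fields may be absent).

```python
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "sk-..."                                        # placeholder key

payload = {
    "model": "example-model",                 # placeholder model name
    "messages": [
        # System Prompt: high-level instruction guiding tone and behavior.
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one sentence."},
    ],
    "temperature": 0.7,        # 0 = near-deterministic, higher = more varied
    "max_tokens": 1024,        # cap on the number of generated tokens
    "top_k": 40,               # sample only from the 40 most probable tokens
    "top_p": 0.9,              # or restrict to the top 90% of probability mass
    "frequency_penalty": 0.5,  # penalize tokens by how often they already appeared
    "presence_penalty": 0.5,   # penalize any token that has appeared at all
    "stream": False,           # set True to receive tokens incrementally
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
# Assumed response shape; adjust to your provider's actual schema.
print(response.json()["choices"][0]["message"]["content"])
```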