We offer a range of serverless endpoints for popular open-source models.
## Access a serverless inference model

Select a model from the list and click its model card:
### Model Details
To access the serverless connection details for your model, click on “Descriptions”:
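The connection details can be used with a standard HTTP client. Below is a minimal sketch of a chat-completions call, assuming an OpenAI-compatible endpoint; the base URL, API key, and model name are placeholders — substitute the values shown in your model's connection details.

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder; use your model's endpoint
API_KEY = "YOUR_API_KEY"                 # from your account's connection details
MODEL = "your-model-id"                  # placeholder model name


def build_request(prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


def chat(prompt: str) -> str:
    """Send the request and extract the assistant's reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With real credentials in place, `chat("Hello!")` returns the model's reply as a string.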
### Playground
To access the Playground for your model, click on “Playground”:
| Option | Description |
|---|---|
| Temperature | Controls how much randomness you want in the generated text. A higher temperature produces more creative results, while a temperature of 0 yields deterministic, repeatable outputs — useful for testing and debugging. |
| Max Tokens | Defines the maximum number of tokens the model can generate (default: 4096). If the combined token count (prompt + output) exceeds the model’s context limit, the API automatically reduces the output to fit. |
| Top K | A sampling method that filters to the k most probable tokens, redistributing probability mass among them. This helps constrain randomness and focus generation on the most likely outputs. |
| Top P (Nucleus Sampling) | An alternative to temperature sampling: the model considers only the tokens that make up the top_p cumulative probability mass. For example, top_p = 0.1 restricts sampling to tokens within the top 10% of probability mass. |
| Frequency Penalty | Reduces repetition of words or phrases. A higher value discourages the model from reusing tokens already present in the output, helping maintain variety and avoid redundancy. |
| Presence Penalty | Encourages the introduction of new ideas or topics. A higher value pushes the model to generate novel concepts instead of reiterating existing ones. |
| Stream | Enables incremental, real-time output streaming — allowing responses to be processed and displayed as they are generated. |
| System Prompt | Provides a high-level instruction or context that guides the model’s tone, behavior, and responses throughout the interaction. |
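The Playground options above map onto request parameters in the chat-completions payload. The sketch below shows one plausible mapping using the conventional OpenAI-style parameter names; the model name is a placeholder, and the exact set of supported parameters may vary by model, so check your model's API reference.

```python
def build_payload(prompt: str, *, stream: bool = False) -> dict:
    """Build a chat-completions payload illustrating the Playground options."""
    return {
        "model": "your-model-id",  # placeholder model name
        "messages": [
            # System Prompt: high-level instruction guiding tone and behavior
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,        # 0 = deterministic; higher = more creative
        "max_tokens": 4096,        # cap on generated tokens
        "top_k": 40,               # sample only from the 40 most probable tokens
        "top_p": 0.9,              # nucleus sampling: top 90% of probability mass
        "frequency_penalty": 0.5,  # discourage repeating tokens already emitted
        "presence_penalty": 0.5,   # encourage introducing new topics
        "stream": stream,          # True -> incremental, real-time output chunks
    }
```

Setting `stream=True` asks the server to return the response incrementally as it is generated rather than as a single completed message.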