# Serverless Endpoint

We offer a range of serverless endpoints for popular open-source models.

## Access a Serverless Inference Model

Select a model from the list.

Click on the model card:

<img src="https://mintcdn.com/gmicloud/eGTC6bV7G8ZZ053N/assets/serverless-select-model-card.png?fit=max&auto=format&n=eGTC6bV7G8ZZ053N&q=85&s=a15ed429f95c51d1c8c5052639f7c9f4" alt="Serverless button" width="2499" height="1819" data-path="assets/serverless-select-model-card.png" />

### Model Details

To access the serverless connection details for your model, click on *"Descriptions"*:

<img src="https://mintcdn.com/gmicloud/eGTC6bV7G8ZZ053N/inference-engine/marketplace/serverless-description.png?fit=max&auto=format&n=eGTC6bV7G8ZZ053N&q=85&s=8802325102b086140b74647a625e3a78" alt="Model details" width="2484" height="1875" data-path="inference-engine/marketplace/serverless-description.png" />

### Playground

To access the Playground for your model, click on *"Playground"*:

<img src="https://mintcdn.com/gmicloud/eGTC6bV7G8ZZ053N/inference-engine/marketplace/serverless-playground.png?fit=max&auto=format&n=eGTC6bV7G8ZZ053N&q=85&s=635325431ab8406d7fd06d28f84d2c5d" alt="Playground" width="2484" height="1875" data-path="inference-engine/marketplace/serverless-playground.png" />

| **Option**                   | **Description**                                                                                                                                                                                                               |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Temperature**              | Controls the amount of randomness in the generated text. A higher temperature produces more varied, *creative* results, while a temperature of **0** yields deterministic, repeatable outputs, which is useful for testing and debugging. |
| **Max Tokens**               | Defines the maximum number of tokens the model can generate (default: **4096**). If the combined token count (prompt + output) exceeds the model’s context limit, the API automatically reduces the output to fit.            |
| **Top K**                    | A sampling method that filters to the **k most probable tokens**, redistributing probability mass among them. This helps constrain randomness and focus generation on the most likely outputs.                                |
| **Top P (Nucleus Sampling)** | Restricts sampling to the smallest set of most probable tokens whose **cumulative probability reaches top\_p**. It is often tuned as an alternative to temperature. For example, `top_p = 0.1` limits generation to the tokens that make up the top 10% of probability mass. |
| **Frequency Penalty**        | Reduces repetition of words or phrases. A higher value discourages the model from reusing tokens already present in the output, helping maintain variety and avoid redundancy.                                                |
| **Presence Penalty**         | Encourages the introduction of **new ideas or topics**. A higher value pushes the model to generate novel concepts instead of reiterating existing ones.                                                                      |
| **Stream**                   | Enables incremental, real-time output streaming — allowing responses to be processed and displayed as they are generated.                                                                                                     |
| **System Prompt**            | Provides a **high-level instruction or context** that guides the model’s tone, behavior, and responses throughout the interaction.                                                                                            |
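
To make the sampling options in the table more concrete, the sketch below applies temperature, Top K, and Top P to a toy next-token distribution in plain Python. It is only an illustration under stated assumptions: the function names (`apply_temperature`, `top_k_filter`, `top_p_filter`) and the example logits are hypothetical, and the Top P step uses the common convention of keeping the smallest set of tokens whose cumulative probability reaches `top_p`. The platform's actual implementation may differ in its details.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by temperature.

    Assumes temperature > 0. A temperature of 0 is usually handled as a
    special case (always pick the most probable token), not by dividing.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


# Hypothetical scores for a four-token vocabulary (token ids 0..3).
logits = [2.0, 1.0, 0.5, -1.0]

probs = apply_temperature(logits, 1.0)   # baseline distribution
sharper = apply_temperature(logits, 0.5)  # lower temperature, less random

top2 = top_k_filter(probs, 2)        # only token ids 0 and 1 remain
nucleus = top_p_filter(probs, 0.8)   # token ids 0 and 1 cover 80% of the mass
narrow = top_p_filter(probs, 0.1)    # only the single most probable token remains
```

In this toy example, lowering the temperature concentrates probability on the most likely token, while Top K and Top P both discard the low-probability tail before sampling. Top K always keeps a fixed number of tokens, whereas Top P keeps more or fewer tokens depending on how spread out the distribution is.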
