| Temperature | Controls the randomness of the generated text. Higher values produce more varied, creative results, while a temperature of 0 selects the most probable token at each step, yielding near-deterministic, repeatable outputs that are useful for testing and debugging (see the request sketch after this table). |
| Max Tokens | Defines the maximum number of tokens the model can generate (default: 4096). If the combined token count (prompt + output) exceeds the model’s context limit, the API automatically reduces the output to fit. |
| Top K | A sampling method that restricts generation to the k most probable tokens, renormalizing the probability mass among them. This constrains randomness by cutting off the long tail of unlikely tokens. |
| Top P (Nucleus Sampling) | An alternative to temperature sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. For example, top_p = 0.1 restricts generation to the tokens making up the top 10% of probability mass (see the sampling sketch after this table). |
| Frequency Penalty | Reduces repetition of words and phrases. A higher value penalizes tokens in proportion to how often they have already appeared in the output, helping maintain variety and avoid redundancy. |
| Presence Penalty | Encourages the introduction of new ideas or topics. A higher value penalizes any token that has already appeared at least once, regardless of how often, pushing the model toward novel concepts instead of reiterating existing ones. |
| Stream | Enables incremental, real-time output streaming — allowing responses to be processed and displayed as they are generated. |
| System Prompt | Provides a high-level instruction or context that guides the model’s tone, behavior, and responses throughout the interaction. |
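Temperature, Top K, and Top P all act as filters on the model's next-token probability distribution. The following is a minimal Python sketch of how such filtering could work over raw logits; the function name, defaults, and ordering of the steps are illustrative assumptions, not any provider's actual decoder.

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token id from raw logits after temperature/top-k/top-p filtering."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: scale the logits; lower values sharpen the distribution,
    # and 0 falls back to greedy decoding (always the most probable token).
    if temperature <= 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top K: keep only the k most probable tokens, then renormalize.
    if top_k > 0:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # Top P (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]            # token ids, most probable first
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    return int(rng.choice(len(probs), p=probs))

# Example over a toy 5-token vocabulary with made-up logits.
print(sample_token([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.8, top_k=3, top_p=0.9))
```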
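The request sketch below shows how these parameters might be set together in a single chat request. The endpoint URL, model name, JSON field names, and response shape are placeholders for illustration; consult your provider's API reference for the real ones, since some providers expose only a subset of these parameters (for example, top_k or the penalty fields may be absent).

```python
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "sk-..."                                        # placeholder key

payload = {
    "model": "example-model",                 # placeholder model name
    "messages": [
        # System Prompt: high-level instruction guiding tone and behavior.
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one sentence."},
    ],
    "temperature": 0.7,        # 0 = near-deterministic, higher = more varied
    "max_tokens": 1024,        # cap on the number of generated tokens
    "top_k": 40,               # sample only from the 40 most probable tokens
    "top_p": 0.9,              # or restrict to the top 90% of probability mass
    "frequency_penalty": 0.5,  # penalize tokens by how often they already appeared
    "presence_penalty": 0.5,   # penalize any token that has appeared at all
    "stream": False,           # set True to receive tokens incrementally
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
# Assumed response shape; adjust to your provider's actual schema.
print(response.json()["choices"][0]["message"]["content"])
```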