GMI Inference Engine API Reference
Introduction
This API reference describes the RESTful, streaming, and realtime APIs you can use to interact with the GMI Inference Engine. The REST APIs can be called from any environment that can make HTTP requests.
Authentication
The GMI API uses API keys for authentication. Create, manage, and learn more about API keys in your organization settings.
Important Security Notes:
- API keys should be provided via HTTP Bearer authentication:
Authorization: Bearer GMI_API_KEY
- Never expose API keys in client-side code
- Load keys from environment variables or key management services (see the sketch after this list)
- For multi-organization access, specify headers:
curl https://api.gmi-serving.com/v1/models \
-H "Authorization: Bearer $GMI_API_KEY" \
-H "X-Organization-ID: your_org_id"
List Models
GET https://api.gmi-serving.com/v1/models
Lists available models with basic information about each model, including capabilities, ownership, and permissions.
Example Request
curl https://api.gmi-serving.com/v1/models \
-H "Authorization: Bearer $GMI_API_KEY"
Response
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-ai/DeepSeek-R1",
      "object": "model",
      "created": 1687530000,
      "owned_by": "public"
    }
    // ... other models ...
  ]
}
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| id | string | Model identifier |
| object | string | Always "model" |
| created | integer | Unix timestamp of model creation |
| owned_by | string | Organization that owns the model |
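As a convenience, a sketch that lists just the model identifiers, assuming the jq utility is installed:
curl -s https://api.gmi-serving.com/v1/models \
  -H "Authorization: Bearer $GMI_API_KEY" | jq -r '.data[].id'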
Create Chat Completion
POST https://api.gmi-serving.com/v1/chat/completions
Creates a model response for the given chat conversation. Supports text, images, and audio modalities.
Authorization
Authorization: Bearer <token>
Request Body
{
  "model": "deepseek-ai/DeepSeek-R1",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 2000,
  "temperature": 1
}
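For reference, the same request issued with curl; the Content-Type header is assumed, as with any JSON request body:
curl https://api.gmi-serving.com/v1/chat/completions \
  -H "Authorization: Bearer $GMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 2000,
    "temperature": 1
  }'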
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model identifier |
| messages | object[] | Yes | - | Conversation history |
| tools | object[] | No | - | Supported tools/functions |
| max_tokens | integer | No | 2000 | Maximum number of output tokens |
| temperature | number | No | 1 | Sampling randomness (0-2) |
| top_p | number | No | 1 | Nucleus sampling (0-1) |
| top_k | integer | No | - | Top-k sampling (1-128) |
| ignore_eos | boolean | No | false | Continue generating past the EOS token |
| stop | string[] | No | - | Up to 4 stop sequences |
| response_format | object | No | - | Force output format (e.g., JSON) |
| stream | boolean | No | false | Stream partial progress (see the sketch after this table) |
| context_length_exceeded_behavior | string | No | truncate | "truncate" or "error" |
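A minimal streaming sketch, assuming the stream is delivered incrementally as the model generates (curl's -N flag disables output buffering so chunks print as they arrive):
curl -N https://api.gmi-serving.com/v1/chat/completions \
  -H "Authorization: Bearer $GMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "stream": true
  }'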
Response
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-ai/DeepSeek-R1",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    }
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used for generation |
| choices | object[] | Generated completions |
| usage | object | Token usage statistics |
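To extract just the assistant text from a response, a small sketch assuming the jq utility is installed:
curl -s https://api.gmi-serving.com/v1/chat/completions \
  -H "Authorization: Bearer $GMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'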
Important Notes
- Use response_format: {"type": "json_object"} for JSON mode (see the sketch after this list)
- Streaming responses include usage stats in the final chunk
- Default context handling differs from other providers (truncates instead of erroring)
- Multiple penalties interact - use them carefully to avoid quality degradation
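A minimal JSON-mode sketch using the response_format parameter from the table above; the prompt text is illustrative:
curl https://api.gmi-serving.com/v1/chat/completions \
  -H "Authorization: Bearer $GMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Return a JSON object with fields name and age for a fictional person."}],
    "response_format": {"type": "json_object"}
  }'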
Key Notes
- Parameter support varies by model - check the model documentation
- New projects should use the Responses format for the latest features
- Organization/project usage is tracked via request headers
- Find organization/project IDs in your settings pages