GMI Inference Engine API Reference

Introduction

This API reference describes the RESTful, streaming, and realtime APIs you can use to interact with the GMI Inference Engine. REST APIs are usable via HTTP in any environment that supports HTTP requests.

Authentication

The GMI API uses API keys for authentication. Create, manage, and learn more about API keys in your organization settings.

Important Security Notes:

  • API keys should be provided via HTTP Bearer authentication:
    Authorization: Bearer GMI_API_KEY
  • Never expose API keys in client-side code
  • Load keys from environment variables or key management services
  • For multi-organization access, specify headers:
    curl https://api.gmi-serving.com/v1/models \
      -H "Authorization: Bearer $GMI_API_KEY" \
      -H "X-Organization-ID: your_org_id"
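
A minimal Python sketch of the pattern above, reading the key from an environment variable and optionally adding the organization header (the helper name is illustrative, not part of the API):

```python
import os

def build_headers(org_id=None):
    """Build GMI API request headers, reading the key from the environment."""
    api_key = os.environ["GMI_API_KEY"]  # never hard-code the key
    headers = {"Authorization": f"Bearer {api_key}"}
    if org_id is not None:
        headers["X-Organization-ID"] = org_id  # multi-organization access
    return headers
```

Keeping the key out of source code means it can be rotated without redeploying the client.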

List Models

GET https://api.gmi-serving.com/v1/models

Lists available models with basic information about each model, including capabilities, ownership, and permissions.

Example Request

curl https://api.gmi-serving.com/v1/models \
  -H "Authorization: Bearer $GMI_API_KEY"

Response

{
  "object": "list",
  "data": [
    {
      "id": "deepseek-ai/DeepSeek-R1",
      "object": "model",
      "created": 1687530000,
      "owned_by": "public"
    }
    // ... other models ...
  ]
}

Response Parameters

Parameter   Type      Description
----------  --------  --------------------------------
id          string    Model identifier
object      string    Always "model"
created     integer   Unix timestamp of model creation
owned_by    string    Organization that owns the model
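
As a sketch, extracting model IDs from a list response of the shape documented above (the helper name and sample data are illustrative; note that each entry's id field carries the model identifier):

```python
def list_model_ids(response):
    """Return the IDs of all models in a /v1/models list response."""
    assert response.get("object") == "list"
    return [model["id"] for model in response.get("data", [])]

# Sample payload matching the documented response shape:
models_response = {
    "object": "list",
    "data": [
        {"id": "deepseek-ai/DeepSeek-R1", "object": "model",
         "created": 1687530000, "owned_by": "public"},
    ],
}
print(list_model_ids(models_response))  # ['deepseek-ai/DeepSeek-R1']
```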

Create Chat Completion

POST https://api.gmi-serving.com/v1/chat/completions

Creates a model response for the given chat conversation. Supports text, images, and audio modalities.

Authorization

Authorization: Bearer <token>

Request Body

{
  "model": "deepseek-ai/DeepSeek-R1",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 2000,
  "temperature": 1
}

Parameters

Parameter                         Type      Required  Default   Description
--------------------------------  --------  --------  --------  --------------------------------
model                             string    Yes       -         Model identifier
messages                          object[]  Yes       -         Conversation history
tools                             object[]  No        -         Supported tools/functions
max_tokens                        integer   No        2000      Maximum output tokens
temperature                       number    No        1         Sampling randomness (0-2)
top_p                             number    No        1         Nucleus sampling (0-1)
top_k                             integer   No        -         Top-k sampling (1-128)
ignore_eos                        boolean   No        false     Continue past the EOS token
stop                              string[]  No        -         Up to 4 stop sequences
response_format                   object    No        -         Force output format (e.g., JSON)
stream                            boolean   No        false     Stream partial progress
context_length_exceeded_behavior  string    No        truncate  "truncate" or "error"
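
The parameters above can be sketched as a small request builder that enforces the documented ranges before sending anything over the wire (the function and its checks are illustrative, not part of the API):

```python
def build_chat_request(model, messages, max_tokens=2000, temperature=1.0,
                       top_p=1.0, stream=False, stop=None):
    """Assemble a /v1/chat/completions request body with basic range checks."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    if stop is not None and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    body = {"model": model, "messages": messages,
            "max_tokens": max_tokens, "temperature": temperature,
            "top_p": top_p, "stream": stream}
    if stop:
        body["stop"] = stop  # optional fields are omitted when unset
    return body
```

Validating client-side gives clearer errors than a rejected HTTP request.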

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-ai/DeepSeek-R1",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    }
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

Response Fields

Field     Type      Description
--------  --------  -------------------------
id        string    Unique response ID
object    string    Always "chat.completion"
created   integer   Unix timestamp
model     string    Model used for generation
choices   object[]  Generated completions
usage     object    Token usage statistics
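
A minimal sketch of reading the fields above from a parsed response (the helper name and sample data are illustrative):

```python
def first_message(response):
    """Return the assistant text from the first choice of a chat completion."""
    return response["choices"][0]["message"]["content"]

# Sample payload matching the documented response shape:
response = {
    "id": "chatcmpl-123", "object": "chat.completion",
    "created": 1677652288, "model": "deepseek-ai/DeepSeek-R1",
    "choices": [{"message": {"role": "assistant",
                             "content": "Hello! How can I help you today?"}}],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}
print(first_message(response))                    # Hello! How can I help you today?
print(response["usage"]["total_tokens"])          # 21
```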

Important Notes

  1. Use response_format: {"type": "json_object"} for JSON mode
  2. Streaming responses include usage stats in final chunk
  3. Default context handling differs from some other providers: this API truncates the prompt by default instead of returning an error
  4. Multiple penalties interact - use carefully to avoid quality degradation
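
As an illustration of note 1, a JSON-mode request body might look like this (a sketch using the documented response_format parameter; the prompt is illustrative):

```python
import json

# Request body asking the model to emit a JSON object (note 1 above).
json_mode_request = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "List three primary colors as JSON."}
    ],
    "response_format": {"type": "json_object"},  # force JSON output
    "max_tokens": 2000,
}
print(json.dumps(json_mode_request, indent=2))
```

Prompting the model for JSON in the message itself, alongside response_format, tends to produce more reliable structured output.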

Key Notes

  1. Parameter support varies by model - check model documentation
  2. New projects should use Responses format for latest features
  3. Organization/project usage is tracked via headers
  4. Find organization/project IDs in settings pages