MiniMax M2.5 - GMI Cloud Documentation

Model ID

MiniMaxAI/MiniMax-M2.5

API Usage

MiniMax-M2.5 supports two API formats:

OpenAI-compatible: /v1/chat/completions endpoint
Anthropic-compatible: /v1/messages endpoint (with extended thinking support)
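The two formats differ mainly in the path, the authentication header, and where the system prompt goes. The sketch below illustrates that difference, drawing only on the curl and SDK examples on this page. `build_request` is a hypothetical helper, not part of any SDK. Placing the system prompt in a top-level `system` field for /v1/messages follows the Anthropic Messages convention and is an assumption for the direct API.

```python
BASE_URL = "https://api.gmi-serving.com"
MODEL_ID = "MiniMaxAI/MiniMax-M2.5"


def build_request(api_format, api_key, user_prompt, system_prompt=None, max_tokens=1024):
    """Return (url, headers, payload) for the chosen API format (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    payload = {"model": MODEL_ID, "max_tokens": max_tokens}
    user_message = {"role": "user", "content": user_prompt}

    if api_format == "openai":
        # OpenAI-compatible: Bearer token, system prompt is the first message.
        url = f"{BASE_URL}/v1/chat/completions"
        headers["Authorization"] = f"Bearer {api_key}"
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append(user_message)
        payload["messages"] = messages
    elif api_format == "anthropic":
        # Anthropic-compatible: x-api-key header, system prompt as a top-level field (assumed).
        url = f"{BASE_URL}/v1/messages"
        headers["x-api-key"] = api_key
        if system_prompt:
            payload["system"] = system_prompt
        payload["messages"] = [user_message]
    else:
        raise ValueError(f"unknown api_format: {api_format!r}")
    return url, headers, payload
```

The returned triple can be passed straight to `requests.post(url, headers=headers, json=payload)`.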

API Examples

OpenAI-Compatible Endpoint

Generate a model response using the chat completions endpoint.

Shell

curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  --data '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant with strong reasoning and coding ability."},
      {"role": "user", "content": "Explain how the Mixture-of-Experts architecture improves inference efficiency in large language models."}
    ],
    "max_tokens": 1024
  }'

Python

import requests
import json

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"
}

payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
        {"role": "system", "content": "You are a capable AI coding assistant"},
        {"role": "user", "content": "Refactor this multi-file Python module to make it async-ready"}
    ],
    "max_tokens": 1024
}

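# Send the request; the timeout keeps a stalled connection from hanging forever
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # raise on HTTP 4xx/5xx instead of printing an error body
print(json.dumps(response.json(), indent=2))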
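The script above prints the whole JSON body. To pull out only the reply, something like the following sketch may help. It assumes the response follows the standard OpenAI chat completions shape (`choices[0].message.content`). This page does not show a sample response, so verify against the actual payload. `extract_reply` is a hypothetical helper, and the sample body is illustrative rather than captured from the service.

```python
def extract_reply(body):
    """Return the assistant text from an OpenAI-style chat completion body."""
    choices = body.get("choices") or []
    if not choices:
        # Error responses usually carry an "error" object instead of choices.
        raise ValueError(f"no choices in response: {body.get('error', body)}")
    return choices[0]["message"]["content"]


# Illustrative body (assumed shape), not captured from the live service.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

print(extract_reply(sample))
```

In the script above, this would be called as `extract_reply(response.json())`.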

Anthropic-Compatible Endpoint (with Extended Thinking)

The /v1/messages endpoint exposes the model’s reasoning process as “thinking” content blocks, returned alongside the regular “text” blocks. You can also call this endpoint through the Anthropic SDK by configuring the SDK’s base URL.

Using Anthropic SDK (Recommended)

# Configure environment variables (the SDK appends /v1/messages to the base URL)
export ANTHROPIC_BASE_URL=https://api.gmi-serving.com
export ANTHROPIC_API_KEY=${YOUR_API_KEY}

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMaxAI/MiniMax-M2.5",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hi, how are you?"
                }
            ]
        }
    ]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")

Shell (Direct API)

curl --request POST \
  --url https://api.gmi-serving.com/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  --data '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Solve this step by step: What is 23 * 47?"}
    ]
  }'

Shell (with Streaming)

curl --request POST \
  --url https://api.gmi-serving.com/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  --data '{
    "model": "MiniMaxAI/MiniMax-M2.5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Tell me a three sentence bedtime story about a unicorn."}
    ],
    "stream": true
  }'
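With "stream": true, the response arrives as server-sent events rather than a single JSON body. The sketch below shows one way to pull the incremental output out of those lines. It assumes the stream follows the Anthropic Messages event format (`data:` lines carrying `content_block_delta` events with `thinking_delta` or `text_delta` payloads), which matches the delta types used in the SDK streaming example on this page. `iter_deltas` is a hypothetical helper, and the sample lines are illustrative.

```python
import json


def iter_deltas(lines):
    """Yield (kind, fragment) pairs from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore data lines that are not JSON
        if not isinstance(event, dict) or event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            yield "thinking", delta.get("thinking", "")
        elif delta.get("type") == "text_delta":
            yield "text", delta.get("text", "")


# Illustrative lines in the assumed format, not captured from the live service.
sample_lines = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "A short story."}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Once upon a time"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

for kind, fragment in iter_deltas(sample_lines):
    print(f"[{kind}] {fragment}")
```

Separating the two kinds lets a client show the answer while hiding or logging the reasoning.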

Python (with Streaming)

import anthropic

client = anthropic.Anthropic()

stream = client.messages.create(
    model="MiniMaxAI/MiniMax-M2.5",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": [{"type": "text", "text": "Hi, how are you?"}]}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.type == "content_block_delta":
        if hasattr(chunk, "delta") and chunk.delta:
            if chunk.delta.type == "thinking_delta":
                print(chunk.delta.thinking, end="", flush=True)
            elif chunk.delta.type == "text_delta":
                print(chunk.delta.text, end="", flush=True)
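The loop above prints thinking and text deltas to the same output as they arrive. If the two need to be kept apart, for example to store only the final answer, they can be accumulated separately. This is a minimal sketch: `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic the attribute layout used in the loop above so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate thinking and text deltas into two separate strings."""
    thinking_parts, text_parts = [], []
    for chunk in chunks:
        if getattr(chunk, "type", None) != "content_block_delta":
            continue  # ignore message_start, content_block_start, and similar events
        delta = getattr(chunk, "delta", None)
        if delta is None:
            continue
        if delta.type == "thinking_delta":
            thinking_parts.append(delta.thinking)
        elif delta.type == "text_delta":
            text_parts.append(delta.text)
    return "".join(thinking_parts), "".join(text_parts)


# Stand-in events (assumed layout), not real SDK objects.
fake_stream = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="thinking_delta", thinking="The user is greeting me. "),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="I'm doing "),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="well, thanks!"),
    ),
    SimpleNamespace(type="message_stop"),
]

thinking, text = collect_stream(fake_stream)
print(f"Thinking:\n{thinking}\n")
print(f"Text:\n{text}\n")
```

With the SDK, the `stream` object from the example above would be passed in place of `fake_stream`.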
