DeepSeek V4 Pro - GMI Cloud Documentation

Model ID

deepseek-ai/DeepSeek-V4-Pro

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.

Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing the stability of signal propagation across layers while preserving model expressivity.

Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.

API Usage

You can call the DeepSeek-V4-Pro model from any language or tool that can send an HTTP request. The examples below show how to use the model’s API from the shell and from Python.

API Examples

Generate a model response using the chat endpoint of DeepSeek-V4-Pro.

Shell

curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant"},
      {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
  }'

# Example: function calling (the model may answer with a call to query_weather)
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "temperature": 0,
    "max_tokens": 100,
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "query_weather",
                "description": "Get the weather of a city; the user should supply a city first",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "The city, e.g. Beijing"
                        }
                    },
                    "required": [
                        "city"
                    ]
                }
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "How is the weather in Qingdao today?"
        }
    ]
}'

Python

import requests
import json

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer *************"
}

payload = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
}

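# Send the request; the timeout avoids hanging forever on a stalled connection.
response = requests.post(url, headers=headers, json=payload, timeout=60)
# Raise an exception on HTTP 4xx/5xx instead of printing an error body as if it were a result.
response.raise_for_status()
print(json.dumps(response.json(), indent=2))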
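
The shell section includes a function-calling request, but the Python section does not. The sketch below mirrors that request in Python. The helper names (`build_tool_payload`, `extract_tool_calls`, `send`) and the `GMI_API_KEY` environment variable are hypothetical and chosen only for illustration. The response parsing assumes the endpoint returns an OpenAI-style body, with tool calls under `choices[0].message.tool_calls` and `arguments` encoded as a JSON string; this page does not document the response format, so verify it against a real response before relying on it.

```python
import json
import os

API_URL = "https://api.gmi-serving.com/v1/chat/completions"
MODEL_ID = "deepseek-ai/DeepSeek-V4-Pro"


def build_tool_payload(question):
    """Build the same function-calling request body as the shell example."""
    return {
        "model": MODEL_ID,
        "temperature": 0,
        "max_tokens": 100,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "query_weather",
                    "description": "Get the weather of a city; the user should supply a city first",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {
                                "type": "string",
                                "description": "The city, e.g. Beijing",
                            }
                        },
                        "required": ["city"],
                    },
                },
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


def extract_tool_calls(response_body):
    """Return a list of (function_name, arguments_dict) pairs.

    ASSUMPTION: OpenAI-style response shape; not confirmed by this page.
    """
    choices = response_body.get("choices") or [{}]
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        # Arguments are assumed to arrive as a JSON-encoded string.
        arguments = json.loads(function.get("arguments") or "{}")
        calls.append((function.get("name"), arguments))
    return calls


def send(payload, api_key):
    """POST the payload and return the decoded JSON body."""
    import requests  # imported here so the helpers above work without it

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # GMI_API_KEY is a hypothetical variable name; use whatever holds your key.
    body = send(
        build_tool_payload("How is the weather in Qingdao today?"),
        os.environ["GMI_API_KEY"],
    )
    for name, arguments in extract_tool_calls(body):
        print(name, arguments)
```

If the model decides to call the tool, your application is expected to run `query_weather` itself; the model only proposes the function name and arguments.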
