Model ID
deepseek-ai/DeepSeek-V4-Pro
The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:
Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
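To put the long-context figures above in perspective, the quoted ratios can be turned into reduction factors. The following is a minimal Python sketch of that arithmetic only; the constant and function names are illustrative and are not part of any DeepSeek or GMI Cloud API.

```python
# Ratios quoted above for DeepSeek-V4-Pro relative to DeepSeek-V3.2
# in the 1M-token context setting.
FLOPS_RATIO = 0.27     # share of single-token inference FLOPs
KV_CACHE_RATIO = 0.10  # share of KV cache


def reduction_factor(ratio):
    """Convert 'needs `ratio` of the baseline' into 'baseline is N times larger'."""
    return 1.0 / ratio


print(f"FLOPs reduction: {reduction_factor(FLOPS_RATIO):.1f}x")
print(f"KV cache reduction: {reduction_factor(KV_CACHE_RATIO):.1f}x")
```

In other words, the stated percentages correspond to roughly 3.7 times fewer FLOPs per token and a 10 times smaller KV cache at that context length.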
API Usage
You can call the DeepSeek-V4-Pro model over HTTP from any programming language. The examples below show how to use the model's API from the shell (curl) and from Python.
API Examples
Generate a model response using the chat endpoint of DeepSeek-V4-Pro.
Shell
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant"},
      {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
  }'
# Example: function calling
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "temperature": 0,
    "max_tokens": 100,
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "query_weather",
          "description": "Get the weather of a city; the user should supply a city first",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "The city, e.g. Beijing"
              }
            },
            "required": [
              "city"
            ]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "How is the weather in Qingdao today?"
      }
    ]
  }'
Python
import json

import requests

# Chat completions endpoint
url = "https://api.gmi-serving.com/v1/chat/completions"

# Replace the masked value with your API key
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer *************"
}

# Request body: model ID, conversation messages, and sampling settings
payload = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
}

# Send the request and pretty-print the JSON response
response = requests.post(url, headers=headers, json=payload)
print(json.dumps(response.json(), indent=2))
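The function-call request from the Shell section can also be prepared in Python. The sketch below builds the same request body and reads tool calls out of a response. The helper names `build_tool_payload` and `extract_tool_calls` are hypothetical, and the response layout (`choices[0].message.tool_calls`, with `arguments` encoded as a JSON string) is an assumption based on the OpenAI-compatible chat completions format; this page does not confirm it.

```python
import json

MODEL_ID = "deepseek-ai/DeepSeek-V4-Pro"

# Mirrors the tool definition used in the curl function-call example.
QUERY_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "query_weather",
        "description": "Get the weather of a city; the user should supply a city first",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city, e.g. Beijing"}
            },
            "required": ["city"],
        },
    },
}


def build_tool_payload(user_message):
    """Build a chat request body that offers the query_weather tool (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [QUERY_WEATHER_TOOL],
        "temperature": 0,
        "max_tokens": 100,
    }


def extract_tool_calls(response_body):
    """Return a list of (name, arguments) pairs found in a parsed response.

    ASSUMPTION: the response uses the OpenAI-compatible layout
    choices[0].message.tool_calls[i].function.{name, arguments},
    where arguments is a JSON-encoded string.
    """
    choices = response_body.get("choices") or []
    if not choices:
        return []
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        arguments = json.loads(function.get("arguments") or "{}")
        calls.append((function.get("name"), arguments))
    return calls
```

To send the request, pass `build_tool_payload("How is the weather in Qingdao today?")` as the `json=` argument of the `requests.post` call shown above, then hand `response.json()` to `extract_tool_calls`. An empty list means the model answered in plain text instead of requesting a tool.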