Model ID
zai-org/GLM-4.5-Air-FP8
API Usage
The zai-org/GLM-4.5-Air-FP8 model can be accessed via the same REST API used by other GLM models.
It supports both general chat and function-calling workflows.
API Examples
Generate a Chat Completion
Send a POST request to the /v1/chat/completions endpoint for conversational generation. Replace the masked value in the Authorization header with your own API key.
Shell
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "zai-org/GLM-4.5-Air-FP8",
    "messages": [
      {"role": "system", "content": "You are a concise and efficient AI assistant."},
      {"role": "user", "content": "Summarize the key benefits of FP8 quantization in AI models."}
    ],
    "temperature": 0.6,
    "max_tokens": 600
  }'
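The endpoint returns JSON rather than plain text. Below is a minimal sketch for pulling out the reply, assuming the response follows the OpenAI-style chat-completion schema (a choices list whose entries carry a message and a finish_reason); this page does not document the response body, so treat that shape as an assumption. The helper name extract_reply and the sample body are hypothetical, not output from a real call.

```python
from typing import Optional, Tuple


def extract_reply(response_json: dict) -> Tuple[str, Optional[str]]:
    """Return (assistant text, finish_reason) from a chat-completion response.

    Assumes an OpenAI-style body: a "choices" list whose first entry holds a
    "message" dict and a "finish_reason" string.
    """
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError(f"response has no choices: {response_json!r}")
    first = choices[0]
    message = first.get("message") or {}
    # content can be null when the model replies only with tool calls
    return message.get("content") or "", first.get("finish_reason")


# Hypothetical response body, for illustration only (not real model output).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Example reply text."},
            "finish_reason": "stop",
        }
    ]
}

text, finish_reason = extract_reply(sample)
if finish_reason == "length":
    # In the OpenAI-style schema, "length" means max_tokens cut the reply short.
    print("warning: reply truncated; consider raising max_tokens")
print(text)
```

Checking finish_reason is worthwhile with a fixed max_tokens such as 600: if the assumed schema holds, a value of "length" signals a truncated answer.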
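Function Calling
Declare the functions the model may call in the tools array, describing each function's arguments with a JSON Schema object under parameters. The model can then respond with a request to call one of these functions; actually executing the function is left to your code.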
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "zai-org/GLM-4.5-Air-FP8",
    "temperature": 0,
    "max_tokens": 120,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather_forecast",
          "description": "Retrieve weather forecast information for a given city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "Name of the target city, e.g., San Francisco."
              }
            },
            "required": ["city"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "What’s the current weather in San Francisco?"
      }
    ]
  }'
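The API only tells you which function to call; your client runs it. Below is a minimal dispatch sketch, assuming tool calls arrive in the OpenAI-style shape (an assistant message with a tool_calls list, each entry holding an id and a function with a name and a JSON-encoded arguments string) and that results are sent back as role "tool" messages keyed by tool_call_id. That shape is an assumption, not something this page confirms. The local get_weather_forecast stub, TOOL_REGISTRY, run_tool_calls, and the sample message are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather_forecast(city: str) -> Dict[str, Any]:
    # Hypothetical local stub; a real implementation would query a weather service.
    return {"city": city, "forecast": "unavailable in this sketch"}


# Map tool names (as declared in the "tools" array) to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "get_weather_forecast": get_weather_forecast,
}


def run_tool_calls(assistant_message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute each tool call in an assistant message and build "tool" result messages.

    Assumes OpenAI-style tool calls: each entry has an "id" and a "function"
    with a "name" and a JSON-encoded "arguments" string.
    """
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        handler = TOOL_REGISTRY.get(fn["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            output: Any = {"error": f"unknown tool {fn['name']}"}
        else:
            args = json.loads(fn.get("arguments") or "{}")
            output = handler(**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Hypothetical assistant message, shaped like an OpenAI-style tool-call reply.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "get_weather_forecast",
            "arguments": json.dumps({"city": "San Francisco"}),
        },
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

Under the same assumed flow, you would append assistant_message and the returned tool messages to the original messages list and send a second request with the same tools array, so the model can turn the function result into a final answer.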
Python Usage (requests)
import json

import requests

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer *************",  # replace with your API key
}
payload = {
    "model": "zai-org/GLM-4.5-Air-FP8",
    "messages": [
        {"role": "system", "content": "You are a concise and efficient AI assistant."},
        {"role": "user", "content": "Summarize the key benefits of FP8 quantization in AI models."},
    ],
    "temperature": 0.6,
    "max_tokens": 600,
}

# Send the request with a timeout, and raise on HTTP 4xx/5xx instead of printing an error body as a result
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
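If you prefer to avoid the third-party requests dependency, the same call can be made with the Python standard library. This is a sketch rather than an official client: the helper names build_chat_request and send_chat_request are hypothetical, and only the URL, model ID, and payload fields come from the examples on this page.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.gmi-serving.com/v1/chat/completions"


def build_chat_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request carrying the same headers and JSON body as the requests example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "model": "zai-org/GLM-4.5-Air-FP8",
    "messages": [
        {"role": "user", "content": "Summarize the key benefits of FP8 quantization in AI models."}
    ],
    "temperature": 0.6,
    "max_tokens": 600,
}
req = build_chat_request("YOUR_API_KEY", payload)  # placeholder key
# result = send_chat_request(req)  # uncomment to perform the network call
```

Splitting request construction from sending keeps the request inspectable (and testable) without touching the network.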