Model ID
zai-org/GLM-4.5-Air-FP8

API Usage

The zai-org/GLM-4.5-Air-FP8 model can be accessed via the same REST API used by other GLM models.
It supports both general chat and function-calling workflows.

API Examples

Generate a Chat Completion

Use the chat/completions endpoint for conversational generation.

Shell

curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "zai-org/GLM-4.5-Air-FP8",
    "messages": [
      {"role": "system", "content": "You are a concise and efficient AI assistant."},
      {"role": "user", "content": "Summarize the key benefits of FP8 quantization in AI models."}
    ],
    "temperature": 0.6,
    "max_tokens": 600
  }'
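
Function Calling

To let the model request a function call, include a tools array in the same chat/completions request.
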
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "zai-org/GLM-4.5-Air-FP8",
    "temperature": 0,
    "max_tokens": 120,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather_forecast",
          "description": "Retrieve weather forecast information for a given city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "Name of the target city, e.g., San Francisco."
              }
            },
            "required": ["city"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "What’s the current weather in San Francisco?"
      }
    ]
  }'
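
The request above only declares the tool; the calling application still has to run it. The sketch below shows one way to do that in Python. It assumes the endpoint returns an OpenAI-style response, with the requested calls under choices[0].message.tool_calls and the arguments encoded as a JSON string; this page does not confirm that shape. The local get_weather_forecast implementation, TOOL_REGISTRY, run_tool_calls, and the sample response are all hypothetical and for illustration only.

```python
import json


# Hypothetical local implementation of the tool declared in the request.
def get_weather_forecast(city: str) -> dict:
    # Placeholder data; a real implementation would query a weather service.
    return {"city": city, "forecast": "unknown"}


# Maps tool names used in the request to local callables.
TOOL_REGISTRY = {"get_weather_forecast": get_weather_forecast}


def run_tool_calls(response_json: dict) -> list:
    """Run each requested tool call and return the resulting "tool" messages.

    Assumes an OpenAI-style body: choices[0].message.tool_calls, where each
    call has an id, a function.name, and function.arguments (a JSON string).
    """
    message = response_json["choices"][0]["message"]
    tool_messages = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])
        result = TOOL_REGISTRY[name](**arguments)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return tool_messages


# Hand-written sample in the assumed shape, not captured API output.
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather_forecast",
                    "arguments": "{\"city\": \"San Francisco\"}",
                },
            }],
        }
    }]
}

print(run_tool_calls(sample_response))
```

Under the same assumption, the returned messages would be appended to the conversation, after the assistant message that requested the calls, and sent in a second chat/completions request so the model can write its final answer.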

Python Usage

import requests
import json

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer *************"
}

payload = {
    "model": "zai-org/GLM-4.5-Air-FP8",
    "messages": [
        {"role": "system", "content": "You are a concise and efficient AI assistant."},
        {"role": "user", "content": "Summarize the key benefits of FP8 quantization in AI models."}
    ],
    "temperature": 0.6,
    "max_tokens": 600
}

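# Send the request; the timeout prevents the call from hanging indefinitely.
response = requests.post(url, headers=headers, json=payload, timeout=60)
# Raise an exception on 4xx/5xx responses instead of printing an error body.
response.raise_for_status()

# Pretty-print the full JSON response.
print(json.dumps(response.json(), indent=2))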
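
The script above prints the whole response body. If only the generated text is needed, it can be pulled out of the parsed JSON. The helper below is a minimal sketch that assumes an OpenAI-style body, with the reply at choices[0].message.content; this page does not document the response schema. The extract_reply name and the sample dictionary are hypothetical.

```python
def extract_reply(response_json: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # content may be missing or null, for example when the model requests a tool call.
    content = choices[0].get("message", {}).get("content")
    return content or ""


# Hand-written sample in the assumed shape, not captured API output.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Example reply text."}}
    ]
}

print(extract_reply(sample))
```

With the request code above, this would be called as extract_reply(response.json()).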