Qwen3 32B (FP8) - GMI Cloud Documentation

Model ID

Qwen/Qwen3-32B-FP8

API Usage

You can interact with the Qwen3-32B-FP8 model through various programming languages and methods. Below are examples showing how to use the model’s API.

API Examples

Generate a model response using the chat endpoint of Qwen3-32B-FP8.

Shell

curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "Qwen/Qwen3-32B-FP8",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant"},
      {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
  }'

# example for function call 
curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "Qwen/Qwen3-235B-A22B-FP8",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is the weather like in San Francisco?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ],
                            "description": "The temperature unit to use"
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}'

Python

import requests
import json

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer *************"
}

payload = {
    "model": "Qwen/Qwen3-32B-FP8",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
}

response = requests.post(url, headers=headers, json=payload)
print(json.dumps(response.json(), indent=2))

Qwen3 235B A22B Thinking (2507, FP8)Qwen3.5 122B A10B

​API Usage

​API Examples

​Shell

​Python

API Usage

API Examples

Shell

Python