Nvidia Nemotron 3 Nano Omni - GMI Cloud Documentation

Model ID

nvidia/nemotron-3-nano-omni

Nvidia Nemotron 3 Nano Omni is served through GMI Cloud’s OpenAI-compatible Chat Completions API at https://api.gmi-serving.com.

API Usage

You can interact with Nvidia Nemotron 3 Nano Omni through the chat completions endpoint, POST /v1/chat/completions. The examples below cover a basic request, a streaming request, and a Python client.

Create chat completion

Default

curl https://api.gmi-serving.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GMI_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-3-nano-omni",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
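If you prefer not to depend on a third-party HTTP client, you can also build the Default request above with Python's standard library. This is a minimal sketch, not an official GMI Cloud SDK. The helper names `build_chat_request` and `send` are hypothetical, and nothing goes over the network until you call `send` yourself.

```python
import json
import urllib.request

# Endpoint and model ID taken from the curl example above.
API_URL = "https://api.gmi-serving.com/v1/chat/completions"
MODEL_ID = "nvidia/nemotron-3-nano-omni"


def build_chat_request(api_key: str, messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the Default curl call.

    Extra keyword arguments (e.g. temperature=0, stream=True) are merged into
    the JSON body as-is. `build_chat_request` is a hypothetical helper name.
    """
    payload = {"model": MODEL_ID, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body.

    urllib raises urllib.error.HTTPError for non-2xx responses (e.g. a bad key).
    """
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (needs network access and a GMI_API_KEY environment variable):
#   import os
#   req = build_chat_request(os.environ["GMI_API_KEY"],
#                            [{"role": "user", "content": "Hello!"}])
#   print(json.dumps(send(req), indent=2))
```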

Streaming

curl https://api.gmi-serving.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GMI_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-3-nano-omni",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
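With "stream": true, OpenAI-compatible servers typically return the reply as server-sent events. Each event is a `data: {...}` line carrying a partial `delta`, and the stream ends with `data: [DONE]`. This page does not spell out the stream format, so the sketch below assumes GMI Cloud follows that OpenAI convention. It shows one way to reassemble the text. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style SSE chat completion stream.

    Assumption: each event is a single `data: <json>` line and the stream is
    terminated by `data: [DONE]`, as in OpenAI's Chat Completions streaming.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, SSE comments (": ..."), other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel; ignore anything after it
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first delta often carries only the role, with no content.
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

To use it with a live request, pass in the response lines decoded from UTF-8, for example `iter_stream_text(line.decode("utf-8") for line in resp)` on a `urllib.request.urlopen` response, and print each fragment as it arrives.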

Python

import json
import os

import requests

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    # Read the key from the environment: "$GMI_API_KEY" inside a Python
    # string is sent literally, not expanded like it is in the shell.
    "Authorization": f"Bearer {os.environ['GMI_API_KEY']}",
}
payload = {
    "model": "nvidia/nemotron-3-nano-omni",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_completion_tokens": 500
}
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # fail loudly on HTTP errors such as 401 (bad key)
print(json.dumps(response.json(), indent=2))
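The Python example above prints the whole JSON body. To extract only the reply, the sketch below assumes the standard OpenAI Chat Completions response shape, with `choices[0].message.content` plus a `finish_reason`. `extract_reply` is a hypothetical helper. It also flags `finish_reason == "length"`, which in the OpenAI format means the reply was cut off at the token limit. That matters here because the example caps output with "max_completion_tokens": 500.

```python
from typing import Any, Dict, Tuple


def extract_reply(response: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (assistant_text, truncated) from a non-streaming chat completion.

    Assumes the OpenAI-compatible shape:
    {"choices": [{"message": {"content": "..."}, "finish_reason": "stop"}]}
    """
    choices = response.get("choices") or []
    if not choices:
        # Error payloads (e.g. {"error": {...}}) carry no choices.
        raise ValueError(f"no choices in response: {response!r}")
    first = choices[0]
    text = (first.get("message") or {}).get("content") or ""
    # "length" means generation stopped at the max token limit.
    truncated = first.get("finish_reason") == "length"
    return text, truncated
```

Call it as `text, truncated = extract_reply(response.json())` after the request in the Python example. If `truncated` is true, consider raising `max_completion_tokens`.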
