Llama 3.3 70B Instruct - GMI Cloud Documentation

Model ID

meta-llama/Llama-3.3-70B-Instruct

API Usage

You can call the Llama-3.3-70B-Instruct model over HTTP from any language or tool that can send a POST request. The examples below use curl and Python.

API Examples

Generate a model response using the chat endpoint of Llama-3.3-70B-Instruct.

Shell

curl --request POST \
  --url https://api.gmi-serving.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer *************' \
  --data '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant"},
      {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
  }'

Python

import json

import requests

url = "https://api.gmi-serving.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    # Replace the masked value with your own API key
    "Authorization": "Bearer *************"
}

# Same request body as the Shell example
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."}
    ],
    "temperature": 0,
    "max_tokens": 500
}

# The 60-second timeout is an arbitrary choice; adjust it to your needs
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # fail loudly on HTTP 4xx/5xx responses
print(json.dumps(response.json(), indent=2))
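
The Python example above prints the whole JSON response. In practice you usually want only the generated text, and you may prefer not to hard-code the API key. The sketch below shows one way to do both. It rests on two assumptions that the page itself does not confirm: that the response body follows the common OpenAI-style layout (`choices[0].message.content`), and that the key is stored in an environment variable, here given the hypothetical name `GMI_API_KEY`. The helper names are illustrative as well.

```python
import os

API_URL = "https://api.gmi-serving.com/v1/chat/completions"
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"


def build_headers(env_var="GMI_API_KEY"):
    """Build request headers, reading the key from an environment variable.

    The variable name is hypothetical; use whatever name suits your setup.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


def build_payload(user_prompt,
                  system_prompt="You are a helpful AI assistant",
                  temperature=0,
                  max_tokens=500):
    """Build the same request body used in the examples above."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(response_json):
    """Return the assistant text, ASSUMING an OpenAI-style response body."""
    try:
        return response_json["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {response_json!r}") from exc
```

To use these helpers, pass `build_headers()` and `build_payload("List 3 countries and their capitals.")` to `requests.post(API_URL, headers=..., json=...)`, then call `extract_reply(response.json())`. If the service returns a different layout, `extract_reply` raises a `ValueError` that includes the body it received, so a wrong assumption is easy to spot.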
