LLM Models - GMI Cloud

Large language models (LLMs) power chat, agents, code generation, retrieval, and vision-language workloads. Each page documents request formats, parameters, and examples for that model.

Technical topics

Chat & completions — Messages, system prompts, stop sequences, and sampling (temperature, top_p).
OpenAI-compatible APIs — Many models follow the same shapes as common chat completion endpoints.
Streaming — Token streaming for responsive UIs and lower perceived latency.
Reasoning & tools — Models may support extended thinking, tool calling, or structured outputs where noted on the model page.
Vision & OCR — Selected models accept images or documents for understanding or extraction.
Efficiency — Quantized (e.g. FP8) and MoE architectures trade size and cost against quality.

Model API & platform docs

For serving modes (serverless vs dedicated), billing, rate limits, task polling, and unified API patterns, see the API Reference section.

Full model list (88)

Model	Model ID	Organization
Anthropic Claude Haiku 4.5	anthropic/claude-haiku-4.5	anthropic
Anthropic Claude Opus 4.1	anthropic/claude-opus-4.1	anthropic
Anthropic Claude Opus 4.5	anthropic/claude-opus-4.5	anthropic
Anthropic Claude Opus 4.6	anthropic/claude-opus-4.6	anthropic
Anthropic Claude Opus 4.7	anthropic/claude-opus-4.7	anthropic
Anthropic Claude Sonnet 4	anthropic/claude-sonnet-4	anthropic
Anthropic Claude Sonnet 4.5	anthropic/claude-sonnet-4.5	anthropic
Anthropic Claude Sonnet 4.6	anthropic/claude-sonnet-4.6	anthropic
ByteDance Seed-2.0-Mini	bytedance/seed-2.0-mini	bytedance
CLIP-ViT-B-32-laion2B-s34B-b79K	laion/CLIP-ViT-B-32-laion2B-s34B-b79K	laion
DeepSeek Prover V2 671B	deepseek-ai/DeepSeek-Prover-V2-671B	deepseek-ai
DeepSeek V3.2	deepseek-ai/DeepSeek-V3.2	deepseek-ai
deepseek-ai/DeepSeek-V4-Flash	deepseek-ai/DeepSeek-V4-Flash	deepseek-ai
deepseek-ai/DeepSeek-V4-Pro	deepseek-ai/DeepSeek-V4-Pro	deepseek-ai
DeepSeek-R1-Distill-Llama-70B	deepseek-ai/DeepSeek-R1-Distill-Llama-70B	deepseek-ai
DeepSeek-R1-Distill-Qwen-1.5B	deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	deepseek-ai
DeepSeek-R1-Distill-Qwen-14B	deepseek-ai/DeepSeek-R1-Distill-Qwen-14B	deepseek-ai
DeepSeek-R1-Distill-Qwen-7B	deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	deepseek-ai
DeepSeek-V3-0324	deepseek-ai/DeepSeek-V3-0324	deepseek-ai
DeepSeek-V3.1	deepseek-ai/DeepSeek-V3.1	deepseek-ai
DeepSeek-V3.1-Terminus	deepseek-ai/DeepSeek-V3.1-Terminus	deepseek-ai
DeepSeek-V3.2	deepseek-ai/DeepSeek-R1-0528	deepseek-ai
DeepSeek-V3.2	zai-org/GLM-4.7-FP8	zai-org
DeepSeek-V3.2-Exp	deepseek-ai/DeepSeek-V3.2-Exp	deepseek-ai
DeepSeek-V3.2-Speciale	deepseek-ai/DeepSeek-V3.2-Speciale	deepseek-ai
GLM-4.5-Air-FP8	zai-org/GLM-4.5-Air-FP8	zai-org
GLM-4.5-FP8	zai-org/GLM-4.5-FP8	zai-org
GLM-4.6	zai-org/GLM-4.6	zai-org
GLM-4.7-Flash	zai-org/GLM-4.7-Flash	zai-org
GLM-5	zai-org/GLM-5-FP8	zai-org
GLM-5.1	zai-org/GLM-5.1-FP8	zai-org
Google Gemini 3 Flash Preview	google/gemini-3-flash-preview	google
Google Gemini 3.1 Flash-Lite Preview	google/gemini-3.1-flash-lite-preview	google
Google Gemini 3.1 Pro Preview	google/gemini-3.1-pro-preview	google
Google Gemma 4 26B A4B	google/gemma-4-26b-a4b-it	google
Google Gemma 4 31B	google/gemma-4-31b-it	google
HunyuanOCR	tencent/HunyuanOCR	tencent
KAT-Coder-Pro V2	kwaipilot/kat-coder-pro-v2	kwaipilot
Kimi-K2.6	moonshotai/Kimi-K2.6	moonshotai
Llama-3.1-8B-Instruct	meta-llama/Llama-3.1-8B-Instruct	meta-llama
Llama-3.3-70B-Instruct	meta-llama/Llama-3.3-70B-Instruct	meta-llama
Llama-4-Maverick-17B-128E-Instruct	meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8	meta-llama
Llama-4-Scout-17B-16E-Instruct	meta-llama/Llama-4-Scout-17B-16E-Instruct	meta-llama
MiniMax-M2	MiniMaxAI/MiniMax-M2	MiniMaxAI
MiniMax-M2.1	MiniMaxAI/MiniMax-M2.1	MiniMaxAI
MiniMax-M2.5	MiniMaxAI/MiniMax-M2.5	MiniMaxAI
MiniMax-M2.7	MiniMaxAI/MiniMax-M2.7	MiniMaxAI
Moonshotai Kimi K2 Instruct 0905	moonshotai/Kimi-K2-Instruct-0905	moonshotai
Moonshotai Kimi K2 Instruct 0905	moonshotai/Kimi-K2-Thinking	moonshotai
Moonshotai Kimi-K2.5	moonshotai/Kimi-K2.5	moonshotai
Nemotron 3 Nano Omni	nvidia/NVIDIA-Nemotron-3-Nano-Omni	nvidia
olmOCR-2-7B-1025-FP8	allenai/olmOCR-2-7B-1025-FP8	allenai
OpenAI GPT OSS 120B	openai/gpt-oss-120b	openai
OpenAI GPT OSS 20B	openai/gpt-oss-20b	openai
OpenAI GPT-4o	openai/gpt-4o	openai
OpenAI GPT-4o-mini	openai/gpt-4o-mini	openai
OpenAI GPT-5	openai/gpt-5	openai
OpenAI GPT-5.1	openai/gpt-5.1	openai
OpenAI GPT-5.1-Chat	openai/gpt-5.1-chat	openai
OpenAI GPT-5.2	openai/gpt-5.2	openai
OpenAI GPT-5.2-Chat	openai/gpt-5.2-chat	openai
OpenAI GPT-5.2-codex	openai/gpt-5.2-codex	openai
OpenAI GPT-5.3-codex	openai/gpt-5.3-codex	openai
OpenAI GPT-5.4	openai/gpt-5.4	openai
OpenAI GPT-5.4-mini	openai/gpt-5.4-mini	openai
OpenAI GPT-5.4-nano	openai/gpt-5.4-nano	openai
OpenAI GPT-5.4-pro	openai/gpt-5.4-pro	openai
OpenAI gpt-5.5	openai/gpt-5.5	openai
Qwen3 Next 80B A3B Instruct	Qwen/Qwen3-Next-80B-A3B-Instruct	Qwen
Qwen3 Next 80B A3B Thinking	Qwen/Qwen3-Next-80B-A3B-Thinking	Qwen
Qwen3-235B-A22B-FP8	Qwen/Qwen3-235B-A22B-FP8	Qwen
Qwen3-235B-A22B-Instruct-2507-FP8	Qwen/Qwen3-235B-A22B-Instruct-2507-FP8	Qwen
Qwen3-235B-A22B-Thinking-2507-FP8	Qwen/Qwen3-235B-A22B-Thinking-2507-FP8	Qwen
Qwen3-32B-FP8	Qwen/Qwen3-32B-FP8	Qwen
Qwen3-Coder-30B-A3B-Instruct-FP8	Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8	Qwen
Qwen3-Coder-480B-A35B-Instruct-FP8	Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8	Qwen
Qwen3-VL-235B-A22B-Instruct-FP8	Qwen/Qwen3-VL-235B-A22B-Instruct-FP8	Qwen
Qwen3.5 122B A10B	Qwen/Qwen3.5-122B-A10B	Qwen
Qwen3.5 27B	Qwen/Qwen3.5-27B	Qwen
Qwen3.5 35B A3B	Qwen/Qwen3.5-35B-A3B	Qwen
Qwen3.5 397B A17B	Qwen/Qwen3.5-397B-A17B	Qwen
Qwen3.6 Max Preview	Qwen/Qwen3.6-Max-Preview	Qwen
Qwen3.6 Plus	Qwen/Qwen3.6-Plus-2026-04-02	Qwen
Qwen3.6 Plus	Qwen/Qwen3.6-Plus	Qwen
QwQ-32B	Qwen/Qwen3-30B-A3B	Qwen
Wan2.2-I2V-A14B	Wan-AI/Wan2.2-I2V-A14B	Wan-AI
Xiaomi MiMo-V2.5	XiaomiMiMo/MiMo-V2.5	XiaomiMiMo
Xiaomi MiMo-V2.5-Pro	XiaomiMiMo/MiMo-V2.5-Pro	XiaomiMiMo

Model Library

Documentation Index

​Technical topics

​Model API & platform docs

​Full model list (88)

Technical topics

Model API & platform docs

Full model list (88)