Serverless Endpoints
Pre-configured, OpenAI-compatible APIs. No infrastructure to manage. Pay per token. Best for prototyping and variable workloads.
Dedicated Endpoints
Your own models on dedicated GPUs. Full control over hardware, scaling, and isolation. No rate limits. Best for steady or sensitive production traffic.
Serverless Endpoints
Instant access to popular models through OpenAI-compatible APIs.- Zero setup. Models are ready behind a single API key.
- Autoscaling. Capacity grows and shrinks with demand.
- Per-token billing. No idle compute charges.
Dedicated Endpoints
Provision your own endpoint on dedicated GPUs.- Bring your own model. Deploy fine-tuned or proprietary weights.
- Predictable performance. Reserved GPU resources, consistent latency.
- Isolated. Private network, separate from other tenants.
- No rate limits. Cap is set by the hardware you provision.
Inference in the console
A tour of the screens you’ll use inside the GMI Cloud Console.Dashboard
Landing screen for the Inference tab. Recent activity, usage trends, and shortcuts to your most-used resources.
Model Hub
Browse the full catalog of available models, filter by modality (text, image, video, audio), and open a model card for API examples and parameters.
Playground
Try any serverless model in the browser. Useful for prompt testing and parameter exploration before integrating via API.
My Models
Your uploaded or fine-tuned models. Manage versions and visibility from here.
Deployments
Manage Dedicated Endpoints: scale settings, model versions, and traffic routing.
Storage
Inference Storage holds inputs, outputs, and other artifacts referenced by your endpoints.
Workflows, Team Space, and generated media are part of GMI Studio. See Managing Workflows under the GMI Studio tab.