Skip to main content
GMI Inference runs production ML models behind two endpoint types: Serverless for instant access to managed models, and Dedicated for fully customized, isolated deployments.

Serverless Endpoints

Pre-configured, OpenAI-compatible APIs. No infrastructure to manage. Pay per token. Best for prototyping and variable workloads.

Dedicated Endpoints

Your own models on dedicated GPUs. Full control over hardware, scaling, and isolation. No rate limits. Best for steady or sensitive production traffic.

Serverless Endpoints

Instant access to popular models through OpenAI-compatible APIs.
  • Zero setup. Models are ready behind a single API key.
  • Autoscaling. Capacity grows and shrinks with demand.
  • Per-token billing. No idle compute charges.
Good for prototypes, small apps, and any workload where you’d rather not run infrastructure.

Dedicated Endpoints

Provision your own endpoint on dedicated GPUs.
  • Bring your own model. Deploy fine-tuned or proprietary weights.
  • Predictable performance. Reserved GPU resources, consistent latency.
  • Isolated. Private network, separate from other tenants.
  • No rate limits. Cap is set by the hardware you provision.
Good for enterprise production, latency-sensitive applications, or large continuous workloads.

Inference in the console

A tour of the screens you’ll use inside the GMI Cloud Console.

Dashboard

Landing screen for the Inference tab. Recent activity, usage trends, and shortcuts to your most-used resources.
Inference dashboard

Model Hub

Browse the full catalog of available models, filter by modality (text, image, video, audio), and open a model card for API examples and parameters.
Model Hub

Playground

Try any serverless model in the browser. Useful for prompt testing and parameter exploration before integrating via API.
Playground

My Models

Your uploaded or fine-tuned models. Manage versions and visibility from here.
My Models

Deployments

Manage Dedicated Endpoints: scale settings, model versions, and traffic routing.
Deployments

Storage

Inference Storage holds inputs, outputs, and other artifacts referenced by your endpoints.
Inference Storage
Workflows, Team Space, and generated media are part of GMI Studio. See Managing Workflows under the GMI Studio tab.