GPU Compute - GMI Cloud Documentation

Run GPU workloads on GMI Cloud. Pick the format that fits your job: a managed Kubernetes cluster for training, a container instance for short jobs and notebooks, or a dedicated bare-metal server for full hardware control.
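
The choice described above can be sketched as a small decision helper. This is an illustrative sketch only, not a GMI Cloud API: the `Workload` fields and the `suggest_format` function are hypothetical names, and the rules simply restate the guidance in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a job; not part of any GMI Cloud SDK."""
    needs_full_hardware_control: bool = False  # OS access, custom drivers, local NVMe
    is_training: bool = False                  # training on a multi-node cluster


def suggest_format(workload: Workload) -> str:
    """Map a workload to one of the three compute formats named in this page."""
    if workload.needs_full_hardware_control:
        return "bare-metal server"
    if workload.is_training:
        return "managed Kubernetes cluster"
    # Short jobs and notebooks fall through to a container instance.
    return "container instance"


if __name__ == "__main__":
    print(suggest_format(Workload(is_training=True)))
```

Hardware control is checked first in this sketch, on the assumption that a need for OS-level access rules out the managed options regardless of the job type.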

What you can do here

Launch a managed GPU cluster

Production-ready Kubernetes clusters with H200 or B200 nodes, provisioned and operated by GMI.

Run a container workload

Spin up single containers from a template (vLLM, SGLang, JupyterLab, custom images) on demand.

Request a bare-metal server

Dedicated hosts when you need full OS access, custom drivers, or persistent local NVMe.

Attach networking

Configure firewalls and Elastic IPs for any compute resource you provision.

How requests work

1. Browse the cluster catalog in the console. Each card lists the SKU, region, and full hardware spec.
2. Click Request Cluster on the card you want. The form is pre-filled with that SKU and region.
3. GMI support reviews the request and provisions the resources.
4. Once ready, the cluster appears under Managed GPU Clusters and you can start using it.

Track the status of in-flight requests on the Cluster Requests page.

Pricing

Prices vary by SKU, GPU type, and region. The catalog cards in the console always show the current rate for each option, and the full live pricing list is on the Pricing page.

Product entitlements

Access to Bare Metal and Container is gated per organization. If those sections show a “Not yet available” banner, click Contact Support in the console to request access.
