Dedicated Endpoints provide a customizable environment for deploying AI models tailored to specific requirements.Documentation Index
Fetch the complete documentation index at: https://docs.gmicloud.ai/llms.txt
Use this file to discover all available pages before exploring further.
Create Your Dedicated Inference Endpoint
Deploy a Dedicated Inference Model
Select a model from the list. Click the “Dedicated” button to start deployment:

Review Configurations
Confirm your GPU type, deployment name, auto-scaling policy, and other system configurations:
View Deployment Status
To view your deployment status click the “Deployment” tab on the top right.| Status | Description |
|---|---|
| Queued | The deployment task has been added to the queue. It will start once all higher-priority tasks have been processed. |
| Deploying | The system is allocating hardware resources and initializing the model endpoint. |
| Running | Deployment is complete, and the endpoint is active and ready for production use. |
| Stopped | The deployment has been manually stopped by the user. It can be restarted at any time. |
| Archived | The deployment has been terminated permanently. It cannot be restarted, but historical records are retained for reference. |

Invoke API Endpoint
Once deployment is in “Running” status, click the ”<>” symol to access endpoint URL:
