Create your dedicated inference endpoint
Deploy a Dedicated inference model
Select a model from the list, then click the button labeled “Launch Your Dedicated Endpoint”:
Review Configurations
Confirm your GPU type, deployment name, auto-scaling policy, and other system configurations:
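For reference, a deployment configuration of this kind usually boils down to a handful of fields. The sketch below illustrates them as a Python dict; the field names and values are illustrative assumptions, not the platform’s actual schema, so confirm the real options in the review form.

```python
# Illustrative deployment configuration (hypothetical field names and
# values; the actual options appear in the review screen).
deployment_config = {
    "deployment_name": "my-llama-endpoint",  # assumed example name
    "gpu_type": "A100-80GB",                 # assumed example GPU SKU
    "min_replicas": 1,                       # auto-scaling lower bound
    "max_replicas": 4,                       # auto-scaling upper bound
}
```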
View Deployment Status
To view your deployment status, click the “Deployment” tab at the top right. A deployment passes through the following states (you can also watch for the “Running” state from a script, as sketched after this list):
- Queued: The deployment task has been queued. Once all higher-priority deployment tasks are processed, this task is selected for deployment.
- Deploying: The system allocates hardware resources and deploys the model endpoint.
- Running: The deployment task is complete. The endpoint is ready for production use.

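If you would rather wait on the status from a script than refresh the dashboard, a simple polling loop works. The sketch below is a minimal example assuming a hypothetical status URL, a bearer API key, and a `{"status": "Running"}` response shape; substitute the actual endpoint and fields from the platform’s API reference.

```python
import os
import time

import requests

# Hypothetical status URL and auth scheme -- substitute the real ones
# from your platform's API reference.
STATUS_URL = "https://api.example.com/v1/deployments/my-llama-endpoint"
API_KEY = os.environ["API_KEY"]


def wait_until_running(poll_seconds: int = 15, timeout_seconds: int = 1800) -> None:
    """Poll the deployment status until it reaches 'Running' or times out."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        resp = requests.get(
            STATUS_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        status = resp.json().get("status")  # assumed response field
        print(f"Deployment status: {status}")
        if status == "Running":
            return
        time.sleep(poll_seconds)
    raise TimeoutError("Deployment did not reach 'Running' in time")


wait_until_running()
```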
Invoke API Endpoint
Once the deployment is in the “Running” status, click the “<>” symbol to access the endpoint URL:
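As an illustration, a chat-style request against a dedicated endpoint often looks like the following. The URL, request body, and response shape here are assumptions for the sketch; copy the real endpoint URL and payload format shown in the “<>” panel for your deployment.

```python
import os

import requests

# Hypothetical endpoint URL and payload -- copy the real values from
# the "<>" panel for your deployment.
ENDPOINT_URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ["API_KEY"]

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-llama-endpoint",  # assumed deployment name
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
# Assumed OpenAI-style response shape; adjust to the actual schema.
print(resp.json()["choices"][0]["message"]["content"])
```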
