Dedicated Endpoints provide a customizable environment for deploying AI models tailored to specific requirements.

Create Your Dedicated Inference Endpoint

Deploy a Dedicated Inference Model

Select a model from the list and click the “Dedicated” button to start deployment. Alternatively, click the model card, then click the “Deploy” button at the top right.

Review Configurations

Confirm your GPU type, deployment name, auto-scaling policy, and other system configurations, then click “Deploy”.

View Deployment Status

To view your deployment status, click the “Deployment” tab at the top right.
A deployment moves through the following statuses:

- Queued: The deployment task has been added to the queue. It will start once all higher-priority tasks have been processed.
- Deploying: The system is allocating hardware resources and initializing the model endpoint.
- Running: Deployment is complete, and the endpoint is active and ready for production use.
- Stopped: The deployment has been manually stopped by the user. It can be restarted at any time.
- Archived: The deployment has been terminated permanently. It cannot be restarted, but historical records are retained for reference.
You are only billed for the time the deployment spends in the “Running” status.

Invoke API Endpoint

Once the deployment is in the “Running” status, click the “<>” symbol to access the endpoint URL. You can then use this URL to send API requests; an example curl request is provided in the console. Remember to replace “API_KEY” with your real API key.
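The console's curl example can also be reproduced in code. The sketch below builds the same kind of request in Python; the endpoint URL, path, model name, and request schema are placeholders and assumptions (an OpenAI-compatible chat format is assumed here), so check them against the example shown in your console:

```python
import json
import urllib.request

# Placeholders: substitute the endpoint URL from the console and your real API key.
ENDPOINT_URL = "https://YOUR_ENDPOINT_URL/v1/chat/completions"  # hypothetical path
API_KEY = "API_KEY"

# Request body assuming an OpenAI-compatible chat schema (an assumption,
# not confirmed by this page; copy the exact body from the console example).
payload = {
    "model": "YOUR_MODEL_NAME",
    "messages": [{"role": "user", "content": "Hello!"}],
}

request = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Once the placeholders are filled in, send the request with:
# response = urllib.request.urlopen(request)
# print(json.load(response))
```

Building the `Request` object separately from sending it lets you inspect the URL, headers, and body before any network call is made.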