Create your dedicated inference endpoint
Deploy a Dedicated inference model
Select a model from the list, then click the button labeled “Launch Your Dedicated Endpoint”:
Review Configurations
Confirm your GPU type, deployment name, auto-scaling policy, and other system configurations:
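For reference, a deployment configuration of this kind usually boils down to a handful of fields. The sketch below illustrates them as a Python dict; the field names and values are illustrative assumptions, not the platform’s actual schema, so confirm the real options in the review form.

```python
# Illustrative deployment configuration (hypothetical field names and
# values; the actual options appear in the review screen).
deployment_config = {
    "deployment_name": "my-llama-endpoint",  # assumed example name
    "gpu_type": "A100-80GB",                 # assumed example GPU SKU
    "min_replicas": 1,                       # auto-scaling lower bound
    "max_replicas": 4,                       # auto-scaling upper bound
}
```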
View Deployment Status
To view your deployment status, click the “Deployment” tab at the top right. A deployment passes through the following states (you can also watch for the “Running” state from a script, as sketched after this list):
- Queued: The deployment task has been queued. Once all higher-priority deployment tasks are processed, this task is selected for deployment.
- Deploying: The system allocates hardware resources and deploys the model endpoint.
- Running: The deployment task is complete. The endpoint is ready for production use.

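If you would rather wait on the status from a script than refresh the dashboard, a simple polling loop works. The sketch below is a minimal example assuming a hypothetical status URL, a bearer API key, and a `{"status": "Running"}` response shape; substitute the actual endpoint and fields from the platform’s API reference.

```python
import os
import time

import requests

# Hypothetical status URL and auth scheme -- substitute the real ones
# from your platform's API reference.
STATUS_URL = "https://api.example.com/v1/deployments/my-llama-endpoint"
API_KEY = os.environ["API_KEY"]


def wait_until_running(poll_seconds: int = 15, timeout_seconds: int = 1800) -> None:
    """Poll the deployment status until it reaches 'Running' or times out."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        resp = requests.get(
            STATUS_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        status = resp.json().get("status")  # assumed response field
        print(f"Deployment status: {status}")
        if status == "Running":
            return
        time.sleep(poll_seconds)
    raise TimeoutError("Deployment did not reach 'Running' in time")


wait_until_running()
```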
Invoke API Endpoint
Once the deployment is in the “Running” status, click the “<>” symbol to access the endpoint URL:
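As an illustration, a chat-style request against a dedicated endpoint often looks like the following. The URL, request body, and response shape here are assumptions for the sketch; copy the real endpoint URL and payload format shown in the “<>” panel for your deployment.

```python
import os

import requests

# Hypothetical endpoint URL and payload -- copy the real values from
# the "<>" panel for your deployment.
ENDPOINT_URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ["API_KEY"]

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-llama-endpoint",  # assumed deployment name
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
# Assumed OpenAI-style response shape; adjust to the actual schema.
print(resp.json()["choices"][0]["message"]["content"])
```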
