Dedicated Endpoints provide a customizable environment for deploying AI models tailored to specific requirements.

Create your dedicated inference endpoint

Deploy a Dedicated Inference Model

Select a model from the list, then click the button labeled “Launch Your Dedicated Endpoint”.

Review Configurations

Confirm your GPU type, deployment name, auto-scaling policy, and other system configurations, then click “Deploy”.
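For reference, the settings reviewed in this step correspond to values like the following. This is a hypothetical sketch; the field names and values are illustrative, not taken from this page:

```bash
# Hypothetical sketch of the configuration reviewed in the dashboard;
# the field names and values below are illustrative only.
cat > deployment-config.json <<'EOF'
{
  "deployment_name": "my-dedicated-endpoint",
  "gpu_type": "A100-80GB",
  "autoscaling": { "min_replicas": 1, "max_replicas": 4 }
}
EOF
```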

View Deployment Status

To view your deployment status, click the “Deployment” tab at the top right. A deployment moves through the following states:
  • Queued: The deployment task has been queued. Once all higher-priority deployment tasks have been processed, this task will be selected for deployment.
  • Deploying: The system is allocating hardware resources and deploying the model endpoint.
  • Running: The deployment task has completed and the endpoint is ready for production use.
You will only be billed for the time the endpoint spends in the “Running” status.
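If you would rather poll the status from a script than watch the dashboard, a sketch along these lines could work, assuming the platform exposes a deployments management API; the URL, path, and “status” field name here are hypothetical, not confirmed by this page:

```bash
# Hypothetical status check: the API host, the /v1/deployments path, and
# the "status" response field are assumptions, not documented on this page.
curl -s "https://api.example.com/v1/deployments/$DEPLOYMENT_ID" \
  -H "Authorization: Bearer $API_KEY" \
  | jq -r '.status'   # expected values: Queued, Deploying, Running
```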

Invoke API Endpoint

Once the deployment is in the “Running” status, click the “<>” symbol to access the endpoint URL. You can then use this URL to send API requests, as in the example below. Remember to replace “API_KEY” with your real API key.
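A minimal curl sketch, assuming the endpoint serves an OpenAI-compatible chat completions route; the ENDPOINT_URL value, the request path, and the payload fields are illustrative placeholders, not confirmed by this page:

```bash
# Replace API_KEY with your real API key and ENDPOINT_URL with the URL
# copied from the dashboard. The /v1/chat/completions path and the JSON
# payload below are assumptions for illustration.
export API_KEY="your-api-key"
export ENDPOINT_URL="https://your-endpoint.example.com"

curl -X POST "$ENDPOINT_URL/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Hello, world!"}
        ],
        "max_tokens": 64
      }'
```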