Skip to main content

Dedicated Endpoint

Dedicated Endpoints provide a customizable environment for deploying AI models tailored to specific requirements.

Create a dedicated inference endpoint once or on a schedule

Deploy a Dedicated inference model

Select a model from the list.

Click on the button labeled "Dedicated":

Dedicated button

Set compute resources

Select a GPU type, ram, and other system specifications:

Dedicated step 1

Set task details

Set a task name, file path, and other inference details:

Dedicated step 2 Task Name: Define task name.

File Path: Specify the script file name to be executed in the Docker image(without the file extension). Fox example, if the image includes a script named serve.py, enter serve here.

Deployment Name: Specify the deployment name that will be exposed to the Ray cluster by the script. For example, you can use app as the deployment name.

Type: One-off or Daily

One-off - The task runs once as scheduled time.

Daily - The task runs at the first scheduled time and can update replica numbers at subsequent daily scheduled times. This option is designed for recurring, predictable workloads where scaling needs follow a consistent daily pattern.

Timezone: Select the timezone for scheduling.

Time: Select the time for scheduling.

Replicas: Select the Min replicas and Max replicas for schedule.

Review configuration

Review your configurations to ensure they are correct. After confirmation, click the "Launch" button to launch task.

Dedicated step 3

Active task

In the task list, locate the task and then click button image-20250320145006225 to active the idle task.