Inference Engine Overview
The GMI Inference Engine provides a streamlined solution for deploying and managing machine learning models in production environments. It supports two types of endpoints: Serverless Endpoints and Dedicated Endpoints.
1. Serverless Endpoints
Serverless endpoints are pre-configured by GMI Cloud so you can get started quickly. They let you call AI models through OpenAI-compatible APIs without extensive setup, which simplifies integration and offers the following benefits (a short client sketch follows the list):
Out-of-the-box Functionality: Instantly access AI models that are pre-configured to work seamlessly with OpenAI standards.
Scalability: Automatically scale with your application’s needs, ensuring high availability and performance without manual intervention.
Cost-Efficiency: Pay only for what you use, with no infrastructure of your own to maintain.
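Because serverless endpoints follow the OpenAI API convention, existing OpenAI client libraries can typically point at them directly. Below is a minimal sketch using the official OpenAI Python SDK; the base URL, model name, and environment variable are illustrative assumptions, not documented values, so substitute the ones shown in your GMI Cloud console:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at a GMI serverless endpoint.
# base_url and the env var name are placeholders (assumptions), not
# official values; use the URL and key from your GMI Cloud console.
client = OpenAI(
    base_url="https://api.gmi-serving.example/v1",  # assumed endpoint URL
    api_key=os.environ["GMI_API_KEY"],              # assumed env var name
)

# Serverless endpoints accept standard OpenAI-style chat requests.
response = client.chat.completions.create(
    model="example-llm",  # placeholder model identifier
    messages=[
        {"role": "user", "content": "Summarize what an inference engine does."}
    ],
)
print(response.choices[0].message.content)
```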
2. Dedicated Endpoints
Dedicated endpoints are customizable, user-provisioned resources for serving AI models with full control over infrastructure and configuration. They are ideal when you need more control over how your models are served (see the sketch after this list). Key advantages include:
Full Customization: Deploy your own models and configure settings to meet specific application needs.
Enhanced Performance: Optimize resources for better performance tailored to your use case.
Isolation and Security: Benefit from a dedicated environment that isolates your workloads, enhancing security and compliance.
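If a dedicated endpoint also exposes an OpenAI-compatible interface (an assumption here, not something this overview states), the only client-side difference from serverless is the base URL, so application code can move between the two endpoint types unchanged. A hypothetical sketch, with both URLs as placeholders:

```python
import os

from openai import OpenAI

# Hypothetical: the same client code serves both endpoint types,
# differing only in which URL it targets. Both URLs are placeholders.
ENDPOINTS = {
    "serverless": "https://api.gmi-serving.example/v1",
    "dedicated": "https://my-team-endpoint.gmi-serving.example/v1",
}


def make_client(kind: str) -> OpenAI:
    """Return an OpenAI-compatible client for the chosen endpoint type."""
    return OpenAI(
        base_url=ENDPOINTS[kind],
        api_key=os.environ["GMI_API_KEY"],  # assumed env var name
    )


# Switching from serverless to dedicated is a one-line change.
client = make_client("dedicated")
```

Under this assumption, a common pattern is to prototype against a serverless endpoint and later promote the workload to a dedicated endpoint without rewriting call sites.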