The GMI Inference Engine provides a unified and efficient solution for deploying and managing machine learning models in production environments. It supports two primary types of endpoints — Serverless Endpoints and Dedicated Endpoints — to accommodate different scalability, performance, and customization needs.

1. Serverless Endpoints

Serverless Endpoints are fully managed, pre-configured endpoints provided by GMI Cloud, designed to help you start using AI models instantly. They let you access popular AI models through OpenAI-compatible APIs, with no infrastructure setup or management overhead.

Key Benefits:
  • Out-of-the-Box Functionality: Instantly access AI models that are pre-configured and fully compatible with OpenAI API standards.
  • Automatic Scalability: Scale seamlessly with your application’s workload, ensuring high availability and low latency during traffic spikes.
  • Cost Efficiency: Pay only for what you use, with no need to maintain or provision dedicated compute resources.
Serverless Endpoints are ideal for fast prototyping, small-scale applications, or teams that want to focus on development rather than infrastructure management.
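Because Serverless Endpoints follow the OpenAI API standard, any OpenAI-compatible client can call them. The sketch below uses only the Python standard library; the base URL, model name, and API key are illustrative placeholders, not real GMI Cloud values — substitute the ones from your GMI Cloud console.

```python
import json
import urllib.request

# Placeholder values for illustration only; replace with the endpoint URL,
# API key, and model name shown in your GMI Cloud console.
BASE_URL = "https://api.example-gmi-endpoint.com/v1"
API_KEY = "YOUR_API_KEY"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def post_chat_completion(payload: dict) -> dict:
    """POST the payload to the standard /chat/completions route."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("example-model", "Hello!")
# post_chat_completion(payload)  # uncomment once real credentials are set
```

Because the request shape is the standard OpenAI one, existing OpenAI SDKs also work unchanged — point them at the serverless base URL instead of writing HTTP calls by hand.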

2. Dedicated Endpoints

Dedicated Endpoints are customizable, user-provisioned environments that offer full control over model deployment and resource configuration. These endpoints are designed for production-grade, enterprise-level workloads that demand maximum performance and flexibility.

Key Advantages:
  • Full Customization: Deploy your own fine-tuned or proprietary models, customize hardware configurations, and optimize parameters for your specific use case.
  • Enhanced Performance: Allocate dedicated GPU resources to achieve consistent, predictable throughput and latency.
  • Isolation and Security: Run in a private, isolated environment that ensures workload separation and enterprise-grade security compliance.
  • No Rate Limits: Run at unrestricted throughput with no API rate limits, which suits large-scale or continuous inference workloads.
  • Customizable Deployment: Configure your deployment environment, model versions, and scaling policies to align with your organization’s infrastructure standards.
Dedicated Endpoints are ideal for organizations requiring high-performance, stable, and secure inference environments — such as enterprise production systems, custom AI applications, or large-scale research workloads.
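To make the configuration surface concrete, here is a purely illustrative sketch of the kinds of choices a dedicated-endpoint deployment spec captures — model, hardware, scaling policy, and isolation. None of these field names come from the GMI Cloud API; they are hypothetical, and a simple sanity check on the scaling bounds stands in for real pre-deployment validation.

```python
# Hypothetical deployment spec for illustration; the actual configuration
# options are set through the GMI Cloud console or API.
deployment_spec = {
    "model": "my-org/finetuned-llm",   # your own fine-tuned or proprietary model
    "gpu_type": "H100",                # dedicated hardware allocation
    "replicas": {"min": 2, "max": 8},  # scaling-policy bounds
    "isolation": "private",            # isolated, workload-separated environment
}


def validate_spec(spec: dict) -> None:
    """Sanity-check the scaling policy before deployment."""
    replicas = spec["replicas"]
    if not (1 <= replicas["min"] <= replicas["max"]):
        raise ValueError("replica bounds must satisfy 1 <= min <= max")


validate_spec(deployment_spec)  # raises ValueError if the bounds are invalid
```

The point of the sketch is the shape of the decision, not the field names: with a dedicated endpoint, the model, the GPU allocation, and the minimum/maximum replica counts are yours to set, rather than managed for you as with serverless.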