Inference Engine Overview

The GMI Inference Engine provides a unified and efficient solution for deploying and managing machine learning models in production environments. It supports two primary types of endpoints — Serverless Endpoints and Dedicated Endpoints — to accommodate different scalability, performance, and customization needs.

1. Serverless Endpoints

Serverless Endpoints are fully managed, pre-configured endpoints provided by GMI Cloud, designed to help you start using AI models instantly. They enable users to access popular AI models through OpenAI-compatible APIs, without any infrastructure setup or management overhead. Key Benefits:

Out-of-the-Box Functionality Instantly access AI models that are pre-configured and fully compatible with OpenAI API standards.
Automatic Scalability Scale seamlessly with your application’s workload, ensuring high availability and low latency during traffic spikes.
Cost Efficiency Pay only for what you use — no need to maintain or provision dedicated compute resources.

Serverless Endpoints are ideal for fast prototyping, small-scale applications, or teams who want to focus on development instead of infrastructure management.

2. Dedicated Endpoints

Dedicated Endpoints are customizable, user-provisioned environments that offer full control over model deployment and resource configuration. These endpoints are designed for production-grade, enterprise-level workloads that demand maximum performance and flexibility. Key Advantages:

Full Customization Deploy your own fine-tuned or proprietary models, customize hardware configurations, and optimize parameters for your specific use case.
Enhanced Performance Allocate dedicated GPU resources to achieve consistent, predictable throughput and latency.
Isolation and Security Run in a private, isolated environment that ensures workload separation and enterprise-grade security compliance.
No Rate Limits Enjoy unrestricted throughput with no API rate limits — perfect for large-scale or continuous inference workloads.
Customizable Deployment Configure your deployment environment, model versions, and scaling policies to align with your organization’s infrastructure standards.

Dedicated Endpoints are ideal for organizations requiring high-performance, stable, and secure inference environments — such as enterprise production systems, custom AI applications, or large-scale research workloads.

Getting started

API Reference

Marketplace

Resources

Billing

Tutorials

Inference Engine Overview

1. Serverless Endpoints

2. Dedicated Endpoints

Getting started

API Reference

Marketplace

Resources

Billing

Tutorials

​1. Serverless Endpoints

​2. Dedicated Endpoints

1. Serverless Endpoints

2. Dedicated Endpoints