> ## Documentation Index
> Fetch the complete documentation index at: https://docs.gmicloud.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Inference Engine Overview

The **GMI Inference Engine** provides a unified and efficient solution for deploying and managing machine learning models in production environments.
It supports two primary types of endpoints — **Serverless Endpoints** and **Dedicated Endpoints** — to accommodate different scalability, performance, and customization needs.

***

### **1. Serverless Endpoints**

**Serverless Endpoints** are fully managed, pre-configured endpoints provided by GMI Cloud, designed to help you start using AI models instantly. They enable users to access popular AI models through **OpenAI-compatible APIs**, without any infrastructure setup or management overhead.

**Key Benefits:**

* **Out-of-the-Box Functionality**
  Instantly access AI models that are pre-configured and fully compatible with OpenAI API standards.

* **Automatic Scalability**
  Scale seamlessly with your application’s workload, ensuring high availability and low latency during traffic spikes.

* **Cost Efficiency**
  Pay only for what you use — no need to maintain or provision dedicated compute resources.

Serverless Endpoints are ideal for fast prototyping, small-scale applications, or teams who want to focus on development instead of infrastructure management.

***

### **2. Dedicated Endpoints**

**Dedicated Endpoints** are customizable, user-provisioned environments that offer full control over model deployment and resource configuration. These endpoints are designed for production-grade, enterprise-level workloads that demand maximum performance and flexibility.

**Key Advantages:**

* **Full Customization**
  Deploy your own fine-tuned or proprietary models, customize hardware configurations, and optimize parameters for your specific use case.

* **Enhanced Performance**
  Allocate dedicated GPU resources to achieve consistent, predictable throughput and latency.

* **Isolation and Security**
  Run in a private, isolated environment that ensures workload separation and enterprise-grade security compliance.

* **No Rate Limits**
  Enjoy unrestricted throughput with no API rate limits — perfect for large-scale or continuous inference workloads.

* **Customizable Deployment**
  Configure your deployment environment, model versions, and scaling policies to align with your organization’s infrastructure standards.

Dedicated Endpoints are ideal for organizations requiring high-performance, stable, and secure inference environments — such as enterprise production systems, custom AI applications, or large-scale research workloads.
