# Rate limits

> How GMI Cloud rate-limits inference API requests and how to handle 429 responses.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

To maintain system stability and ensure equitable access, the API enforces rate limits that cap how many tokens or requests an organization can send within a given time window.

API rate limits are defined in two ways:

* **TPM (Tokens per Minute)** for large language models (LLMs)
* **RPH (Requests per Hour)** for video models

These limits are enforced at the organization level.
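
When a request exceeds the limit, the API answers with a 429 response. One common way to handle this is to retry with exponential backoff and jitter. The following is a minimal sketch of that approach, not an official client: `call_with_backoff`, the `send` callable, and all delay values are hypothetical names and defaults chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry while it returns HTTP 429.

    `send` is a hypothetical callable (an assumption, not a GMI Cloud API)
    that performs one request and returns (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the delay on each attempt, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Jitter spreads out retries from clients that were limited together.
        sleep(delay * random.uniform(0.5, 1.0))
    # Retries exhausted: hand the last 429 back to the caller.
    return status, body
```

To use it, wrap a single request from your HTTP client of choice in a zero-argument function and pass it as `send`. Because limits are enforced at the organization level, every client that shares the organization draws on the same quota, so retrying more aggressively in one client does not raise the overall limit.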

## Usage Tiers and Auto Upgrades

Rate limits vary by usage tier, with each tier offering different quotas for each model. By default, organizations are assigned to **Tier 1**.

As you purchase credit on the platform, your organization is automatically upgraded to the next usage tier according to the table below. For example, once your total purchases reach \$50, you are upgraded to Tier 2 within 24 hours.

Voucher redemptions do not count toward your total purchase amount.

| Tier Name | Total Purchase Amount | Upgrade Takes Effect |
| --------- | --------------------- | -------------------- |
| Tier 1    | \$0                   | Immediately          |
| Tier 2    | \$50                  | Within 24 hours      |
| Tier 3    | \$500                 | Within 24 hours      |
| Tier 4    | \$1,000               | Within 24 hours      |
| Tier 5    | \$5,000               | Within 24 hours      |

To request a manual tier upgrade, contact [support@gmicloud.ai](mailto:support@gmicloud.ai).
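
The tier table above can be read as a simple threshold lookup. The sketch below shows one way to express it. `tier_for_purchase` and `TIER_THRESHOLDS` are hypothetical names, and the code assumes that reaching a threshold exactly (for example, \$50) is enough to qualify, as the Tier 2 example suggests.

```python
# (minimum total purchase in USD, tier), copied from the tier table.
# Highest threshold first so the first match wins.
TIER_THRESHOLDS = [(5000, 5), (1000, 4), (500, 3), (50, 2), (0, 1)]


def tier_for_purchase(total_purchased_usd: float) -> int:
    """Return the usage tier for a total purchase amount.

    Assumption: thresholds are inclusive. Voucher redemptions must be
    excluded from the total before calling this function.
    """
    if total_purchased_usd < 0:
        raise ValueError("total purchase amount cannot be negative")
    for minimum, tier in TIER_THRESHOLDS:
        if total_purchased_usd >= minimum:
            return tier
    return 1  # Unreachable for non-negative input; Tier 1 is the default.
```

The result is the tier an organization qualifies for. The upgrade itself may take up to 24 hours to apply after the purchase.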

## Rate Limit Table

| Model Name | Tier 1 TPM | Tier 2 TPM | Tier 3 TPM | Tier 4 TPM  | Tier 5 TPM  |
| ---------- | ---------- | ---------- | ---------- | ----------- | ----------- |
| All Models | 1,000,000  | 3,000,000  | 50,000,000 | 100,000,000 | 300,000,000 |
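
To stay under the TPM quota rather than react to 429 responses, a client can track its own token usage. The sketch below keeps a 60-second sliding window of tokens spent. `TokenBudget` and `TPM_BY_TIER` are hypothetical names, and the sliding-window accounting is an assumption: this page does not specify how the server measures the minute or which tokens it counts, so treat the result as a client-side estimate only.

```python
import time
from collections import deque

# TPM quotas per tier, copied from the rate limit table.
TPM_BY_TIER = {
    1: 1_000_000,
    2: 3_000_000,
    3: 50_000_000,
    4: 100_000_000,
    5: 300_000_000,
}


class TokenBudget:
    """Client-side estimate of tokens used in the last 60 seconds.

    Assumption: a sliding window. The server's actual accounting may differ.
    """

    def __init__(self, tier: int, clock=time.monotonic):
        self.limit = TPM_BY_TIER[tier]
        self.clock = clock
        self.events = deque()  # (timestamp, tokens), oldest first
        self.used = 0

    def _expire(self) -> None:
        # Drop usage records that are 60 seconds old or older.
        cutoff = self.clock() - 60.0
        while self.events and self.events[0][0] <= cutoff:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens: int) -> bool:
        """Record the tokens and return True if they fit in the budget."""
        self._expire()
        if self.used + tokens > self.limit:
            return False
        self.events.append((self.clock(), tokens))
        self.used += tokens
        return True
```

A caller would check `try_spend(estimated_tokens)` before each request and wait when it returns `False`. Since the quota applies to the whole organization, a budget tracked inside a single process only sees part of the usage when several clients share the same organization.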
