Gemini Batch Inference API Usage Guide
Overview
Gemini Batch Inference lets you process large volumes of requests asynchronously at approximately 50% lower cost than online inference. It is well suited to batch workloads such as document analysis, image labeling, and bulk content generation.
Authentication
All API requests require authentication using an API key. Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Submit Batch Job
Base URL
https://console.gmicloud.ai
Endpoint
POST /api/v1/ie/requestqueue/apikey/requests
Example
curl -X POST "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-batch",
"payload": {
"model": "gemini-3-flash-preview",
"input_data": "{\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"What is 2+2?\"}]}]}}\n{\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"What is the capital of France?\"}]}]}}"
}
}'
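For readers who prefer Python to curl, the same submission can be sketched with the standard library. This is a minimal sketch, not an official client: the helper names `build_submit_request` and `send` are hypothetical, and building the request is kept separate from sending it so the request can be inspected without network access.

```python
import json
import urllib.request

BASE_URL = "https://console.gmicloud.ai"
SUBMIT_PATH = "/api/v1/ie/requestqueue/apikey/requests"


def build_submit_request(api_key, target_model, input_data):
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper: input_data is the JSONL string, one request per line.
    """
    body = {
        "model": "gemini-batch",  # queue identifier used in the curl example
        "payload": {"model": target_model, "input_data": input_data},
    }
    return urllib.request.Request(
        BASE_URL + SUBMIT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


jsonl = '{"request":{"contents":[{"role":"user","parts":[{"text":"What is 2+2?"}]}]}}'
req = build_submit_request("YOUR_API_KEY", "gemini-3-flash-preview", jsonl)
# response = send(req)  # uncomment to actually submit the job
```

Using `json.dumps` for the body avoids the manual quote escaping that the curl example needs for `input_data`.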
Request Parameters
| Parameter | Type | Required | Description | Default | Constraints |
|---|---|---|---|---|---|
| model | enum | Yes | Target Gemini model for batch prediction. | gemini-3-flash-preview | See supported models below |
| input_data | string | Yes | JSONL content where each line is a request object. Can be a raw JSONL string or base64 encoded. | - | Max 1 GB file size, max 200K requests |
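The table notes that `input_data` may be raw JSONL or base64 encoded. The sketch below shows one way to produce the base64 form. It assumes standard (RFC 4648) base64 over the UTF-8 bytes of the JSONL, which the guide does not specify, and the helper name `encode_input_data` is hypothetical.

```python
import base64


def encode_input_data(jsonl: str) -> str:
    """Return a base64 form of a JSONL string.

    Assumption: standard base64 of the UTF-8 bytes; the guide does not say
    which base64 variant the service expects.
    """
    return base64.b64encode(jsonl.encode("utf-8")).decode("ascii")


raw = '{"request":{"contents":[{"role":"user","parts":[{"text":"What is 2+2?"}]}]}}'
encoded = encode_input_data(raw)

# Decoding should give back the original JSONL unchanged.
decoded = base64.b64decode(encoded).decode("utf-8")
```

Base64 may be convenient when the JSONL contains many quotes or newlines, since the encoded string needs no JSON escaping.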
Supported Models
| Model | Description | Best For |
|---|---|---|
| gemini-3-flash-preview | Fast, efficient Gemini 3 Flash | High-volume, simple tasks |
| gemini-3-pro-preview | Most capable Gemini 3 Pro | Complex reasoning tasks |
| gemini-3-pro-image-preview | Gemini 3 Pro with image generation | Batch image generation |
Each line must be a valid JSON object with a request field containing contents:
Text-only requests:
{"request":{"contents":[{"role":"user","parts":[{"text":"What is the capital of France?"}]}]}}
{"request":{"contents":[{"role":"user","parts":[{"text":"Summarize quantum computing in 3 sentences."}]}]}}
{"request":{"contents":[{"role":"user","parts":[{"text":"Write a haiku about mountains."}]}]}}
Multimodal requests (with images):
{"request":{"contents":[{"role":"user","parts":[{"text":"Describe this image"},{"file_data":{"file_uri":"gs://bucket/image.jpg","mime_type":"image/jpeg"}}]}]}}
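Writing these lines by hand is error-prone, so it can help to generate them. The following is a minimal sketch of a builder for the two request shapes above; `make_request_line` is a hypothetical helper, not part of any SDK.

```python
import json


def make_request_line(text, file_uri=None, mime_type=None):
    """Return one JSONL line in the {"request": {"contents": [...]}} shape."""
    parts = [{"text": text}]
    if file_uri is not None:
        if mime_type is None:
            raise ValueError("mime_type is required when file_uri is given")
        parts.append({"file_data": {"file_uri": file_uri, "mime_type": mime_type}})
    record = {"request": {"contents": [{"role": "user", "parts": parts}]}}
    # Compact separators keep each request on one line with no extra spaces.
    return json.dumps(record, separators=(",", ":"))


lines = [
    make_request_line("What is the capital of France?"),
    make_request_line("Describe this image", "gs://bucket/image.jpg", "image/jpeg"),
]
# Join with newlines to form the JSONL string passed as input_data.
input_data = "\n".join(lines)
```

Because `json.dumps` escapes any newline inside a prompt, each request stays on exactly one line even when the prompt text itself spans several lines.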
After submitting, you receive a request_id to track the job:
{
"request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
"model": "gemini-batch",
"status": "processing",
"outcome": {
"batch_job_state": "JOB_STATE_QUEUED"
},
"created_at": 1761763441,
"updated_at": 1761763441
}
Check Job Status
Endpoint
GET /api/v1/ie/requestqueue/apikey/requests/{request_id}
Example
curl -X GET "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests/7eaa77fc-bc67-4021-9f1b-96b3fd832314" \
-H "Authorization: Bearer YOUR_API_KEY"
Response (In Progress)
{
"request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
"model": "gemini-batch",
"status": "processing",
"outcome": {
"batch_job_state": "JOB_STATE_RUNNING"
},
"created_at": 1761763441,
"updated_at": 1761764441
}
Response (Completed)
{
"request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
"model": "gemini-batch",
"status": "success",
"outcome": {
"batch_job_state": "JOB_STATE_SUCCEEDED",
"output_url": "https://storage.googleapis.com/.../predictions.jsonl",
"output_download_urls": [
"https://storage.googleapis.com/.../predictions-00000-of-00001.jsonl"
],
"batch_job_completion_stats": {
"successful_count": "100",
"failed_count": "2"
},
"token_usage": {
"total_prompt_tokens": 5000,
"total_candidates_tokens": 8000,
"successful_requests": 100,
"failed_requests": 2
},
"actual_cost_usd": "$0.001234"
},
"created_at": 1761763441,
"updated_at": 1761765441
}
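Since jobs move from `processing` to `success` or `failed` over time, a client usually polls the status endpoint. The sketch below shows one possible polling loop. The function `wait_for_job`, its parameters, and the 30-second default interval are assumptions for illustration; `fetch_job` stands in for whatever function performs the GET request and returns the parsed JSON.

```python
import time

# Terminal values of the top-level "status" field.
TERMINAL_STATUSES = {"success", "failed"}


def wait_for_job(fetch_job, request_id, interval_s=30.0, max_polls=120, sleep=time.sleep):
    """Call fetch_job(request_id) until status is terminal or max_polls is reached."""
    for attempt in range(max_polls):
        job = fetch_job(request_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {request_id} not finished after {max_polls} polls")


# Demonstration with canned responses instead of real HTTP calls.
_canned = iter([
    {"status": "processing", "outcome": {"batch_job_state": "JOB_STATE_RUNNING"}},
    {"status": "success", "outcome": {"batch_job_state": "JOB_STATE_SUCCEEDED"}},
])
result = wait_for_job(
    lambda _request_id: next(_canned),
    "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
```

Passing `fetch_job` and `sleep` as arguments keeps the loop testable without network access or real delays.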
Download and Parse Output
The output is a JSONL file where each line corresponds to one input request:
{
"status": "",
"processed_time": "2024-01-15T10:30:00.000+00:00",
"request": {"contents": [{"role": "user", "parts": [{"text": "What is 2+2?"}]}]},
"response": {
"candidates": [{
"content": {"parts": [{"text": "4"}], "role": "model"},
"finishReason": "STOP"
}],
"usageMetadata": {
"promptTokenCount": 5,
"candidatesTokenCount": 1,
"totalTokenCount": 6
}
}
}
Note: An empty status field indicates success. Failed requests will have an error message in status.
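As a rough illustration of this rule, the sketch below splits a downloaded output file into successful and failed records and extracts the answer text. The helper names are hypothetical, and the failed record in the sample is invented for the demo; the guide does not show what a real failed record contains beyond a non-empty `status`.

```python
import json


def split_results(jsonl_text):
    """Split output records into (successes, failures) using the status field."""
    successes, failures = [], []
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # tolerate blank lines, such as a trailing newline
        record = json.loads(line)
        # Empty status means success; otherwise status holds the error message.
        (failures if record.get("status") else successes).append(record)
    return successes, failures


def answer_text(record):
    """Concatenate the text parts of the first candidate in a successful record."""
    parts = record["response"]["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# Sample data: one success in the documented shape, one made-up failure.
sample = "\n".join([
    json.dumps({"status": "", "response": {"candidates": [
        {"content": {"parts": [{"text": "4"}], "role": "model"}, "finishReason": "STOP"}]}}),
    json.dumps({"status": "example error message"}),
])
ok, bad = split_results(sample)
```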
List Your Batch Jobs
Endpoint
GET /api/v1/ie/requestqueue/apikey/requests?model_id=gemini-batch
Example
curl -X GET "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests?model_id=gemini-batch" \
-H "Authorization: Bearer YOUR_API_KEY"
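A Python equivalent of this call might look like the sketch below. It only builds the request; the shape of the list response is not documented here, so no parsing is attempted. The helper name `list_jobs_request` is hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://console.gmicloud.ai"
REQUESTS_PATH = "/api/v1/ie/requestqueue/apikey/requests"


def list_jobs_request(api_key, model_id="gemini-batch"):
    """Build (but do not send) the GET request that lists batch jobs."""
    # urlencode handles escaping if model_id ever contains special characters.
    query = urllib.parse.urlencode({"model_id": model_id})
    return urllib.request.Request(
        f"{BASE_URL}{REQUESTS_PATH}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = list_jobs_request("YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:  # uncomment to call the API
#     print(resp.read().decode("utf-8"))
```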
Request Status Values
| Status | Description |
|---|---|
| queued | Job is waiting to be submitted to Vertex AI |
| processing | Batch job is running (may take minutes to hours) |
| success | Job completed (check batch_job_completion_stats for details) |
| failed | Job failed (check error message in outcome) |
Batch Job States (Vertex AI)
| State | Description |
|---|---|
| JOB_STATE_QUEUED | Waiting for resources |
| JOB_STATE_PENDING | Job is being prepared |
| JOB_STATE_RUNNING | Processing requests |
| JOB_STATE_SUCCEEDED | All requests completed |
| JOB_STATE_PARTIALLY_SUCCEEDED | Some requests failed |
| JOB_STATE_FAILED | Job failed |
| JOB_STATE_CANCELLED | Job was cancelled (e.g., 24h timeout) |
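Client code often needs to know only whether a job is still running, has produced output, or has stopped without output. The grouping below is one possible reading of the table, not something the guide states: in particular, treating `JOB_STATE_PARTIALLY_SUCCEEDED` as having downloadable output is an assumption, and the labels `in_progress`, `has_output`, and `stopped` are made up for this sketch.

```python
IN_PROGRESS_STATES = {"JOB_STATE_QUEUED", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"}
# Assumption: partially succeeded jobs still produce an output file.
OUTPUT_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_PARTIALLY_SUCCEEDED"}
STOPPED_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def classify_state(batch_job_state):
    """Map a batch_job_state value to a coarse, hypothetical category."""
    if batch_job_state in IN_PROGRESS_STATES:
        return "in_progress"
    if batch_job_state in OUTPUT_STATES:
        return "has_output"
    if batch_job_state in STOPPED_STATES:
        return "stopped"
    return "unknown"  # state not listed in the table


running = classify_state("JOB_STATE_RUNNING")
finished = classify_state("JOB_STATE_SUCCEEDED")
cancelled = classify_state("JOB_STATE_CANCELLED")
```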
Limits
| Limit | Value |
|---|---|
| Max input file size | 1 GB |
| Max requests per job | 200,000 |
| Max processing time | 24 hours (after job starts running) |
| Max queue time | 72 hours |
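The size and request-count limits can be checked locally before submitting. The sketch below is one way to do that, under two assumptions: "1 GB" is read as 10^9 bytes (the stricter interpretation, since the guide does not say whether it means 10^9 or 2^30), and every non-blank line counts as one request. `check_limits` is a hypothetical helper.

```python
MAX_INPUT_BYTES = 10**9  # "1 GB", assumed decimal (the stricter reading)
MAX_REQUESTS = 200_000


def check_limits(jsonl_text):
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    size = len(jsonl_text.encode("utf-8"))
    # Each non-blank line is assumed to be one request.
    count = sum(1 for line in jsonl_text.split("\n") if line.strip())
    if count == 0:
        problems.append("input contains no requests")
    if size > MAX_INPUT_BYTES:
        problems.append(f"input is {size} bytes, limit is {MAX_INPUT_BYTES}")
    if count > MAX_REQUESTS:
        problems.append(f"input has {count} requests, limit is {MAX_REQUESTS}")
    return problems


small = '{"request":{"contents":[{"role":"user","parts":[{"text":"Hi"}]}]}}'
small_problems = check_limits(small)
```

A job that exceeds either limit would need to be split into several smaller jobs.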
Tips for Best Results
- Batch Size: Ideal for 100+ requests. For fewer requests, consider online inference.
- File References: Use gs:// URIs for images/documents stored in Google Cloud Storage.
- Processing Time: Jobs typically complete within minutes to hours depending on volume.
- Cost Optimization: Use batch for non-time-sensitive workloads to save ~50%.
- Error Handling: Check batch_job_completion_stats for failed request counts.
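For the error-handling tip, a small helper can turn `batch_job_completion_stats` into numbers. In the completed-job example the counts appear as strings ("100", "2"), so the sketch below converts them explicitly. The function name and the returned keys are hypothetical.

```python
def summarize_stats(job):
    """Return success/failure counts and the failure rate for a completed job."""
    stats = job.get("outcome", {}).get("batch_job_completion_stats", {})
    # Counts are strings in the example response, so convert before doing math.
    successful = int(stats.get("successful_count", 0))
    failed = int(stats.get("failed_count", 0))
    total = successful + failed
    return {
        "successful": successful,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
    }


job = {
    "status": "success",
    "outcome": {
        "batch_job_completion_stats": {"successful_count": "100", "failed_count": "2"}
    },
}
summary = summarize_stats(job)
```

A non-zero `failed` count is a signal to download the output file and inspect the `status` field of individual records.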