Gemini Batch Inference API Usage Guide
Overview
Gemini Batch Inference allows you to process large volumes of requests asynchronously at approximately 50% lower cost than online inference. Ideal for batch processing tasks like document analysis, image labeling, or bulk content generation.Authentication
All API requests require authentication using an API key. Include your API key in the Authorization header:Submit Batch Job
Base URL
Endpoint
Request Format
Request Parameters
| Parameter | Type | Required | Description | Default | Constraints |
|---|---|---|---|---|---|
model | enum | Yes | Target Gemini model for batch prediction. | gemini-3-flash-preview | See supported models below |
input_data | string | Yes | JSONL content where each line is a request object. Can be raw JSONL string or base64 encoded. | - | Max 1GB file size, max 200K requests |
Supported Models
| Model | Description | Best For |
|---|---|---|
gemini-3-flash-preview | Fast, efficient Gemini 3 Flash | High-volume, simple tasks |
gemini-3-pro-preview | Most capable Gemini 3 Pro | Complex reasoning tasks |
gemini-3-pro-image-preview | Gemini 3 Pro with image generation | Batch image generation |
JSONL Input Format
Each line must be a valid JSON object with arequest field containing contents:
Text-only requests:
Response (Immediate)
After submitting, you receive arequest_id to track the job:
Check Job Status
Endpoint
Example
Response (In Progress)
Response (Completed)
Download and Parse Output
The output is a JSONL file where each line corresponds to one input request:status field indicates success. Failed requests will have an error message in status.
List Your Batch Jobs
Endpoint
Example
Request Status Values
| Status | Description |
|---|---|
queued | Job is waiting to be submitted to Vertex AI |
processing | Batch job is running (may take minutes to hours) |
success | Job completed (check batch_job_completion_stats for details) |
failed | Job failed (check error message in outcome) |
Batch Job States (Vertex AI)
| State | Description |
|---|---|
JOB_STATE_QUEUED | Waiting for resources |
JOB_STATE_PENDING | Job is being prepared |
JOB_STATE_RUNNING | Processing requests |
JOB_STATE_SUCCEEDED | All requests completed |
JOB_STATE_PARTIALLY_SUCCEEDED | Some requests failed |
JOB_STATE_FAILED | Job failed |
JOB_STATE_CANCELLED | Job was cancelled (e.g., 24h timeout) |
Limits
| Limit | Value |
|---|---|
| Max input file size | 1 GB |
| Max requests per job | 200,000 |
| Max processing time | 24 hours (after job starts running) |
| Max queue time | 72 hours |
Tips for Best Results
- Batch Size: Ideal for 100+ requests. For fewer requests, consider online inference.
- File References: Use
gs://URIs for images/documents stored in Google Cloud Storage. - Processing Time: Jobs typically complete within minutes to hours depending on volume.
- Cost Optimization: Use batch for non-time-sensitive workloads to save ~50%.
- Error Handling: Check
batch_job_completion_statsfor failed request counts.