Model ID

Gemini-batch-inference

Calling method: sync

Gemini Batch Inference API Usage Guide

Overview

Gemini Batch Inference lets you process large volumes of requests asynchronously at approximately 50% lower cost than online inference. It is well suited to batch workloads such as document analysis, image labeling, and bulk content generation.

Authentication

All API requests require authentication using an API key. Include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Submit Batch Job

Base URL

https://console.gmicloud.ai

Endpoint

POST /api/v1/ie/requestqueue/apikey/requests

Request Format

curl -X POST "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-batch",
    "payload": {
      "model": "gemini-3-flash-preview",
      "input_data": "{\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"What is 2+2?\"}]}]}}\n{\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"What is the capital of France?\"}]}]}}"
    }
  }'

Request Parameters

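Both parameters go inside the payload object. The top-level model field is set to gemini-batch, as in the request above.

Parameter	Type	Required	Description	Default	Constraints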
`model`	enum	Yes	Target Gemini model for batch prediction.	gemini-3-flash-preview	See supported models below
`input_data`	string	Yes	JSONL content where each line is a request object. Can be raw JSONL string or base64 encoded.	-	Max 1GB file size, max 200K requests
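
As a rough Python counterpart to the curl example, the sketch below assembles the same POST request with the standard library. The function name `build_submit_request` is hypothetical, and the request is only built, not sent, so it can be inspected before anything goes over the network.

```python
import json
import urllib.request

BASE_URL = "https://console.gmicloud.ai"
SUBMIT_PATH = "/api/v1/ie/requestqueue/apikey/requests"


def build_submit_request(api_key: str, target_model: str, jsonl: str) -> urllib.request.Request:
    """Build (but do not send) a batch submission request.

    Hypothetical helper: the body mirrors the curl example, with the
    top-level model fixed to "gemini-batch" and the target model plus
    JSONL input placed inside "payload".
    """
    body = {
        "model": "gemini-batch",
        "payload": {"model": target_model, "input_data": jsonl},
    }
    return urllib.request.Request(
        BASE_URL + SUBMIT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To submit, pass the result to `urllib.request.urlopen` and decode the JSON response, which should contain the request_id described under Response (Immediate).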

Supported Models

Model	Description	Best For
`gemini-3-flash-preview`	Fast, efficient Gemini 3 Flash	High-volume, simple tasks
`gemini-3-pro-preview`	Most capable Gemini 3 Pro	Complex reasoning tasks
`gemini-3-pro-image-preview`	Gemini 3 Pro with image generation	Batch image generation

JSONL Input Format

Each line must be a valid JSON object with a request field containing contents.

Text-only requests:

{"request":{"contents":[{"role":"user","parts":[{"text":"What is the capital of France?"}]}]}}
{"request":{"contents":[{"role":"user","parts":[{"text":"Summarize quantum computing in 3 sentences."}]}]}}
{"request":{"contents":[{"role":"user","parts":[{"text":"Write a haiku about mountains."}]}]}}

Multimodal requests (with images):

{"request":{"contents":[{"role":"user","parts":[{"text":"Describe this image"},{"file_data":{"file_uri":"gs://bucket/image.jpg","mime_type":"image/jpeg"}}]}]}}
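
Writing JSONL by hand is error-prone, so here is a minimal sketch that generates lines in the format above. The helper names (`text_request`, `image_request`, `to_jsonl`) are illustrative and not part of the API.

```python
import json


def text_request(prompt: str) -> dict:
    """One text-only request object in the documented shape."""
    return {"request": {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}}


def image_request(prompt: str, file_uri: str, mime_type: str) -> dict:
    """One multimodal request: a text part followed by a file_data part."""
    parts = [
        {"text": prompt},
        {"file_data": {"file_uri": file_uri, "mime_type": mime_type}},
    ]
    return {"request": {"contents": [{"role": "user", "parts": parts}]}}


def to_jsonl(requests: list) -> str:
    """Serialize each request on its own line.

    json.dumps escapes newlines inside strings, so a multi-line prompt
    still occupies exactly one JSONL line.
    """
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in requests)
```

The string returned by `to_jsonl` can be used directly as the input_data value.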

Response (Immediate)

After submitting, you receive a request_id to track the job:

{
  "request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
  "model": "gemini-batch",
  "status": "processing",
  "outcome": {
    "batch_job_state": "JOB_STATE_QUEUED"
  },
  "created_at": 1761763441,
  "updated_at": 1761763441
}

Check Job Status

Endpoint

GET /api/v1/ie/requestqueue/apikey/requests/{request_id}

Example

curl -X GET "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests/7eaa77fc-bc67-4021-9f1b-96b3fd832314" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response (In Progress)

{
  "request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
  "model": "gemini-batch",
  "status": "processing",
  "outcome": {
    "batch_job_state": "JOB_STATE_RUNNING"
  },
  "created_at": 1761763441,
  "updated_at": 1761764441
}

Response (Completed)

{
  "request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
  "model": "gemini-batch",
  "status": "success",
  "outcome": {
    "batch_job_state": "JOB_STATE_SUCCEEDED",
    "output_url": "https://storage.googleapis.com/.../predictions.jsonl",
    "output_download_urls": [
      "https://storage.googleapis.com/.../predictions-00000-of-00001.jsonl"
    ],
    "batch_job_completion_stats": {
      "successful_count": "100",
      "failed_count": "2"
    },
    "token_usage": {
      "total_prompt_tokens": 5000,
      "total_candidates_tokens": 8000,
      "successful_requests": 100,
      "failed_requests": 2
    },
    "actual_cost_usd": "$0.001234"
  },
  "created_at": 1761763441,
  "updated_at": 1761765441
}
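
Note that in the sample above the completion counts are strings and the cost carries a leading dollar sign. The sketch below normalizes those fields, assuming real responses follow the same shape as the sample. `summarize_job` is a hypothetical helper name.

```python
from decimal import Decimal


def summarize_job(job: dict) -> dict:
    """Pull the commonly needed fields out of a status response.

    Assumes the field names and types shown in the sample response;
    missing fields fall back to neutral defaults.
    """
    outcome = job.get("outcome", {})
    stats = outcome.get("batch_job_completion_stats", {})
    cost = outcome.get("actual_cost_usd")
    return {
        "state": outcome.get("batch_job_state"),
        # Counts arrive as strings in the sample, so convert explicitly.
        "succeeded": int(stats.get("successful_count", 0)),
        "failed": int(stats.get("failed_count", 0)),
        # Strip the "$" prefix before parsing the amount.
        "cost_usd": Decimal(cost.lstrip("$")) if cost else None,
        "download_urls": outcome.get("output_download_urls", []),
    }
```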

Download and Parse Output

The output is a JSONL file in which each line corresponds to one input request. A single record, pretty-printed here for readability:

{
  "status": "",
  "processed_time": "2024-01-15T10:30:00.000+00:00",
  "request": {"contents": [{"role": "user", "parts": [{"text": "What is 2+2?"}]}]},
  "response": {
    "candidates": [{
      "content": {"parts": [{"text": "4"}], "role": "model"},
      "finishReason": "STOP"
    }],
    "usageMetadata": {
      "promptTokenCount": 5,
      "candidatesTokenCount": 1,
      "totalTokenCount": 6
    }
  }
}

Note: An empty status field indicates success. Failed requests will have an error message in status.
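
The empty-status rule can be applied when parsing a downloaded output file. This is a minimal sketch assuming each line has the structure shown above. The helper names `split_results` and `first_text` are illustrative.

```python
import json


def split_results(jsonl_text: str) -> tuple:
    """Separate output records into (succeeded, failed) lists.

    A record counts as failed when its "status" field is non-empty.
    """
    succeeded, failed = [], []
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # tolerate blank lines and a trailing newline
        record = json.loads(line)
        (failed if record.get("status") else succeeded).append(record)
    return succeeded, failed


def first_text(record: dict) -> str:
    """Concatenate the text parts of the first candidate in a record."""
    parts = record["response"]["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)
```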

List Your Batch Jobs

Endpoint

GET /api/v1/ie/requestqueue/apikey/requests?model_id=gemini-batch

Example

curl -X GET "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests?model_id=gemini-batch" \
  -H "Authorization: Bearer YOUR_API_KEY"

Request Status Values

Status	Description
`queued`	Job is waiting to be submitted to Vertex AI
`processing`	Batch job is running (may take minutes to hours)
`success`	Job completed (check `batch_job_completion_stats` for details)
`failed`	Job failed (check error message in outcome)

Batch Job States (Vertex AI)

State	Description
`JOB_STATE_QUEUED`	Waiting for resources
`JOB_STATE_PENDING`	Job is being prepared
`JOB_STATE_RUNNING`	Processing requests
`JOB_STATE_SUCCEEDED`	All requests completed
`JOB_STATE_PARTIALLY_SUCCEEDED`	Some requests failed
`JOB_STATE_FAILED`	Job failed
`JOB_STATE_CANCELLED`	Job was cancelled (e.g., 24h timeout)
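
A polling loop built on these states might look like the sketch below. It assumes that SUCCEEDED, PARTIALLY_SUCCEEDED, FAILED, and CANCELLED are terminal, which the tables imply but do not state outright. `wait_for_job` and its `fetch_status` argument are hypothetical: `fetch_status` is any callable that takes a request_id and returns the decoded status response.

```python
import time

# Assumption: these states mean the job will not change any further.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}


def wait_for_job(fetch_status, request_id, interval_s=60.0, max_polls=1000, sleep=time.sleep):
    """Poll until the job reaches a terminal state, then return the response.

    Stops on a terminal batch_job_state or on a request status of
    "success" or "failed". Raises TimeoutError after max_polls attempts.
    """
    for _ in range(max_polls):
        job = fetch_status(request_id)
        state = job.get("outcome", {}).get("batch_job_state")
        if state in TERMINAL_STATES or job.get("status") in ("success", "failed"):
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {request_id} not finished after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Since jobs may take minutes to hours, an interval of a minute or more is a reasonable starting point.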

Pricing

Model	Online Price	Batch Price (50% off)
Gemini 3 Flash	$0.10/1M input, $0.40/1M output	$0.05/1M input, $0.20/1M output
Gemini 3 Pro	$1.25/1M input, $5.00/1M output	$0.625/1M input, $2.50/1M output
Gemini 3 Pro Image	$0.0011/input img, $0.134/output img	$0.00055/input img, $0.067/output img
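
To make the arithmetic concrete, the sketch below estimates a text job's cost from the batch prices in the table. Mapping the table rows to the model IDs `gemini-3-flash-preview` and `gemini-3-pro-preview` is an assumption, as is the helper name `estimate_batch_cost`. The billed amount is whatever actual_cost_usd reports, so treat this as an estimate only.

```python
from decimal import Decimal

# Batch prices in USD per 1M tokens as (input, output), copied from the table.
# Assumption: each table row corresponds to the model ID used as the key.
BATCH_PRICE_PER_1M = {
    "gemini-3-flash-preview": (Decimal("0.05"), Decimal("0.20")),
    "gemini-3-pro-preview": (Decimal("0.625"), Decimal("2.50")),
}


def estimate_batch_cost(model: str, prompt_tokens: int, candidate_tokens: int) -> Decimal:
    """Estimate cost in USD as tokens / 1,000,000 times the per-million price."""
    input_price, output_price = BATCH_PRICE_PER_1M[model]
    million = Decimal(1_000_000)
    return (prompt_tokens * input_price + candidate_tokens * output_price) / million
```

For example, 5,000 prompt tokens and 8,000 candidate tokens on Flash come to 5,000 × $0.05/1M + 8,000 × $0.20/1M = $0.00025 + $0.0016 = $0.00185.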

Limits

Limit	Value
Max input file size	1 GB
Max requests per job	200,000
Max processing time	24 hours (after job starts running)
Max queue time	72 hours

Tips for Best Results

Batch Size: Ideal for 100+ requests. For fewer requests, consider online inference.
File References: Use gs:// URIs for images/documents stored in Google Cloud Storage.
Processing Time: Jobs typically complete within minutes to hours depending on volume.
Cost Optimization: Use batch for non-time-sensitive workloads to save ~50%.
Error Handling: Check batch_job_completion_stats for failed request counts.
Partial Results: Even if some requests fail, successful ones are still billed and available.
