Model ID
gemini-batch

Gemini Batch Inference API Usage Guide

Overview

Gemini Batch Inference lets you process large volumes of requests asynchronously at approximately 50% lower cost than online inference. It is well suited to batch workloads such as document analysis, image labeling, or bulk content generation.

Authentication

All API requests require authentication using an API key. Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY

Submit Batch Job

Base URL

https://console.gmicloud.ai

Endpoint

POST /api/v1/ie/requestqueue/apikey/requests

Request Format

curl -X POST "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-batch",
    "payload": {
      "model": "gemini-3-flash-preview",
      "input_data": "{\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"What is 2+2?\"}]}]}}\n{\"request\":{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\"What is the capital of France?\"}]}]}}"
    }
  }'
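The input_data string in the request above is just newline-joined JSON objects, so it can be generated rather than escaped by hand. A minimal Python sketch, assuming text-only prompts; the helper name build_batch_body is hypothetical and not part of the API:

```python
import json

def build_batch_body(prompts, target_model="gemini-3-flash-preview"):
    """Return the JSON body for a batch submission, one JSONL line per prompt."""
    lines = [
        # json.dumps escapes newlines inside a prompt, so each request stays on one line.
        json.dumps({"request": {"contents": [{"role": "user", "parts": [{"text": p}]}]}})
        for p in prompts
    ]
    return {
        "model": "gemini-batch",  # top-level model ID, as in the curl example
        "payload": {"model": target_model, "input_data": "\n".join(lines)},
    }

body = build_batch_body(["What is 2+2?", "What is the capital of France?"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the -d body of the POST request with any HTTP client.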

Request Parameters

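Both parameters are fields of the payload object; the top-level model field carries the model ID, gemini-batch.

| Parameter | Type | Required | Description | Default | Constraints |
| --- | --- | --- | --- | --- | --- |
| model | enum | Yes | Target Gemini model for batch prediction. | gemini-3-flash-preview | See supported models below |
| input_data | string | Yes | JSONL content where each line is a request object. Can be a raw JSONL string or base64 encoded. | - | Max 1 GB file size, max 200K requests |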
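Because input_data may be base64 encoded and is subject to size and request-count limits, it can help to check both before submitting. The sketch below makes several assumptions that the guide does not state: that standard base64 is expected, that the 1 GB limit applies to the raw UTF-8 bytes, and that 1 GB means 10^9 bytes. The helper name encode_input_data is hypothetical.

```python
import base64

MAX_BYTES = 1_000_000_000  # "1 GB"; the exact byte definition is an assumption
MAX_REQUESTS = 200_000     # "200K requests" from the parameter table

def encode_input_data(jsonl_text):
    """Check the documented limits, then return the JSONL as a base64 string."""
    raw = jsonl_text.encode("utf-8")
    # Count non-empty lines; each one is a single request object.
    request_count = sum(1 for line in jsonl_text.split("\n") if line.strip())
    if len(raw) > MAX_BYTES:
        raise ValueError(f"input is {len(raw)} bytes, above the {MAX_BYTES} byte limit")
    if request_count > MAX_REQUESTS:
        raise ValueError(f"input has {request_count} requests, above the {MAX_REQUESTS} limit")
    return base64.b64encode(raw).decode("ascii")

jsonl = '{"request":{"contents":[{"role":"user","parts":[{"text":"What is 2+2?"}]}]}}'
encoded = encode_input_data(jsonl)
```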

Supported Models

ModelDescriptionBest For
gemini-3-flash-previewFast, efficient Gemini 3 FlashHigh-volume, simple tasks
gemini-3-pro-previewMost capable Gemini 3 ProComplex reasoning tasks
gemini-3-pro-image-previewGemini 3 Pro with image generationBatch image generation

JSONL Input Format

Each line must be a valid JSON object with a request field containing contents.

Text-only requests:
{"request":{"contents":[{"role":"user","parts":[{"text":"What is the capital of France?"}]}]}}
{"request":{"contents":[{"role":"user","parts":[{"text":"Summarize quantum computing in 3 sentences."}]}]}}
{"request":{"contents":[{"role":"user","parts":[{"text":"Write a haiku about mountains."}]}]}}
Multimodal requests (with images):
{"request":{"contents":[{"role":"user","parts":[{"text":"Describe this image"},{"file_data":{"file_uri":"gs://bucket/image.jpg","mime_type":"image/jpeg"}}]}]}}
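A multimodal line has the same shape as a text-only one, with an extra file_data part. A short Python sketch that produces the line shown above; the helper name image_request_line is hypothetical:

```python
import json

def image_request_line(prompt, file_uri, mime_type):
    """Build one JSONL line that pairs a text prompt with a file reference."""
    request = {"request": {"contents": [{"role": "user", "parts": [
        {"text": prompt},
        {"file_data": {"file_uri": file_uri, "mime_type": mime_type}},
    ]}]}}
    # Compact separators keep the line short; the output never contains a raw newline.
    return json.dumps(request, separators=(",", ":"))

line = image_request_line("Describe this image", "gs://bucket/image.jpg", "image/jpeg")
print(line)
```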

Response (Immediate)

After submitting, you receive a request_id to track the job:
{
  "request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
  "model": "gemini-batch",
  "status": "processing",
  "outcome": {
    "batch_job_state": "JOB_STATE_QUEUED"
  },
  "created_at": 1761763441,
  "updated_at": 1761763441
}

Check Job Status

Endpoint

GET /api/v1/ie/requestqueue/apikey/requests/{request_id}

Example

curl -X GET "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests/7eaa77fc-bc67-4021-9f1b-96b3fd832314" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response (In Progress)

{
  "request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
  "model": "gemini-batch",
  "status": "processing",
  "outcome": {
    "batch_job_state": "JOB_STATE_RUNNING"
  },
  "created_at": 1761763441,
  "updated_at": 1761764441
}

Response (Completed)

{
  "request_id": "7eaa77fc-bc67-4021-9f1b-96b3fd832314",
  "model": "gemini-batch",
  "status": "success",
  "outcome": {
    "batch_job_state": "JOB_STATE_SUCCEEDED",
    "output_url": "https://storage.googleapis.com/.../predictions.jsonl",
    "output_download_urls": [
      "https://storage.googleapis.com/.../predictions-00000-of-00001.jsonl"
    ],
    "batch_job_completion_stats": {
      "successful_count": "100",
      "failed_count": "2"
    },
    "token_usage": {
      "total_prompt_tokens": 5000,
      "total_candidates_tokens": 8000,
      "successful_requests": 100,
      "failed_requests": 2
    },
    "actual_cost_usd": "$0.001234"
  },
  "created_at": 1761763441,
  "updated_at": 1761765441
}
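Since jobs move from processing to a final status over minutes or hours, a client usually polls the status endpoint. The following is a sketch only: it treats success and failed as the final status values, and the 60-second interval, the 24-hour default timeout, and the function names are assumptions rather than documented behavior.

```python
import json
import time
import urllib.request

BASE_URL = "https://console.gmicloud.ai"
FINAL_STATUSES = {"success", "failed"}

def fetch_status(request_id, api_key):
    """GET the job record for one request_id and return it as a dict."""
    url = f"{BASE_URL}/api/v1/ie/requestqueue/apikey/requests/{request_id}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(fetch, interval_s=60.0, timeout_s=24 * 3600, sleep=time.sleep):
    """Call fetch() repeatedly until the job reaches a final status."""
    waited = 0.0
    while True:
        job = fetch()
        if job.get("status") in FINAL_STATUSES:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited} seconds")
        sleep(interval_s)
        waited += interval_s

# Usage (performs real HTTP calls, so it is left commented out):
# job = wait_for_job(lambda: fetch_status("7eaa77fc-bc67-4021-9f1b-96b3fd832314", "YOUR_API_KEY"))
```

Passing the fetch and sleep functions as arguments keeps the loop testable without network access.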

Download and Parse Output

The output is a JSONL file where each line corresponds to one input request. A single record is shown here, pretty-printed for readability:
{
  "status": "",
  "processed_time": "2024-01-15T10:30:00.000+00:00",
  "request": {"contents": [{"role": "user", "parts": [{"text": "What is 2+2?"}]}]},
  "response": {
    "candidates": [{
      "content": {"parts": [{"text": "4"}], "role": "model"},
      "finishReason": "STOP"
    }],
    "usageMetadata": {
      "promptTokenCount": 5,
      "candidatesTokenCount": 1,
      "totalTokenCount": 6
    }
  }
}
Note: An empty status field indicates success. Failed requests will have an error message in status.
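The empty-status rule in the note above can be applied when parsing a downloaded file. A minimal Python sketch, assuming the file content is already in memory as text; the function names are hypothetical, and the shape of the failed record in the sample is an assumption, since the guide only shows a successful one:

```python
import json

def split_results(jsonl_text):
    """Separate output records into successes (empty status) and failures."""
    succeeded, failed = [], []
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        (failed if record.get("status") else succeeded).append(record)
    return succeeded, failed

def first_text(record):
    """Return the text of the first part of the first candidate."""
    return record["response"]["candidates"][0]["content"]["parts"][0]["text"]

sample = "\n".join([
    json.dumps({"status": "", "response": {"candidates": [
        {"content": {"parts": [{"text": "4"}], "role": "model"}, "finishReason": "STOP"}]}}),
    json.dumps({"status": "example error message"}),  # assumed failure shape
])
ok, bad = split_results(sample)
print([first_text(r) for r in ok], len(bad))
```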

List Your Batch Jobs

Endpoint

GET /api/v1/ie/requestqueue/apikey/requests?model_id=gemini-batch

Example

curl -X GET "https://console.gmicloud.ai/api/v1/ie/requestqueue/apikey/requests?model_id=gemini-batch" \
  -H "Authorization: Bearer YOUR_API_KEY"
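The same list request can be prepared in Python with the standard library. This sketch only builds the request object; the guide does not show the shape of the list response, so parsing it is left out. The helper name list_jobs_request is hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://console.gmicloud.ai"

def list_jobs_request(api_key, model_id="gemini-batch"):
    """Build (but do not send) the GET request that lists batch jobs."""
    query = urllib.parse.urlencode({"model_id": model_id})
    url = f"{BASE_URL}/api/v1/ie/requestqueue/apikey/requests?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})

req = list_jobs_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```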

Request Status Values

| Status | Description |
| --- | --- |
| queued | Job is waiting to be submitted to Vertex AI |
| processing | Batch job is running (may take minutes to hours) |
| success | Job completed (check batch_job_completion_stats for details) |
| failed | Job failed (check error message in outcome) |

Batch Job States (Vertex AI)

| State | Description |
| --- | --- |
| JOB_STATE_QUEUED | Waiting for resources |
| JOB_STATE_PENDING | Job is being prepared |
| JOB_STATE_RUNNING | Processing requests |
| JOB_STATE_SUCCEEDED | All requests completed |
| JOB_STATE_PARTIALLY_SUCCEEDED | Some requests failed |
| JOB_STATE_FAILED | Job failed |
| JOB_STATE_CANCELLED | Job was cancelled (e.g., 24h timeout) |
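Client code often needs to know only whether a state is still changing. One possible grouping of the states listed above is sketched here; the split into in-progress and finished sets is an interpretation of the descriptions, not something the API reports directly, and classify_state is a hypothetical name.

```python
IN_PROGRESS_STATES = {"JOB_STATE_QUEUED", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"}
FINISHED_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}

def classify_state(batch_job_state):
    """Map a batch_job_state string to 'in_progress', 'finished', or 'unknown'."""
    if batch_job_state in IN_PROGRESS_STATES:
        return "in_progress"
    if batch_job_state in FINISHED_STATES:
        return "finished"
    return "unknown"  # a state not listed in this guide

print(classify_state("JOB_STATE_RUNNING"))
```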

Limits

| Limit | Value |
| --- | --- |
| Max input file size | 1 GB |
| Max requests per job | 200,000 |
| Max processing time | 24 hours (after job starts running) |
| Max queue time | 72 hours |

Tips for Best Results

  1. Batch Size: Ideal for 100+ requests. For fewer requests, consider online inference.
  2. File References: Use gs:// URIs for images/documents stored in Google Cloud Storage.
  3. Processing Time: Jobs typically complete within minutes to hours depending on volume.
  4. Cost Optimization: Use batch for non-time-sensitive workloads to save ~50%.
  5. Error Handling: Check batch_job_completion_stats for failed request counts.
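The error-handling tip can be sketched in Python against the completed response shown earlier. In that example the counts in batch_job_completion_stats are strings and actual_cost_usd carries a leading dollar sign, so the sketch converts both; whether these types are stable across responses is an assumption, and summarize_outcome is a hypothetical helper.

```python
from decimal import Decimal

def summarize_outcome(outcome):
    """Convert string counts and the cost string of an outcome into numbers."""
    stats = outcome.get("batch_job_completion_stats", {})
    successful = int(stats.get("successful_count", 0))
    failed = int(stats.get("failed_count", 0))
    cost_text = outcome.get("actual_cost_usd", "$0")
    return {
        "successful": successful,
        "failed": failed,
        "total": successful + failed,
        "cost_usd": Decimal(cost_text.lstrip("$")),  # Decimal avoids float rounding
    }

outcome = {
    "batch_job_completion_stats": {"successful_count": "100", "failed_count": "2"},
    "actual_cost_usd": "$0.001234",
}
summary = summarize_outcome(outcome)
if summary["failed"]:
    print(f"{summary['failed']} of {summary['total']} requests failed")
```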