Google Document AI is an API-first intelligent document processing (IDP) service built on Vertex AI, using Gemini 3 language models for extraction, classification, and layout parsing. It targets developers and enterprises already standardized on Google Cloud, offering usage-based pricing and deep integration with the broader Vertex AI infrastructure.
Overview

Google Document AI is not a standalone IDP platform in the traditional sense. It is an extraction and document understanding service that organizations connect to their own workflow orchestration, human-in-the-loop review, and case management tooling. This developer-first architecture suits enterprises building custom pipelines on Google Cloud but creates friction for teams seeking a unified, out-of-the-box IDP solution.

The platform has accelerated its model cadence significantly in early 2026. Between January and April 2026, Google released Gemini 3 Flash and Pro-powered custom extractors in preview, launched custom splitter v1.5 at general availability (GA), and introduced fine-tuning upgrade paths from v1.4 to v1.5 models. Template-based processors achieve over 95% precision and recall on structured documents such as invoices, with confidence scores available for field-level validation in production.
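Precision and recall figures like these are typically measured at the field level: each extracted field counts as correct only when its value matches the labeled ground truth. A minimal sketch of that calculation, using hypothetical invoice fields (the field names and values are illustrative, not Document AI output):

```python
def field_metrics(predicted: dict, truth: dict) -> tuple:
    """Field-level precision and recall.

    predicted/truth map field names to values; a field is correct only
    when it appears in both and the values match exactly.
    """
    correct = sum(1 for k, v in predicted.items() if truth.get(k) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical invoice: one wrong value ("total"), one missed field ("due_date").
pred = {"invoice_id": "INV-001", "total": "105.00", "currency": "USD"}
gold = {"invoice_id": "INV-001", "total": "150.00",
        "currency": "USD", "due_date": "2026-04-01"}
precision, recall = field_metrics(pred, gold)
print(round(precision, 3), recall)  # → 0.667 0.5
```

In production, teams usually run this comparison over a held-out labeled set and track the two numbers per field type, since aggregate accuracy can hide weak fields.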

Healthcare is the clearest proof point for Document AI at scale. At HIMSS 26 in March 2026, Google Cloud announced deployments with CVS Health, Highmark Health, Waystar, Quest Diagnostics, and Humana, covering revenue cycle management, lab result interpretation, and clinical workflow automation. These deployments use Document AI's extraction layer as the foundation for downstream agentic decision-making.

  • 95%+ precision and recall on structured docs
  • $15B in denied claims prevented by Waystar AltitudeAI
  • 6M annual prompts on Highmark's Sidekick platform
  • $27.9M in AI-enabled value delivered by Highmark in 2025

What users say

Practitioners consistently describe Document AI as accurate and well-integrated within Google Cloud, but note the platform requires significant additional engineering to reach production-grade IDP workflows. Teams report that the extraction quality on structured documents like invoices and forms is strong, while unstructured document handling has improved materially with Gemini-powered models. The absence of built-in workflow orchestration, human-in-the-loop review, and case management is the most common friction point cited by evaluators comparing Document AI against end-to-end IDP platforms.

Organizations already running workloads on Google Cloud find the IAM integration, VPC service controls, and regional data residency controls straightforward to configure. Teams coming from other cloud environments or seeking a vendor-managed IDP experience tend to find the setup overhead significant relative to alternatives that bundle orchestration with extraction.

Google Document AI video

This video demonstrates Google Document AI in action, covering the platform's extraction, classification, and layout parsing capabilities through Vertex AI.

How Document AI handles extraction

Google Document AI routes documents through specialized processor models depending on document type and processing requirements. As of early 2026, the platform offers three model tiers for custom extraction: Gemini 3 Flash (120 pages per minute quota), Gemini 3 Pro (30 pages per minute), and legacy v1.x models scheduled for deprecation on June 30, 2026.

Custom extractor models powered by Gemini 3 Flash and Pro entered preview on January 27, 2026, with ML processing available in US and EU regions. The Flash variant prioritizes throughput for high-volume workloads; the Pro variant trades speed for accuracy on complex or ambiguous documents. Both support document-level prompting, which launched January 12, 2026 and allows organizations to inject business context at the document level to improve field extraction accuracy without retraining.
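The request flow can be sketched with the `google-cloud-documentai` client library: send a PDF to a processor, then collect the returned entities with their confidence scores. The project and processor IDs below are placeholders, and the 0.9 review threshold is an illustrative assumption rather than a platform default:

```python
def extract_entities(pdf_path: str, project_id: str, processor_id: str,
                     location: str = "us") -> list:
    """Send a PDF to a Document AI processor; return (type, text, confidence)."""
    # Imported lazily so the pure helper below works without the SDK installed.
    from google.cloud import documentai_v1 as documentai

    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path(project_id, location, processor_id)
    with open(pdf_path, "rb") as f:
        raw = documentai.RawDocument(content=f.read(),
                                     mime_type="application/pdf")
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw)
    )
    # Each extracted field arrives as an entity with a type, text span,
    # and model confidence.
    return [(e.type_, e.mention_text, e.confidence)
            for e in result.document.entities]

def route_by_confidence(entities: list, threshold: float = 0.9) -> tuple:
    """Split extracted fields into auto-accept and human-review buckets."""
    accept = [e for e in entities if e[2] >= threshold]
    review = [e for e in entities if e[2] < threshold]
    return accept, review
```

Because Document AI ships no human-in-the-loop layer, a helper like `route_by_confidence` is the kind of glue code buyers typically write themselves to feed low-confidence fields into their own review tooling.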

Layout parsing received parallel upgrades. Gemini 3 Flash and Pro layout parser models entered preview on February 9 and January 27, 2026 respectively, improving table recognition and reading order on PDF files. A web interface for the layout parser entered preview February 16, 2026, adding document view display, JSON visualization, and image and table annotation directly in the console.

For document splitting and classification, custom splitter v1.5 reached GA on March 27, 2026, offering zero-shot splitting, classification, and confidence scores without requiring labeled training data. Custom classifier and splitter v1.6 entered preview on March 23, 2026, powered by Gemini 3 Flash and Pro, continuing the platform's shift away from template-based approaches toward zero-shot and few-shot extraction.

Fine-tuning is now iterative. The custom extractor fine-tuning upgrade capability launched March 31, 2026, allowing users to upgrade from v1.4 to v1.5 models while preserving prior configurations, reducing the cost of adopting newer model versions for teams with established training datasets.

Use cases

Healthcare revenue cycle and clinical workflows

Waystar's AltitudeAI, built on Google Cloud, has prevented more than $15 billion in denied claims and reduced appeal and documentation time by 90%. The system uses Document AI's extraction capabilities as the foundation for autonomous revenue cycle management decisions.

Highmark Health's Sidekick platform scaled from 1 million to 6 million prompts in just over one year, delivering an estimated $27.9 million in AI-enabled value for 2025. Allegheny Health Network, part of Highmark, uses a Sidekick-based workflow to automate the first draft of complex research protocols through an IRB protocol and consent builder. Richard Clarke, Chief Data and Analytics Officer at Highmark Health, stated in March 2026: "The next chapter is bringing multi-agent support to employees."

Quest Diagnostics launched Quest AI Companion, a HIPAA-compliant chat feature for lab result interpretation via the MyQuest app and portal. Humana deployed an Agent Assist solution using Gemini Enterprise for Customer Experience to support call center operations.

Enterprise document processing

For a hands-on walkthrough of Document AI configuration, see the Google Document AI guide and form recognition guide. Developers building LLM-powered extraction pipelines on top of Google's models may also find LangExtract, Google's open-source Python library for structured information extraction with source grounding, a useful companion tool.

Cloud infrastructure for specialized applications

Google Cloud provides backend infrastructure for applications requiring real-time data processing at scale. Teams evaluating open-source IDP platforms built on Google Cloud infrastructure may also consider Unstract, a no-code LLM platform for production-grade document extraction with hallucination mitigation. Organizations requiring video intelligence alongside document AI on the same cloud infrastructure may find VIDIZMO's enterprise video and document AI platform relevant, particularly for government and evidence management use cases.

Technical specifications

  • Custom extractor models: Gemini 3 Flash (120 pages/min), Gemini 3 Pro (30 pages/min)
  • Layout parser: Gemini 3 Flash and Pro, PDF table recognition and reading order
  • Custom splitter: v1.5 GA (zero-shot), v1.6 preview (Gemini 3-powered)
  • Max file size (online): 40 MB (increased from 20 MB, June 2025)
  • Max pages (online/sync): 30 pages per request
  • Project limits: 30,000 documents or 250,000 pages per project
  • Import limits: 5,000 documents per import
  • Structured doc accuracy: 95%+ precision and recall on invoices and forms
  • Vector Search: 2.0 GA with hybrid search, auto-embeddings, PSC/PGA/VPC support
  • RAG Engine: serverless mode in public preview (April 3, 2026)
  • Regions (custom extractors): US and EU (January 2026 onward)
  • Compliance: SOC 2, ISO 27001, HIPAA
  • Security: encryption at rest and in transit, RBAC via IAM, SSO via Google Cloud Identity, IAM deny policies, VPC service controls with identity groups
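The 30-page and 40 MB online limits mean pipelines must decide per document whether to call the synchronous endpoint or fall back to batch (asynchronous) processing. A minimal routing sketch, assuming batch handles anything over the online limits:

```python
# Online (synchronous) limits from the specifications above; batch
# (asynchronous) processing is assumed for anything larger.
MAX_SYNC_PAGES = 30
MAX_SYNC_BYTES = 40 * 1024 * 1024  # 40 MB

def choose_mode(page_count: int, file_size_bytes: int) -> str:
    """Pick online vs batch processing for a single document."""
    if page_count <= MAX_SYNC_PAGES and file_size_bytes <= MAX_SYNC_BYTES:
        return "online"
    return "batch"

print(choose_mode(12, 3_000_000))   # → online
print(choose_mode(120, 3_000_000))  # → batch (over the page limit)
```

Batch processing trades latency for capacity, so most teams reserve the online path for interactive or low-page-count workloads.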

Pricing

Enterprise OCR

$1.50 per 1,000 pages

Pages 1 to 5,000,000/month

  • $0.60 per 1,000 pages above 5M pages/month
  • Enterprise Document OCR Processor v2.1.1
  • Available in US, EU, and additional regions

Custom Extractors

$30 per 1,000 pages

Pages 1 to 1,000,000/month

  • $20 per 1,000 pages above 1M pages/month
  • Gemini 3 Flash and Pro models
  • Fine-tuning and document-level prompting included

Specialized Parsers

From $0.10 per 10 pages

Invoice, bank statement, and domain parsers

  • Invoice parser: $0.10 per 10 pages
  • Bank statement parser: $0.75 per classified document
  • Processor hosting: $0.05/hour per deployed version
  • Capacity reservation: $300 per extra page-per-minute/month

New Google Cloud customers receive $300 in credits valid for 90 days, applicable to Document AI usage.

The usage-based model creates cost predictability for low-to-medium volume workloads but becomes expensive at scale. Capacity reservation offers cost optimization for high-volume processing but requires upfront commitment. This contrasts with subscription-based competitors like UiPath and Automation Anywhere, which may offer better economics for organizations with unpredictable document volumes.
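The tiered structure makes the break-even math easy to model. A sketch of the custom extractor bill under the rates listed above ($30 per 1,000 pages up to 1M pages/month, $20 per 1,000 beyond), ignoring free credits, hosting fees, and parser-specific rates:

```python
def custom_extractor_cost(pages_per_month: int) -> float:
    """Monthly custom extractor cost: $30/1,000 pages for the first
    1M pages, $20/1,000 pages above that."""
    tier1 = min(pages_per_month, 1_000_000)
    tier2 = max(pages_per_month - 1_000_000, 0)
    return tier1 / 1000 * 30 + tier2 / 1000 * 20

print(custom_extractor_cost(200_000))    # → 6000.0
print(custom_extractor_cost(3_000_000))  # → 70000.0 (30k tier 1 + 40k tier 2)
```

At 3M pages a month the bill reaches $70,000, which is the scale at which capacity reservation or a subscription-based competitor starts to warrant a closer look.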

Competitive positioning

Google Document AI competes against UiPath, Automation Anywhere, Blue Prism, Laserfiche, and Kudra in the IDP vendor landscape. The core differentiator is also the core limitation: Google positions Document AI as an extraction and document understanding service, not an end-to-end workflow platform. Organizations must integrate additional tooling for human-in-the-loop review, case management, and downstream process automation.

This developer-first approach appeals to enterprises already standardized on Google Cloud infrastructure. It creates friction for organizations seeking unified platforms where extraction, review, and orchestration ship together. Unlike cloud-only competitors that bundle workflow management with extraction, Google's architecture assumes the buyer will assemble the pipeline.

The June 30, 2026 deprecation of legacy processors, including identity parsers, tax and finance parsers, mortgage and banking parsers, procurement processors, and summary processors, forces customers toward custom extractors and Gemini-based foundation models. The migration path is documented but operationally disruptive for teams with production workloads on legacy processors. Organizations with large fine-tuned custom extractor deployments face potential vendor lock-in as the platform consolidates around Gemini models.

Regional coverage remains narrower than competitors offering on-premises deployment. Gemini-powered custom extractors are available in US and EU regions as of January 2026. Enterprise Document OCR v2.1.1 added asia-southeast1, australia-southeast1, europe-west2, europe-west3, and northamerica-northeast1 in February 2025, but organizations requiring on-premises deployment for data residency compliance will need to evaluate alternatives.

Legacy processor deprecation: June 30, 2026. Identity parsers, tax/finance parsers, mortgage/banking parsers, procurement, splitting, and summary processors will be discontinued. Organizations running production workloads on these processors should begin migration to custom extractors or Gemini-based models before this date.

Resources

Company information

Google is headquartered in Mountain View, United States, and operates Google Cloud as its enterprise infrastructure and AI services division. Document AI sits within the Vertex AI platform alongside Vector Search, RAG Engine, and the Gemini model family. The company's investment in AI infrastructure includes secured agreements for small modular nuclear reactors to power next-generation AI data centers, joining Amazon, Meta, and Microsoft in securing dedicated clean energy for AI computing at scale.