The month the price went to cents
In June 2026 the cost of reading a document with a machine stopped being a line item worth managing. Independent cost analysis puts direct Gemini Flash extraction at roughly $0.17 per 1,000 pages, against $1.50 for AWS Textract's basic OCR and $30 for Google's own legacy Document AI Form Parser. A developer with a stack of standard invoices no longer evaluates an intelligent document processing vendor. They paste the file into a model they already pay for and read the JSON that comes back.
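To put those per-1,000-page rates side by side, here is a rough sketch of the arithmetic. The prices are the ones quoted above; the monthly volume of 250,000 pages is an assumed figure chosen only for illustration, and the calculation ignores retries, storage and engineering time.

```python
# Per-1,000-page prices quoted in the text (USD). Directional figures only.
PRICE_PER_1K_PAGES = {
    "Gemini Flash (direct)": 0.17,
    "AWS Textract basic OCR": 1.50,
    "Document AI Form Parser (legacy)": 30.00,
}

# Hypothetical volume, not from any source: an assumption for illustration.
ASSUMED_PAGES_PER_MONTH = 250_000


def monthly_cost(pages: int, price_per_1k: float) -> float:
    """Cost in USD of processing `pages` at a per-1,000-page price."""
    return pages / 1000 * price_per_1k


costs = {
    name: monthly_cost(ASSUMED_PAGES_PER_MONTH, price)
    for name, price in PRICE_PER_1K_PAGES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f} per month")
```

At that assumed volume the gap between the cheapest and most expensive option is two orders of magnitude, which is the whole argument of this section in one loop.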
The incumbent cloud platforms have conceded the point in their own release notes. Google's Document AI now routes its Layout Parser through Gemini 3 Flash and Gemini 3 Pro, with image and table annotations reaching general availability on May 27, and every legacy pre-2022 processor for identity, tax, mortgage and procurement documents deprecated effective June 30, 2026. The product that defined cloud OCR for a decade is now a wrapper around the same general model a developer can call directly. When the platform that invented the category retires its own purpose-built processors in favor of a chat model, the purpose-built layer is over.
The independent benchmarks agree. The Nanonets IDP Leaderboard, run across more than 9,000 real documents, has Gemini 3 Flash leading key information extraction at 91.1% and Gemini 3.1 Pro on top for OCR, tables and visual question answering. For the easy half of the market (clean invoices, printed forms, structured PDFs), the model is the pipeline. The loop the agentic hotfolder post described in May is now priced in cents and shipped by the people who make the models.
The labs are not selling extraction. They are selling the desk.
The repricing of OCR is the small story. The large one is that the frontier labs have stopped pitching chat and started pitching the knowledge-work desk: the same desk where document-heavy professional work lives.
OpenAI shipped Workspace Agents on April 22, the successor to Custom GPTs, with scheduled agents that connect to Google Drive, SharePoint, Box and Salesforce and improve through memory. The launch cited OpenAI's own internal use: 24,771 K-1 tax forms processed, weekly business reports automated. GPT-5.5, released two days later, is positioned in language that would have been a category claim for an IDP vendor a year ago: "creating documents and spreadsheets, operating software, and moving across tools until a task is finished." Then on May 27, OpenAI and Thrive put a self-improving tax AI into production with a network of thirty-plus accounting firms, reporting 97% accuracy on K-1 forms across a 7,000-return pilot. That is not a demo. That is document-heavy regulated professional work, automated, with a number attached.
Anthropic is making the same move from the other side. Claude Cowork, launched February 24, is sold as an "autonomous digital colleague" that reads and extracts key terms from DocuSign agreements, pulls structured data out of receipt images into Excel with formulas, and ships with ten finance agent templates for KYC file screening and pitchbook building. Anthropic's financial-services positioning claims 83% accuracy on complex Excel tasks and a lead on the Vals AI Finance Agent benchmark. Sam Altman's framing for where this goes is explicit: agents trusted to handle "multi-day and multi-week tasks, operating proactively much like a senior human employee."
This is the threat the IDP middle has not priced. The model labs are not trying to be a better extractor. They are trying to own the agent that reads the document, reasons over it, and does the downstream work: the agentic workflow layer that vertical platforms like Coupa and UiPath also want. Extraction is a feature of that agent, not a product alongside it.
The plumbing went open in the same four weeks
While the labs took the bottom, the rest of the stack standardized the middle. The document format itself, the proprietary intermediate representation that several vendors treated as a moat, became an open standard in June.
On June 23, ABBYY shipped FineReader Engine 12.8 with export support for DocLang, and the Linux AI & Data Foundation announced DocLang as a vendor-neutral AI-native document standard co-founded by ABBYY, IBM, NVIDIA, Red Hat and HumanSignal. When five companies that compete with each other co-author the format their products emit, the format has stopped being a differentiator and become infrastructure. The same week, IBM made Docling for watsonx generally available at $4 per 1,000 pages, a price it claims is 20% below comparable vendors. The product wraps the open-source toolkit that already sits under half the pipelines practitioners describe. Docugami open-sourced DGML on June 17, paired with an Inveniam partnership to make extracted data verifiable at the individual data-point level rather than the document level.
The model layer commoditized on the same schedule. Mistral released OCR 4 on June 23: structure-aware, self-hostable in a single container, $2 to $4 per 1,000 pages, scoring 93.07 on OmniDocBench across 170 languages. Baidu open-sourced Unlimited-OCR on June 22, a 3-billion-parameter mixture-of-experts model under an MIT license. The reason every one of these announcements sells "trust" and "verifiable" and "AI-native" is the same reason the format is being standardized at all: an agent acting on extracted data needs a deterministic layer between the messy document and the probabilistic model. That layer is exactly the accuracy and knowledge problem we wrote about in April, and the industry has decided to solve it in the open.
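What a deterministic layer means in practice can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the field names (total, line_amounts, invoice_date) and the two rules are hypothetical, and a real deployment would carry many more checks. The point is that every rule either passes or fails the same way every time, regardless of which model produced the fields.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass
class CheckResult:
    ok: bool
    errors: list[str] = field(default_factory=list)


def validate_invoice(extracted: dict) -> CheckResult:
    """Apply deterministic rules to model output. Field names are hypothetical."""
    errors = []

    # Rule 1: line amounts must sum exactly to the stated total.
    try:
        total = Decimal(extracted["total"])
        lines = [Decimal(x) for x in extracted["line_amounts"]]
        if sum(lines, Decimal("0")) != total:
            errors.append("line amounts do not sum to total")
    except (KeyError, InvalidOperation, TypeError):
        errors.append("total or line_amounts missing or not numeric")

    # Rule 2: the invoice date must parse as an ISO 8601 calendar date.
    try:
        date.fromisoformat(extracted["invoice_date"])
    except (KeyError, ValueError, TypeError):
        errors.append("invoice_date missing or not an ISO date")

    return CheckResult(ok=not errors, errors=errors)


# Made-up extraction results, for illustration only.
good = validate_invoice(
    {"total": "120.00", "line_amounts": ["100.00", "20.00"], "invoice_date": "2026-06-01"}
)
bad = validate_invoice(
    {"total": "120.00", "line_amounts": ["100.00", "2.00"], "invoice_date": "June 1"}
)
print(good.ok, bad.errors)
```

Anything that fails a rule goes to a human or an exception queue instead of to the agent, which is where the audit trail starts.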
The wall
Here is the part the lab marketing leaves out, and the part that should reassure anyone running hard documents for a living. The frontier models hit a wall, and the wall is measurable.
The same IDP Leaderboard that crowns Gemini on clean documents found handwriting OCR capped at 75.5% across every model tested, all three frontier labs included. Sparse and unstructured tables scored below 55% for most models. Chart analysis failed in specific, dangerous ways: "axis values misread by orders of magnitude, the wrong bar selected, off-by-one errors on closely spaced data points." Those are not edge cases. Those are bills of lading in logistics, handwritten claims forms in insurance, clinical notes and referral forms in healthcare, the regulated, high-stakes, high-variance documents that were never the easy half.
And on cost-adjusted accuracy the specialists still win where it counts. The leaderboard has Nanonets OCR-3 leading overall at 85.9%, ahead of Gemini 3.1 Pro at 83.2% and GPT-5.4 at 81.0%. On chart question-answering, the specialist Nanonets OCR2+ scores 87% to GPT-5.4's 77%, at $10 per 1,000 pages against $28. A purpose-built model beating a frontier model on the hard task at roughly a third of the cost is not what a clean commoditization story predicts. The leaderboard's own conclusion is the line every buyer should keep: "the problems described are not model problems, they are architecture, integration, and trust problems."
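One way to make "cost-adjusted accuracy" concrete is to divide price by accuracy, using the chart question-answering figures above. The metric here is our own simplification, not the leaderboard's: it assumes one question per page and treats a wrong answer as wasted spend with no further downstream cost, which understates the real penalty for errors.

```python
# Chart question-answering figures quoted in the text.
MODELS = {
    "Nanonets OCR2+": {"accuracy": 0.87, "price_per_1k": 10.0},
    "GPT-5.4": {"accuracy": 0.77, "price_per_1k": 28.0},
}


def cost_per_1k_correct(price_per_1k: float, accuracy: float) -> float:
    """USD spent per 1,000 correct answers, assuming one question per page."""
    return price_per_1k / accuracy


per_correct = {
    name: cost_per_1k_correct(m["price_per_1k"], m["accuracy"])
    for name, m in MODELS.items()
}

# Raw price ratio of the specialist to the frontier model.
price_ratio = MODELS["Nanonets OCR2+"]["price_per_1k"] / MODELS["GPT-5.4"]["price_per_1k"]

for name, cost in per_correct.items():
    print(f"{name}: ${cost:.2f} per 1,000 correct answers")
print(f"Specialist price as a share of frontier price: {price_ratio:.0%}")
```

Under those assumptions the specialist costs about $11.49 per 1,000 correct answers against about $36.36 for the frontier model, so adjusting for accuracy widens the gap rather than closing it.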
The labs themselves quietly walked the timeline back. In late May, both Altman and Dario Amodei retreated from their most aggressive automation predictions, with Altman saying he was "delighted to be wrong" that entry-level white-collar work would already be gone. The frontier threat to the hard half of IDP is directional, not arrived. The bottom is being eaten now. The top is being promised.
Where the value actually went
If the loop is cents and the format is open, the money moves to the two ends that are not. It moved in exactly those directions this quarter.
Up, into agentic orchestration. Coupa followed its May 12 Rossum acquisition with a second deal nine days later, buying the workflow-intake startup Tonkean to sit on top of the document layer it had just absorbed. Down, into raw capacity: on June 24, PE-backed Daida acquired Scan-Optics, a firm founded in 1968 that runs a facility capable of four million pages a day, a roll-up of the unglamorous scanning and BPO tier that never depended on a model at all.
The surviving pure-play is doing what survival looks like. Hyperscience was the vendor Forrester called "the only dedicated IDP pure play" in its Q2 2026 Wave, where it took the highest Current Offering score, and on June 26 it was named IDP Platform of the Year for the second year running. Forrester's own warning to buyers in that report is the whole market in one sentence: expect about six months to reach an MVP, not a few weeks. Meanwhile UiPath posted its first-ever GAAP-profitable quarter on May 28: $418M in revenue, $28M GAAP operating income, the discipline a platform finds when the model labs are pressing up from underneath.
What this means if you are buying
Test on your worst documents, not your best. The frontier models are excellent on the clean half and you should use them there. The question that decides your vendor is what happens on the handwriting, the sparse tables, and the bad scans. Run the OCR-versus-LLM trade-off on your own corpus before you believe anyone's leaderboard, including this one.
Price the easy half honestly. If your documents are standard and stable, a direct model call at cents per thousand pages is credible architecture and a six-figure platform contract for the same job is not. Know which half of the market you are in before you sign.
Ask where the agent lives. The labs, Coupa and UiPath all want to own the agent that acts on the document. If you are buying extraction, you are buying a feature of someone's agent. Decide whose agent, on purpose.
Buy the wall, not the loop. What you can defensibly pay for is the 24.5% of handwriting the models still cannot read, plus the validation, the audit trail, the exception queue, and the accountability around a wrong field. That is a real business. It is a smaller and more honest one than "we do documents."
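The first recommendation, testing on your worst documents, comes down to scoring extraction per document class instead of in aggregate. A minimal sketch follows, assuming you already have hand-checked ground truth for a sample of your own corpus. The class names, field names and values are made up, and exact string match is the simplest possible scoring rule; real evaluations usually normalize dates and amounts first.

```python
from collections import defaultdict


def field_accuracy_by_class(samples):
    """Return {doc_class: share of ground-truth fields extracted exactly}.

    `samples` is an iterable of (doc_class, expected_fields, extracted_fields),
    where both field arguments are dicts keyed by field name.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_class, expected, extracted in samples:
        for name, truth in expected.items():
            total[doc_class] += 1
            # A missing field counts as wrong, same as a misread one.
            if extracted.get(name) == truth:
                correct[doc_class] += 1
    return {doc_class: correct[doc_class] / total[doc_class] for doc_class in total}


# Hypothetical sample: one clean invoice, one handwritten claim form.
samples = [
    (
        "clean_invoice",
        {"total": "120.00", "date": "2026-06-01"},
        {"total": "120.00", "date": "2026-06-01"},
    ),
    (
        "handwritten_claim",
        {"claim_id": "A17", "amount": "450"},
        {"claim_id": "A11", "amount": "450"},
    ),
]

scores = field_accuracy_by_class(samples)
for doc_class, score in scores.items():
    print(f"{doc_class}: {score:.0%}")
```

An aggregate score over this sample would read 75% and hide the fact that the entire shortfall sits in one class. The per-class split is what tells you which half of the market you are in.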
Caveats
The cost figures are from independent analyses and vendor pages and are directional; naive frontier-model use on long multi-page PDFs can run $100 or more per 1,000 pages, so "cheap" depends entirely on the architecture around the call. The accuracy numbers are from the public IDP Leaderboard and the labs' own benchmark claims, and the labs' claims should be read as the marketing they are. The June M&A is sourced to primary releases. We do not sell IDP software and we are not paid by any vendor named here.
The model that reads the clean invoice keeps getting cheaper and better. The handwritten claims form does not care.
Sources cited in this post: Parsli LLM/OCR cost analysis; Google Document AI release notes; Nanonets IDP Leaderboard 1.5 and leaderboard details; OpenAI Workspace Agents; OpenAI GPT-5.5; OpenAI + Thrive tax AI; Anthropic Claude Cowork and Claude for Financial Services; Altman on agentic tasks and Altman/Amodei walkback; ABBYY DocLang; IBM Docling for watsonx; Docugami DGML; Mistral OCR 4; Baidu Unlimited-OCR; Coupa acquires Tonkean; Daida acquires Scan-Optics; Forrester Wave Document Mining Q2 2026; Hyperscience IDP Platform of the Year; UiPath Q1 FY2027 results. Cross-references to our own coverage: the agentic hotfolder and the end of the IDP middle, the May 12 Coupa-Rossum deal, the April accuracy reckoning, capability pages, and independent vendor profiles.