The week the market said it out loud
On May 12, 2026, at Inspire 2026 in Las Vegas, Coupa announced it had acquired Rossum, calling Rossum "the AI-first market leader in intelligent document processing". Coupa is a spend-management platform. Rossum is, or was, an independent IDP company built on a proprietary transactional large language model trained on tens of millions of documents. Coupa first embedded Rossum in its accounts payable infrastructure as a partner in 2024. Two years later, the partner was absorbed. (Coupa newsroom; Rossum press release; Fintech Futures.)
This is the same pattern we tracked in February when UiPath absorbed WorkFusion: a vertical workflow platform buys a horizontal IDP vendor. It is the ninth such deal we have counted since 2023. The deal that should worry the rest of the IDP market is this one, because Rossum was not sub-scale. Rossum was the AI-first leader. The market just told itself out loud that even the leader of the standalone IDP category is more valuable inside a vertical platform than alongside one.
At the same time, in our own folder, a hotfolder pattern is processing personal mail more reliably than the IDP platform we evaluated for the same job in 2024 at €18,000 a year. Two hundred lines of PowerShell. Six hundred words of prose in a markdown file. A frontier model in the middle. No template trainer, no annotation UI, no rule engine. The whole thing fits in a folder we could email.
The middle of the IDP market is being eaten from two sides in the same week. The leaders are being absorbed up into vertical platforms. The long tail is being absorbed down into a markdown file plus a model. The question this post tries to answer: if a single person can build the loop in an afternoon, and the category leader sells itself to a procurement platform, what is the IDP market actually selling? The analyst forecasts give the answer in negative space.
Gartner's market guide puts the IDP market at $2.09 billion by 2026 with a 13% CAGR. Coherent Market Insights puts it at $1.45 billion in 2026 growing at 4.9% to $2.02 billion by 2033. Fortune Business Insights puts it at $14.16 billion in 2026 growing at 26.2% CAGR to $91.02 billion by 2034. Research Nester puts it at $4.1 billion in 2026.
These four forecasts cannot all describe the same market. They are off by an order of magnitude. That spread is the story. The narrow definitions ($1.45B to $4.1B, with growth from the single digits to the low teens) describe the extraction loop the hotfolder replaces. The broad definitions ($14B+, growth above 25%) describe the wrapper: orchestration, compliance posture, integration depth, audit trail, human review queue, SLA, certification stack. The loop is becoming a commodity. The wrapper is the market.
What follows is what is in the wrapper, what is not, what April and May 2026 vendor moves tell us about who has accepted the shift, and what we found on our own laptops and on engineering forums while we watched it.
What we built
A folder on a Windows laptop. A markdown file next to it written like a household memo. A watcher script that picks up new files and hands them to a language model with the markdown file as context. The model decides what each file is, extracts what the prose says to extract, renames the file to YYYY-MM-DD_Type_Recipient_Sender_YYYYMMDD_HHMMSS.<ext>, and moves it into done\.
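The filename schema above can be sketched in a few lines. This is an illustration in Python rather than the PowerShell we actually run, and it rests on assumptions: that the leading date is the document's own date, that the trailing timestamp is the processing time, and that free-text parts are reduced to hyphenated slugs. The names slug and build_filename are hypothetical.

```python
import re
from datetime import date, datetime
from pathlib import Path


def slug(text: str) -> str:
    """Collapse runs of non-word characters and underscores into one hyphen."""
    # Underscores are reserved as the separator between filename parts.
    return re.sub(r"[\W_]+", "-", text).strip("-") or "Unknown"


def build_filename(doc_date: date, doc_type: str, recipient: str,
                   sender: str, processed_at: datetime, original: str) -> str:
    """Assemble YYYY-MM-DD_Type_Recipient_Sender_YYYYMMDD_HHMMSS.<ext>."""
    ext = Path(original).suffix.lower()  # keeps the leading dot, e.g. ".pdf"
    parts = [
        doc_date.isoformat(),                    # assumed: date on the document
        slug(doc_type),
        slug(recipient),
        slug(sender),
        processed_at.strftime("%Y%m%d_%H%M%S"),  # assumed: processing time
    ]
    return "_".join(parts) + ext


print(build_filename(date(2026, 3, 14), "Tax Notice", "J. Doe",
                     "City Utilities", datetime(2026, 5, 2, 9, 30, 5),
                     "scan_0042.PDF"))
# prints 2026-03-14_Tax-Notice_J-Doe_City-Utilities_20260502_093005.pdf
```

In the real loop the model supplies the type, recipient, sender, and date; the code only enforces the shape of the name.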
Two hundred lines of PowerShell for the watcher. Six hundred words in the markdown file. A scheduled task that restarts the watcher on logon and on wake from sleep. The whole system fits in a folder we could email.
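For readers who want the shape of the watcher without the PowerShell, here is a minimal sketch of a single pass over the folder, again in Python. It is not our script. The function process_inbox and its name_for callback are hypothetical; name_for stands in for the model call, which is deliberately left out, and the scheduled-task restart logic is not shown.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_inbox(inbox: Path, instructions: Path,
                  name_for: Callable[[Path, str], str]) -> list[Path]:
    """Move every file in `inbox` into `inbox/done` under a new name.

    `name_for` is a placeholder for the model call: it receives the file path
    and the instruction text and returns the new filename.
    """
    done = inbox / "done"
    done.mkdir(exist_ok=True)
    prose = instructions.read_text(encoding="utf-8")
    moved = []
    for item in sorted(inbox.iterdir()):
        if not item.is_file() or item == instructions:
            continue  # skip the done folder and the instruction file itself
        target = done / name_for(item, prose)
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

A real watcher would call this on a timer or on a filesystem event and would need to handle name collisions and half-written files, which this sketch ignores.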
Three weeks of running it on personal mail. Eight hundred and forty scanned items: tax notices, doctors' letters, insurance correspondence, utility bills, two contracts, a stack of receipts, one handwritten birthday card. Three months of backlog cleared in about six hours of model time.
Eight items came back wrong on the first pass. Three were the same failure: a doctor's letter with both an issue date and a dictation date, and the model picked the dictation date. The instruction file had not said which to prefer. We added a sentence. The failure stopped. Two were a different failure: an XRechnung-style invoice where the visible PDF said one date and the embedded XML said another. We added a line. The failure stopped. One was a delivery note that looked exactly like the supplier's invoice template. We added two examples. The failure stopped. The remaining two were genuinely ambiguous (a fax with bad contrast, a payment reminder with two unlabeled dates) and a human would have flagged them too.
The iteration loop was "watch what fails, add a sentence to the markdown file, watch fewer things fail". Total time across three weeks: about forty minutes. The same iteration on a commercial IDP platform we evaluated in 2024 took three days of professional services per new document type, billed at the customary rate.
We are not selling this. We do not sell software. The reason it is worth writing about is that the same pattern, with industrial guardrails, is exactly what several IDP vendors shipped in April. The personal build is the cheapest possible evidence that the loop is no longer the moat.
The astonishing forecast spread
Look again at the analyst numbers. Gartner: $2.09B by 2026. Coherent: $1.45B in 2026, growing at 4.9% (slower than inflation in some quarters). Fortune Business Insights: $14.16B in 2026, growing at 26.2% to $91B by 2034. Research Nester: $4.1B in 2026.
A factor of ten between the highest and lowest credible number for the same market in the same year. This does not happen in mature categories. CRM forecasts do not disagree by ten times. Database forecasts do not. The reason this one does is that "intelligent document processing" is being redefined in real time, and the redefinition is the whole point.
The narrow forecasts count extraction software licenses. The broad forecasts count anything in the value chain: orchestration platforms, ECM modules, RPA seats with document modules attached, vertical workflow software that does extraction as a feature, integration spend, professional services, managed services, certified compliance posture. The spread between the narrow and broad numbers is roughly the wrapper. If you accept the narrow definition, the market is small, slow, and being eaten by frontier models. If you accept the broad definition, the market is large, fast, and consolidating around vendors who own the wrapper.
Both can be true at the same time. They are describing different layers of the same stack.
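The arithmetic behind the spread is easy to check. The sketch below uses only the figures quoted in this post and assumes plain annual compounding at a constant rate; the helper project is hypothetical, and the Gartner "by 2026" figure is treated as a 2026 value.

```python
def project(start_billion: float, cagr: float, years: int) -> float:
    """Compound a starting market size forward at a constant annual rate."""
    return start_billion * (1 + cagr) ** years


# Figures as quoted in this post (USD billions, 2026).
forecasts_2026 = {
    "Coherent Market Insights": 1.45,
    "Gartner": 2.09,
    "Research Nester": 4.1,
    "Fortune Business Insights": 14.16,
}

spread = max(forecasts_2026.values()) / min(forecasts_2026.values())
coherent_2033 = project(1.45, 0.049, 2033 - 2026)
fortune_2034 = project(14.16, 0.262, 2034 - 2026)

print(f"high/low spread: {spread:.1f}x")
print(f"Coherent 2033:   ${coherent_2033:.2f}B (published: $2.02B)")
print(f"Fortune BI 2034: ${fortune_2034:.2f}B (published: $91.02B)")
```

Both projections land close to the published endpoints, so each forecast is internally consistent. The disagreement is in the starting definition, not in the compounding.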
What Hyperscience's own TCO model says
The most honest pricing artifact we found in April was the Hyperscience build-vs-buy blog post. It is marketing. It is also specific.
The published model puts Hyperscience Hypercell at $153,639 per year in recurring platform fees, plus $100,000 in one-time implementation, for a five-year NPV of $682,413. The build alternative, a hyperscaler DIY pipeline, is put at $2,275,442 over the same five years, mostly technical labor ($1.87M PV) plus infrastructure ($409K PV). The pitch is 272% ROI versus DIY and payback under six months. Performance claims are 99.5% accuracy and 98% automation.
Take the vendor's own numbers at face value for a moment. They are saying: pay $682K over five years to avoid $2.3M in labor and infrastructure to build it yourself. The implicit acknowledgment is that the build is now possible. The pitch is no longer "we have a model you cannot replicate". The pitch is "we have a model, plus the orchestration, plus the platform, plus the team you would have to hire to run the loop in production". The product is the wrapper.
This is the right move. It is also a tell. The vendor that frames the buying decision as build-versus-buy has accepted that the underlying capability can be built. Five years ago that framing would have been a category error.
For context on the API alternative: Google Document AI is $1.50 per 1,000 pages, dropping to $0.60 per 1,000 after 5 million pages a month, plus $36 per month per deployed processor (Google Cloud pricing). At one million pages a year, that is $1,500 plus processor fees, comfortably under $5,000 a year, against $153,639 for the Hyperscience platform fee. The gap is the wrapper. The wrapper is where the orchestration, the compliance, the human review queue, and the certifications live.
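The API-side arithmetic can be written out as follows. This is a sketch that uses only the rates quoted above and assumes pages arrive evenly across the twelve months; the function name document_ai_annual_cost and the processor counts are illustrative, and actual cloud pricing should be checked before relying on it.

```python
def document_ai_annual_cost(pages_per_year: int, processors: int = 1) -> float:
    """Estimate yearly cost in USD from the rates quoted in this post."""
    base_rate = 1.50 / 1000        # USD per page up to the monthly threshold
    volume_rate = 0.60 / 1000      # USD per page above the threshold
    monthly_threshold = 5_000_000  # pages per month
    processor_fee = 36.0           # USD per deployed processor per month

    pages_per_month = pages_per_year / 12  # assumption: even monthly volume
    base_pages = min(pages_per_month, monthly_threshold)
    volume_pages = max(pages_per_month - monthly_threshold, 0)
    monthly = base_pages * base_rate + volume_pages * volume_rate
    return 12 * (monthly + processors * processor_fee)


platform_fee = 153_639  # Hyperscience recurring fee quoted above, USD per year
for n in (1, 5):
    cost = document_ai_annual_cost(1_000_000, processors=n)
    print(f"{n} processor(s): ${cost:,.0f} per year, "
          f"platform fee is {platform_fee / cost:.0f}x that")
```

With one deployed processor the estimate is about $1,932 a year; with five it is still under $5,000.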
The other context worth knowing: 68% of AI projects exceeded their initial budget in 2026 by an average of 42%, with data quality (54% of projects) and legacy integration (48%) as the most-cited causes (CloudZero AI cost guide). The DIY option is cheaper on paper and more expensive in practice. The professional services component is where most enterprises lose the budget they thought they were saving.
What the wrapper actually contains
If the extraction loop is becoming free, the wrapper is what people are paying for. We went looking for what is in it. Across April 2026 vendor announcements, the recurring components are:
Certifications and audited posture. SOC 2 Type II, ISO 27001, ISO 42001 (the new AI management standard), HIPAA, PCI DSS, GDPR readiness packs, IRAP for Australian government, FedRAMP for US federal. A vendor with a full stack of audited certifications can be procured in a regulated industry without a six-month security review. A markdown-file hotfolder cannot. This is real, not theatrical. Compliance teams will spend $682K over five years to avoid the eight months of HIPAA work that practitioners on engineering forums described in our April report.
Integration depth. Affinda's spring 2026 release pairs its agentic platform with an Integration Agent that connects to 2,800+ downstream systems through natural-language instructions, with persistent model memory that applies corrections via RAG rather than retraining (Affinda announcement). That is not the extraction loop. That is the connectors, the connector maintenance, the schema mappings, the credentialing, and the change management. The loop reads the file. The wrapper makes sure the result lands in the right field of the right record in the right system, on the right schedule, with the right access controls.
Confidence scoring and exception queues. Per-field confidence scores, threshold routing, four-eyes review queues, audit logs of who reviewed what and when. The hotfolder pattern is honest about failure in a way that does not scale: mistakes land in a folder and you look at them. Production IDP routes low-confidence extractions to a reviewer with a logged audit trail. None of this is glamorous. All of it is load-bearing in a regulated workflow.
Orchestration and human-in-the-loop. Hyperscience's Hypercell Spring 2026 release is explicit about this. Inference layering routes high-volume straightforward transactions to CPU-based models and shifts to GPU-based VLMs for documents that need reasoning. The release supports NVIDIA Blackwell, Nemotron 3, Gemini 1.5 Flash, and Gemini 2.5 Pro as configurable backends. The customer chooses the model. Hyperscience runs the workflow. The product is the orchestration around the model, not the model.
SLAs and managed service. A markdown-file hotfolder breaks when Windows updates restart the laptop. A managed IDP service is on the hook for 99.9% uptime, regional failover, encrypted-at-rest storage, and a phone number to call. Practitioner reviews of Hyperscience on Gartner Peer Insights average 4.4 stars with specific praise for support and specific complaints about documentation. Even the leader has friction. The friction is what you pay for.
Vendor master data and domain logic. Invoice posting against a master vendor list. Contract analysis against a contract taxonomy. Medical record extraction against a coding system. The per-field accuracy and validation tooling that Reducto, Rossum, and ABBYY offer is not something the hotfolder pattern does. It is what extraction looks like when the consequence of a wrong field is a regulator's letter.
The wrapper is most of the bill. The wrapper is also most of what new entrants struggle to build.
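To make the confidence-scoring component above concrete, here is a minimal sketch of per-field threshold routing. It does not reflect any vendor's API: the Field class, the route function, the 0.90 threshold, and the sample invoice values are all hypothetical, and a production queue would also log the reviewer and the timestamp.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extractor


def route(fields: list[Field], threshold: float = 0.90) -> tuple[str, list[str]]:
    """Return ("auto", []) or ("review", [names of low-confidence fields])."""
    low = [f.name for f in fields if f.confidence < threshold]
    return ("review", low) if low else ("auto", [])


invoice = [
    Field("invoice_date", "2026-04-02", 0.98),
    Field("total_amount", "1,240.00", 0.71),
    Field("vendor_id", "V-10023", 0.95),
]
decision, flagged = route(invoice)
print(decision, flagged)  # prints: review ['total_amount']
```

The routing rule is trivial. What the wrapper charges for is everything around it: the queue, the second reviewer, and the audit log that proves the review happened.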
What April and May 2026 told us about who knows
The vendor moves are easier to read once you accept that the loop is commoditizing and the wrapper is the moat.
Coupa acquired Rossum on May 12, 2026 (Coupa newsroom; PR Newswire). The deal was announced at Inspire 2026 in Las Vegas. Coupa's CEO Leagh Turner described the rationale as extending IDP capabilities "across the entire Coupa portfolio enabling autonomous spend management with agentic AI". Rossum was Coupa's first acquisition of 2026, on the heels of last year's Scoutbee (supplier discovery) and Cirtuo (category management) buys. Terms were not disclosed. Industry coverage at Deep Analysis framed it as confirming the trajectory: the document layer is the missing piece of vertical spend platforms, and the leader of the horizontal IDP category just sold itself to a vertical. Rossum's transactional LLM and twenty million documents of training data are now Coupa's. This is the loudest signal the IDP category has given itself in 2026. The independent leader of "AI-first IDP" decided it was worth more inside a procurement platform than alongside one.
Hyperscience repositioned away from "IDP" to "intelligent inference" in its Spring 2026 release on April 7. The framing is "bring your model, we run the workflow". The Spring 2026 webinar runs May 14. They are moving up the stack while there is time. Read against the Coupa-Rossum deal announced five weeks later, the move looks defensive in the best sense: Hyperscience does not want to be Rossum.
Affinda stopped selling discrete extractors per document type and started selling one agent with persistent memory and 2,800 connectors. The instruction surface is shrinking. The integration surface is the product. The connectors are the bet.
Sensible continued building agentic document workflows where extraction is one tool among several an agent uses. The agent reads the document, decides what to extract, calls the extraction primitive, validates, retries, escalates. Retries and observability are what they sell.
Cambrion, the Munich startup we welcomed in February, is the purest expression of the loop at the model layer: zero-shot vision-language processing, no OCR step, no template trainer. They are still small. The architectural argument is the one the hotfolder makes.
IBM Docling keeps gaining adoption as the open-source layout parser underneath everything. Docling plus a frontier model is the hybrid pipeline practitioners have been describing for nine months. It runs equally well behind a watched folder, behind an SDK, and behind a Hyperscience workflow. Everyone uses it. Nobody credits it. That is what real infrastructure looks like.
ABBYY is now the largest independent IDP vendor by some readings, post-Rossum. Their template trainers and per-field accuracy tooling still set the bar for high-volume regulated extraction. For invoices at scale, with vendor master matching and SAP posting, the specialized extractor with thirty years of tuning is hard to beat. The question is how long ABBYY stays independent and what acquirer makes sense if they do not. Vertical platforms with deep document needs (ERP, healthcare RCM, legal CLM) are the obvious candidates.
UiPath and Hyland sit one floor above. UiPath's acquisition of WorkFusion in February and Hyland's reported 220% agentic adoption surge are the enterprise-scale version of the same move. The agentic orchestrator is the product. Document understanding is a callable capability inside it. Coupa-Rossum just put their pattern into the AP procurement vertical.
The vendors at risk are the middle-tier extraction platforms whose differentiation was the template trainer UI. The template trainer is what the markdown file replaces. The vendors who escape are the ones who get bought into a vertical or who reposition the orchestration as the product before the orchestrator buys them.
What practitioners told us, again
While the folder ran, we re-read the same engineering forums that fed our April practitioner report. The hotfolder pattern shows up more often now than it did six months ago.
A poster claiming to run a three-person accounting firm describes a watched folder on a shared drive that processes invoices into a fixed filename schema and posts them to DATEV. The configuration is a docstring. They wrote it themselves. They say it replaced a €450 per month commercial product.
A poster claiming to work in residential property management describes a folder per building, each with its own instruction file, processing utility statements and tenant correspondence. The per-building instruction files are forty lines and easier to keep accurate than the central rule set they replaced.
A poster claiming to be a sole-practitioner notary describes a hotfolder for incoming scans that produces a candidate filename and waits for human confirmation before moving. They describe it as "the assistant I cannot afford to hire".
A poster claiming to run an internal data team at a mid-sized German manufacturer describes building the same pattern as a proof of concept, then quietly putting it into production for a class of vendor correspondence that the company's commercial IDP platform did not handle well. They describe the procurement conversation as "uncomfortable" because the proof of concept was already in production and the contract renewal was in three months.
None of this is verified. The patterns are consistent with what we built. The forum posts we excluded are the ones that read as self-promotional, the ones from vendor accounts, and the ones that described the pattern but produced no specifics. What is left is a feed of practitioners quietly automating their long-tail document work with a model and a markdown file, and not telling their CIO about it.
That should worry the middle of the IDP market more than any single vendor announcement.
The answer to the question
If a single person can build the loop in an afternoon and the leader of the standalone IDP category just sold itself to a procurement platform, the IDP market is not selling the loop and it is not staying horizontal. It is selling certifications, integration depth, audit posture, orchestration, exception queues, SLAs, professional services that keep the wrapper running, vendor master domain logic that took twenty years to encode, and a path into a vertical platform. The 10x analyst forecast spread is the market telling itself this out loud. The Coupa-Rossum deal is the market doing it.
The narrow forecasts (Coherent's $1.45B, Gartner's $2.09B) are the loop. They are small and, at 4.9% and 13% CAGR, growing far more slowly than the broad numbers. The broad forecasts (Fortune Business Insights' $14.16B in 2026, growing to $91B by 2034) are the wrapper. They are growing fast. The vendors moving in April and May from "extraction platform" to "agentic workflow" or "intelligent inference" or "embedded in spend management" are moving from the loop forecast to the wrapper forecast on purpose. The vendors still pricing the loop at six figures a year are pricing a product that is becoming free, and their customers are starting to notice on engineering forums.
We do not think extraction vendors disappear. We think the ones that survive sell what is in the wrapper or get absorbed into a vertical that needs it. The five years of NPV in Hyperscience's own TCO model is the wrapper. The 2,800 connectors in Affinda's Integration Agent are the wrapper. Coupa wrapping Rossum into spend management is the wrapper consuming the loop in public. The four-eyes review queue and the SOC 2 Type II report and the BAA template and the audit log are the wrapper. The model is interchangeable. The wrapper is not.
What does not change
The work is still there. Document understanding is still hard. Day-eleven failures still happen. Tables are still the hardest unsolved problem. The compliance work still takes longer than the build. The human reviewer is still in the loop. The knowledge layer is still unsolved. A folder that reads itself is not a folder that organizes a life.
The hotfolder pattern is not a thesis about the death of IDP. It is evidence about which floor of the building is collapsing into prose. The ground floor (the extraction loop) is. The upper floors (orchestration, compliance, integration, knowledge) are not, and the analyst forecasts that include those floors are the ones that grow.
The folder on the laptop will keep processing mail. It does not care which vendor wins.
What this means if you are buying
Ask what the configuration surface looks like. A vendor that needs three weeks of professional services per new document type is selling you 2022 IDP. A vendor whose configuration is a written specification you can edit is selling you the new pattern. Know which you are buying.
Ask where the orchestration lives. If your workflow is "drop a file, get a structured result, route it" and your document types are stable, a watched folder plus a model is credible architecture. If your workflow is "validate against a master vendor list, post to SAP, route exceptions to a four-eyes queue, archive with retention metadata, surface in audit", the orchestration is the product and you should buy it from a vendor whose business is orchestration.
Ask about confidence and exceptions. A vendor without per-field confidence and exception queues is not production-ready for regulated extraction regardless of how good the demo looks.
Ask about tables. Still the differentiator. Test on your tables, not the vendor's.
Ask what runs where. The privacy divide is real. If your data has to stay in a jurisdiction, the model behind the hotfolder can run there and several agentic IDP platforms now run there. Ask, do not assume.
Ask what would have to be true for the vendor's pricing to make sense if their configuration surface could be a markdown file. If the honest answer is "nothing", that is your answer. If the honest answer is "the certifications, the connectors, the audit posture, the SLA, the people who keep it running, the four-eyes queue, and the vendor master domain logic", that is also an answer, and it is a defensible one.
Caveats
We built one hotfolder. It processes one household's mail. The corpus is small. The compliance surface is sensible-filing-cabinet, not HIPAA and not SOX. The model in the middle is a frontier-class commercial model running with consent on our own data. Our privacy posture is not yours.
We read engineering forums. Forum posts are forum posts. We included claims that appeared across multiple independent discussions and excluded threads that read as self-promotional. Specific numbers are directional. The 10x analyst forecast spread is verifiable and we have linked the sources. Hyperscience's TCO numbers are from their own marketing and should be read as such; they are still a useful artifact of how the vendor is repricing what it sells.
We are not claiming that the IDP middle disappears. We are claiming that the configuration surface in the middle collapses into prose, the loop becomes a commodity, and the vendors who survive sell the wrapper. The April releases from Hyperscience, Affinda, Sensible, and Cambrion suggest the vendors who know this is happening are responding. The vendors whose April release reads like 2023 are the ones not to evaluate in 2026.
The folder on the laptop will keep processing the mail. We will keep writing about what we see next to it.
Sources cited in this post: Coupa newsroom on Rossum acquisition; PR Newswire on Coupa-Rossum; Rossum press release; Fintech Futures on Coupa-Rossum; Deep Analysis on Coupa-Rossum; Gartner Market Guide for IDP via ABBYY; Coherent Market Insights IDP forecast; Fortune Business Insights IDP forecast; Research Nester IDP forecast; Hyperscience build-vs-buy TCO blog; Hyperscience Hypercell Spring 2026 release on BusinessWire; Hyperscience Hypercell reviews on Gartner Peer Insights; Affinda agentic platform launch; Google Cloud Document AI pricing; CloudZero AI cost guide 2026. Cross-references to our own coverage: April 2026 practitioner report, February 2026 vendor consolidation report, capability pages, independent vendor profiles. We do not sell IDP software. We are not paid by any vendor mentioned in this post.