Docugami

Document AI platform founded by XML co-creator Jean Paoli that transforms business documents into XML Knowledge Graphs using open-source LLMs for data sovereignty.

Overview

Founded in 2017 by Jean Paoli, co-creator of XML, Docugami develops document AI that converts business documents into XML Knowledge Graphs using exclusively open-source LLMs. Based in Kirkland, Washington, the company has grown to 40 employees, with an estimated $9.3M in annual revenue and 18% year-over-year growth. In 2025, Docugami established a European headquarters in France to serve regulated sectors, including insurance and healthcare, where data sovereignty is a core requirement.

The company's patented Business Document Foundation Model applies hierarchical semantic chunking through Contextual Semantic Labels (CSLs) to transform documents into XML semantic trees, supporting structural analysis that goes beyond traditional text extraction. Docugami has received grants from the U.S. National Science Foundation, NASA, and Mitacs, and Gartner has cited the company as an example of generative AI innovation beyond traditional intelligent document processing (IDP).

Key Features

  • XML Knowledge Graphs: Patented system converts documents into structured Knowledge Graphs with all information preserved as actionable data nodes
  • Open-Source LLM Foundation: Uses exclusively open-source LLMs to address data sovereignty requirements in regulated sectors
  • Contextual Semantic Labels (CSLs): Hierarchical semantic chunking with deep contextual understanding of organizational terminology
  • XML Semantic Trees: Transforms documents into structural XML representations for analysis beyond flat text extraction
  • KG-RAG Architecture: Knowledge Graph-enhanced Retrieval-Augmented Generation, positioned by the company as outperforming standard RAG approaches
  • Document Engineering: Focuses on complex business document transformation rather than traditional digitization
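
To make the idea of hierarchical chunking over an XML semantic tree more concrete, here is a minimal Python sketch. It assumes a small, made-up XML tree; the tag names (Lease, Parties, Term, and so on) are hypothetical stand-ins for semantic labels and are not Docugami's actual schema or API. Each leaf value is emitted as a chunk together with the path of labels above it, so the chunk keeps its structural context.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML semantic tree. The tag names are illustrative only
# and do not reflect Docugami's actual output schema.
SAMPLE_XML = """
<Lease>
  <Parties>
    <Landlord>Acme Properties LLC</Landlord>
    <Tenant>Blue Sky Bakery Inc.</Tenant>
  </Parties>
  <Term>
    <StartDate>2024-01-01</StartDate>
    <DurationMonths>36</DurationMonths>
  </Term>
</Lease>
"""


def hierarchical_chunks(xml_text):
    """Return a list of (label_path, text) pairs, one per leaf node."""
    root = ET.fromstring(xml_text.strip())
    results = []

    def walk(node, path):
        current = path + [node.tag]
        children = list(node)
        if not children:
            # Leaf node: keep its text along with the labels above it.
            results.append(("/".join(current), (node.text or "").strip()))
        for child in children:
            walk(child, current)

    walk(root, [])
    return results


chunks = hierarchical_chunks(SAMPLE_XML)
for label_path, text in chunks:
    print(f"{label_path}: {text}")
```

Because every chunk carries its label path, a downstream retrieval step can filter or group by structure (for example, everything under Lease/Term) rather than relying on text similarity alone.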

Use Cases

Regulated Sector Compliance

European insurance and healthcare organizations deploy Docugami's open-source LLM approach to meet data sovereignty requirements. The XML Knowledge Graphs extract compliance-relevant information while maintaining data residency, enabling regulatory reporting without the cloud-dependency concerns that affect traditional IDP vendors.

Commercial Insurance Document Processing

Insurance companies use Docugami to analyze complex policy documents, claims, and underwriting materials. The XML semantic trees capture hierarchical relationships between policy terms, coverage limits, and exclusions, enabling automated risk assessment and portfolio analysis across heterogeneous document types.
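
As a rough illustration of why hierarchical structure helps with portfolio analysis, the Python sketch below aggregates coverage limits and exclusions across two policies that are assumed to be already available as XML. The policy names, tags, and attributes (Policy, Coverage, Limit, Exclusion, type) are hypothetical and chosen only for this example; they are not Docugami's actual output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical policies already converted to XML. Tag and attribute names
# are illustrative assumptions, not Docugami's actual output schema.
POLICIES = {
    "policy_a": """
        <Policy>
          <Coverage type="property">
            <Limit>500000</Limit>
            <Exclusion>Flood</Exclusion>
            <Exclusion>Earthquake</Exclusion>
          </Coverage>
          <Coverage type="liability">
            <Limit>1000000</Limit>
          </Coverage>
        </Policy>""",
    "policy_b": """
        <Policy>
          <Coverage type="property">
            <Limit>750000</Limit>
            <Exclusion>Flood</Exclusion>
          </Coverage>
        </Policy>""",
}


def coverage_rows(policies):
    """Flatten each Coverage element into a row, keeping its limit and exclusions."""
    rows = []
    for name, xml_text in policies.items():
        root = ET.fromstring(xml_text.strip())
        for cov in root.iter("Coverage"):
            rows.append({
                "policy": name,
                "coverage": cov.get("type"),
                "limit": int(cov.findtext("Limit")),
                "exclusions": [e.text for e in cov.findall("Exclusion")],
            })
    return rows


def total_limit(rows, coverage_type):
    """Sum the limits of all rows with the given coverage type."""
    return sum(r["limit"] for r in rows if r["coverage"] == coverage_type)


rows = coverage_rows(POLICIES)
property_total = total_limit(rows, "property")
flood_excluded = sorted(r["policy"] for r in rows if "Flood" in r["exclusions"])
print(property_total, flood_excluded)
```

Since each exclusion stays attached to its parent coverage, the query can tell which coverage an exclusion applies to, something a flat text extraction would not preserve.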

Legal Contract Analysis

Legal teams use Docugami's document engineering approach to transform contract portfolios into structured data. The system's contextual understanding identifies non-standard clauses and cross-references obligations across multiple agreements, generating comparative analysis reports without manual review.

Technical Specifications

  • Core Technology: Patented Business Document Foundation Model, XML Knowledge Graphs
  • AI Architecture: Exclusively open-source LLMs with agentic quality control
  • Chunking Method: Hierarchical semantic chunking via Contextual Semantic Labels (CSLs)
  • Output Format: XML semantic trees with actionable data nodes
  • Data Sovereignty: Open-source foundation addresses regulated sector requirements
  • Geographic Operations: Kirkland, WA headquarters; French subsidiary (2025)
  • Company Size: 40 employees, $9.3M estimated revenue (18% YoY growth)
  • Funding: $11.22M total funding, NSF/NASA/Mitacs grants
  • Technology Partners: NVIDIA Inception program member

Resources

Company Information

Headquarters: Kirkland, Washington, United States

European Operations: French subsidiary launched 2025

Recognition: Cited by Gartner; grants from NSF, NASA, and Mitacs; NVIDIA Inception program member