PaperQA Nemotron — Scientific Document AI Platform
On This Page
- Overview
- What users say
- Production deployment and benchmarks
- How PaperQA Nemotron processes documents
- Technical architecture
- Use cases
- Academic research acceleration
- Autonomous AI scientist platforms
- Literature review automation
- Version history and recent changes
- Open source components
- Market position
- Resources
- Company information
PaperQA Nemotron is an open-source intelligent document processing (IDP) platform combining Future House's PaperQA2 retrieval-augmented generation (RAG) engine with NVIDIA's Nemotron Parse vision-language model. It targets scientific research automation: processing PDFs, extracting equations, tables, and figures, and synthesizing answers across large literature collections with grounded citations.
Overview
PaperQA2 is engineered for high-accuracy RAG on scientific documents. Where general-purpose IDP platforms handle invoices, contracts, and forms, PaperQA Nemotron optimizes for academic literature: citation traversal, journal quality scoring, retraction checks, and multi-step reasoning across hundreds of papers simultaneously.
The platform's agentic architecture averages more than 4 tool calls per question and uses citation traversal on approximately 46% of queries, according to Future House's engineering analysis. That iterative retrieval pattern, with 1.26 searches per question for self-correction, distinguishes it from single-pass extraction pipelines used by most IDP vendors.
Edison Scientific deployed PaperQA Nemotron as the document processing backbone for Kosmos, an autonomous AI scientist platform now serving more than 50,000 users across universities, national laboratories, and healthcare organizations. James Braza, AI Research Technical Staff at Edison Scientific, stated in February 2026: "PDF extraction was a much larger bottleneck than initially realized. Switching to Nemotron Parse was a game changer — it unlocked the whole multimodal pipeline."
What users say
Practitioners deploying PaperQA Nemotron in research environments consistently highlight the citation grounding as the platform's strongest differentiator. Teams report that answers link directly to specific passages in source papers, reducing the hallucination risk that makes general-purpose LLMs unreliable for scientific work.
The agentic workflow draws mixed reactions. Researchers processing large literature collections value the citation traversal and self-correction behavior. Teams with simpler extraction needs find the 4+ tool-call overhead unnecessary for straightforward document parsing tasks, and note that the Python 3.11+ requirement and 24 GB VRAM minimum for local Nemotron Parse deployment create infrastructure barriers for smaller institutions.
Users comparing PaperQA Nemotron to general IDP platforms note that the platform's vertical focus on scientific documents means it lacks the pre-built connectors and workflow templates that horizontal vendors offer for finance and legal use cases. For research automation specifically, practitioners report it outperforms general-purpose alternatives on literature synthesis tasks.
Production deployment and benchmarks
Edison Scientific's Kosmos deployment establishes the clearest production baseline for PaperQA Nemotron at scale. The platform supports 100+ concurrent production runs with parallelized page processing. A 300-page thesis parses in approximately 30 seconds. During benchmark runs with 34 concurrent questions, the infrastructure processed 20 to 60 pages per second, with 99.9% of requests experiencing queue times under 2 seconds.
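As a rough sanity check, the figures quoted above can be turned into per-document rates. This is back-of-the-envelope arithmetic only: the constant and function names are illustrative, and the single-thesis figure and the 20 to 60 pages-per-second range were measured under different load conditions, so they are not directly comparable.

```python
# Back-of-the-envelope arithmetic using only the figures quoted above.
THESIS_PAGES = 300
THESIS_SECONDS = 30                 # "approximately 30 seconds"
CLUSTER_PAGES_PER_SEC = (20, 60)    # range reported with 34 concurrent questions


def pages_per_second(pages: int, seconds: float) -> float:
    """Average parsing rate for one document."""
    return pages / seconds


def seconds_for(pages: int, rate: float) -> float:
    """Time needed to parse `pages` at a given pages-per-second rate."""
    return pages / rate


single_doc_rate = pages_per_second(THESIS_PAGES, THESIS_SECONDS)
slowest, fastest = (seconds_for(THESIS_PAGES, r) for r in CLUSTER_PAGES_PER_SEC)

print(f"single thesis: {single_doc_rate:.0f} pages/s")
print(f"300 pages at the reported cluster rates: {fastest:.0f}-{slowest:.0f} s")
```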
The LABBench2 benchmark, released in February 2026 to evaluate AI systems on practical biology research tasks, provides the most specific performance data available. Literature processed with Nemotron Parse achieved 34% accuracy on FigQA2 (figure understanding) compared to 19% for text-only rule-based parsers, a gain of 15 percentage points. The reported gains versus rule-based extraction are 15% for figure understanding, 7% for table understanding, and 3% for text understanding.
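The FigQA2 numbers can be read two ways, and the difference matters when comparing vendors. The sketch below computes both the absolute gain in percentage points and the relative gain over the baseline, using only the 34% and 19% figures quoted above; the function names are illustrative.

```python
def percentage_point_gain(baseline_pct: float, new_pct: float) -> float:
    """Absolute difference between two accuracy percentages."""
    return new_pct - baseline_pct


def relative_gain(baseline_pct: float, new_pct: float) -> float:
    """Improvement expressed as a fraction of the baseline."""
    return (new_pct - baseline_pct) / baseline_pct


FIGQA2_RULE_BASED = 19.0   # text-only rule-based parsers
FIGQA2_NEMOTRON = 34.0     # literature processed with Nemotron Parse

points = percentage_point_gain(FIGQA2_RULE_BASED, FIGQA2_NEMOTRON)
relative = relative_gain(FIGQA2_RULE_BASED, FIGQA2_NEMOTRON)

print(f"{points:.0f} percentage points, {relative:.0%} relative")
# prints "15 percentage points, 79% relative"
```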
For financial document processing, Justt reported a 25% reduction in extraction error rate using Nemotron for chargeback analysis. The result suggests that the accuracy gains from Nemotron-based parsing can extend beyond scientific documents to structured financial data, although it does not measure the PaperQA pipeline itself.
Future House's WikiCrow research reports that PaperQA2 outperforms PhD and postdoc-level biology researchers on LitQA2 benchmarks. WikiCrow, a downstream application built on PaperQA2, produces Wikipedia-style summaries that blinded expert evaluators rated more accurate than actual Wikipedia articles. ContraCrow, another agent built on the platform, identified an average of 2.34 contradicted statements per paper across random biology papers.
"By integrating the NVIDIA Nemotron Parse model into its PaperQA pipeline, Edison can decompose research papers, index key concepts and ground responses in specific passages, improving both throughput and answer quality for scientists." (NVIDIA Nemotron Labs blog, February 2026)
How PaperQA Nemotron processes documents
The processing pipeline starts with document ingestion across PDFs, DOCX, TXT, and source code files. Nemotron Parse, a 900M-parameter vision-language model (VLM), handles the extraction layer. It classifies content across 13 structural document elements and produces markdown-formatted output with tables rendered in LaTeX, preserving the structure that standard PDF parsers destroy by merging columns and rows.
After extraction, the agentic RAG layer takes over. The system searches for relevant papers, gathers evidence passages, and generates answers with in-text citations. Citation traversal tools query both forward citations (papers that cite the source) and backward citations (papers the source cites) via Semantic Scholar and Crossref APIs. This bidirectional traversal increases paper recall and answer accuracy on complex research questions.
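The bidirectional traversal described above can be sketched over an in-memory citation graph. This is a minimal illustration, not PaperQA2's implementation: the real tools query the Semantic Scholar and Crossref APIs, whereas here `references` is a hypothetical dictionary mapping each paper to the papers it cites, and `traverse_citations` is an invented helper name.

```python
from collections import defaultdict


def build_forward_index(references: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the reference lists: paper -> set of papers that cite it."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].add(paper)
    return cited_by


def traverse_citations(
    seeds: list[str], references: dict[str, list[str]], max_hops: int = 1
) -> set[str]:
    """Collect papers reachable from the seeds via backward or forward citations."""
    cited_by = build_forward_index(references)
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        next_frontier: set[str] = set()
        for paper in frontier:
            backward = set(references.get(paper, []))   # papers this one cites
            forward = cited_by.get(paper, set())        # papers citing this one
            next_frontier |= (backward | forward) - seen
        seen |= next_frontier
        frontier = next_frontier
    return seen - set(seeds)


# Toy graph: A cites B and C, D cites A, E cites D.
graph = {"A": ["B", "C"], "D": ["A"], "E": ["D"], "B": []}
one_hop = traverse_citations(["A"], graph, max_hops=1)
two_hops = traverse_citations(["A"], graph, max_hops=2)
print(sorted(one_hop), sorted(two_hops))
```

Starting from paper A, one hop finds the papers A cites (B, C) and the paper that cites A (D). A second hop adds E, which a backward-only traversal would never reach.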
Metadata aggregation runs in parallel, pulling from four sources: Crossref, Semantic Scholar, Unpaywall, and OpenAlex. Citation counts, journal quality scores, and retraction status enrich the embedding space, so retrieval ranks papers by credibility, not just semantic similarity. LLM-based re-ranking with contextual summarization (RCS) then selects the most relevant passages before answer generation.
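One way to picture credibility-aware retrieval is as a score that blends semantic similarity with metadata signals. The sketch below is an assumption-laden stand-in: the source describes metadata enriching the embedding space, while this example uses a simpler post-hoc weighted score. The `Candidate` fields, the weights, and the 0 to 3 journal quality scale are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    similarity: float       # semantic similarity to the query, 0..1
    citation_count: int
    journal_quality: int    # hypothetical scale: 0 (unknown) to 3 (top tier)
    retracted: bool


def credibility_score(
    c: Candidate, w_cite: float = 0.1, w_journal: float = 0.05
) -> float:
    """Blend similarity with metadata; retracted papers always score zero."""
    if c.retracted:
        return 0.0
    citation_boost = w_cite * math.log10(1 + c.citation_count)
    journal_boost = w_journal * c.journal_quality
    return c.similarity + citation_boost + journal_boost


candidates = [
    Candidate("Uncited preprint", 0.80, 0, 0, False),
    Candidate("Well-cited journal article", 0.75, 999, 2, False),
    Candidate("Retracted study", 0.95, 500, 3, True),
]
ranked = sorted(candidates, key=credibility_score, reverse=True)
print([c.title for c in ranked])
```

In this toy ranking the retracted study falls to the bottom despite having the highest similarity, and the well-cited article overtakes the slightly more similar preprint.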
The platform supports all LiteLLM-compatible models, including OpenAI, Claude, Gemini, and locally hosted models via llama.cpp and Ollama. Default configuration uses OpenAI gpt-4o-2024-11-20 for generation and text-embedding-3-small for embeddings, with hybrid sparse/dense embedding support and local sentence-transformer models as alternatives.
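The model choices above amount to a small set of string identifiers. The sketch below collects them in a plain dictionary and does not call the library. The key names `llm`, `summary_llm`, and `embedding` and the `hybrid-` prefix are assumptions about how PaperQA2's settings are named and should be checked against the package documentation; `model_settings` is a hypothetical helper.

```python
# Defaults as stated in the text above.
DEFAULT_LLM = "gpt-4o-2024-11-20"
DEFAULT_EMBEDDING = "text-embedding-3-small"


def model_settings(
    llm: str = DEFAULT_LLM,
    embedding: str = DEFAULT_EMBEDDING,
    hybrid: bool = False,
) -> dict[str, str]:
    """Build a dictionary of model identifiers (key names are assumptions)."""
    if hybrid:
        # Assumed naming convention for hybrid sparse/dense embeddings.
        embedding = f"hybrid-{embedding}"
    return {"llm": llm, "summary_llm": llm, "embedding": embedding}


defaults = model_settings()
hybrid = model_settings(hybrid=True)
# LiteLLM addresses Ollama models as "ollama/<model name>"; the model names
# below are placeholders, not recommendations.
local = model_settings(llm="ollama/llama3.2", embedding="st-multi-qa-MiniLM-L6-cos-v1")
print(defaults, hybrid, local, sep="\n")
```

Keeping the identifiers in one place makes it easy to swap a hosted model for a locally served one without touching the rest of a pipeline.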
Technical architecture
| Component | Specification |
|---|---|
| Document ingestion | PDF, DOCX, TXT, source code |
| OCR and parsing engine | Nemotron Parse (900M-parameter VLM) with failover |
| Structural elements classified | 13 categories including equations, tables, figures |
| Context length | Up to 1M tokens via Nemotron 3 architecture |
| Default LLM | OpenAI gpt-4o-2024-11-20 |
| Default embeddings | text-embedding-3-small |
| Vector database | Numpy (default), extensible to external backends |
| Metadata sources | Crossref, Semantic Scholar, Unpaywall, OpenAlex |
| Language support | 90+ languages via Nemotron models |
| API compatibility | LiteLLM (OpenAI, Claude, Gemini, Ollama, llama.cpp) |
| Minimum Python version | 3.11+ |
| GPU requirement (local) | NVIDIA GPU with 24 GB VRAM minimum |
| Deployment | Cloud endpoints, local self-hosted, API |
Nemotron 3 ships in three variants. Nano (3.2 billion active parameters) delivers 3.3x higher throughput than comparable models for cost-efficient inference. Super targets collaborative agents and high-volume workloads. Ultra prioritizes accuracy for complex multi-step reasoning tasks.
Use cases
Academic research acceleration
Researchers use PaperQA Nemotron to process large collections of scientific papers and synthesize answers to complex queries. The system handles questions like "Has anyone designed neural networks that compute with proteins or DNA?" by traversing citations across multiple sources and returning grounded answers with precise references. The ContraCrow application built on the same platform identifies contradictions across papers, a task that previously required manual expert review.
Autonomous AI scientist platforms
Edison Scientific's Kosmos deployment represents the most advanced production use case: an autonomous AI scientist that ingests research papers, decomposes them into indexed concepts, and grounds responses in specific passages. The platform serves universities, national laboratories, and healthcare organizations, processing theses and research papers at scale without human-in-the-loop extraction steps.
Literature review automation
The platform automates systematic literature reviews by processing hundreds of papers simultaneously, generating summaries with in-text citations, and identifying research gaps. Journal quality assessment and retraction checking help researchers evaluate source credibility before including papers in reviews.
Version history and recent changes
PaperQA switched from semantic versioning to calendar versioning (CalVer) in December 2025. Version 5 onward is called PaperQA2, a name chosen to reduce confusion between publication terminology and Git tags.
Version 5 (December 2025) introduced agentic workflows with dedicated tools for paper search, evidence gathering, and answer generation. Two model-based PDF readers were added: Docling and Nemotron Parse. All PDF readers now parse images and tables, report page numbers, and support variable DPI settings. Multimodal contextual summarization was added, passing media objects to the summary LLM during creation.
The February 2026 release (v2026.02.16) added automatic retry logic for HTTP 408 (Request Timeout) errors, per-page error recovery so that a failure on one page no longer aborts the whole document, improved memory management during document reading, and parallel processing support for PyMuPDF full-page mode.
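Retry-on-timeout and per-page recovery are general patterns, and a minimal sketch shows how they fit together. This is not the release's actual code: `RequestTimeout`, `parse_with_retry`, `parse_document`, and the flaky parser are hypothetical stand-ins, and the exponential backoff schedule is an assumption.

```python
import time


class RequestTimeout(Exception):
    """Stand-in for an HTTP 408 response from a parsing endpoint."""


def parse_with_retry(parse_page, page, max_attempts=3, base_delay=0.5):
    """Retry a page on timeout, doubling the delay after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return parse_page(page)
        except RequestTimeout:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


def parse_document(parse_page, pages, **retry_kwargs):
    """Parse every page, recording failures instead of aborting the document."""
    parsed, failed = {}, {}
    for number, page in enumerate(pages, start=1):
        try:
            parsed[number] = parse_with_retry(parse_page, page, **retry_kwargs)
        except Exception as exc:
            failed[number] = repr(exc)
    return parsed, failed


attempts = {}


def flaky_parser(page):
    """Toy parser: page p2 times out once, page p3 is always unreadable."""
    attempts[page] = attempts.get(page, 0) + 1
    if page == "p2" and attempts[page] == 1:
        raise RequestTimeout("408")
    if page == "p3":
        raise ValueError("corrupt page")
    return page.upper()


parsed, failed = parse_document(flaky_parser, ["p1", "p2", "p3"], base_delay=0.0)
print(parsed, failed)
```

Only timeouts are retried. Other errors are recorded against the page number so that the caller still receives the pages that parsed successfully.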
Open source components
The stack is published under open-source licenses by Future House and NVIDIA. Future House's GitHub repository contains the full PaperQA2 codebase. Nemotron Parse model weights are available on Hugging Face. NVIDIA releases Nemotron 3 Nano weights in FP8 and BF16 formats, training data from 2.5 trillion tokens of curated Common Crawl snapshots, pre-training and post-training software, and training recipes for model reproduction.
This open-source availability lets research institutions self-host the full stack, avoiding the data governance constraints that closed-source IDP platforms impose. Organizations with sensitive research data can run Nemotron Parse locally without sending documents to external APIs, provided they meet the 24 GB VRAM hardware requirement.
Market position
PaperQA Nemotron occupies a narrow but distinct segment of the IDP market. General-purpose platforms like ABBYY and Hyperscience optimize for finance, insurance, and healthcare document workflows with pre-built connectors and compliance certifications. PaperQA Nemotron instead optimizes for scientific document semantics: equations, citations, figures, and multi-paper synthesis.
The LABBench2 benchmark, released in February 2026, establishes a reproducible evaluation standard for scientific document AI that general IDP benchmarks do not cover. It gives vendors targeting research automation a domain-specific yardstick, one where FigQA2 and TableQA2 accuracy matter more than invoice processing throughput.
The agentic RAG architecture also signals a broader shift in document processing. Where traditional IDP pipelines extract structured fields from known document types, PaperQA Nemotron uses multi-step reasoning to answer open-ended questions across unstructured literature collections. That architectural difference makes direct comparison with horizontal IDP vendors less useful than comparison with research-specific tools.
Resources
- GitHub repository
- NVIDIA Nemotron 3 research
- Nemotron 3 white paper (PDF)
- PaperQA2 arXiv paper
- Future House WikiCrow announcement
- Future House engineering blog
- Hugging Face model collection
- Edison Scientific case study
- PyPI package documentation
Company information
Future House, San Francisco, CA. The organization publishes PaperQA2 as open-source software and conducts research into AI systems for scientific literature. No funding or employee count data is publicly available.