PaperQA Nemotron, Scientific Document AI Platform

PaperQA Nemotron is an open-source intelligent document processing (IDP) platform combining Future House's PaperQA2 retrieval-augmented generation (RAG) engine with NVIDIA's Nemotron Parse vision-language model. It targets scientific research automation: processing PDFs, extracting equations, tables, and figures, and synthesizing answers across large literature collections with grounded citations.

50,000+ Kosmos platform users served

34% FigQA2 accuracy, versus 19% for rule-based parsing

30 seconds to parse a 300-page thesis

2.34 contradicted statements found per paper

Overview

PaperQA2 is engineered for high-accuracy RAG on scientific documents. Where general-purpose IDP platforms handle invoices, contracts, and forms, PaperQA Nemotron optimizes for academic literature: citation traversal, journal quality scoring, retraction checks, and multi-step reasoning across hundreds of papers simultaneously.

The platform's agentic architecture averages more than 4 tool calls per question and uses citation traversal on approximately 46% of queries, according to Future House's engineering analysis. That iterative retrieval pattern, with 1.26 searches per question for self-correction, distinguishes it from single-pass extraction pipelines used by most IDP vendors.
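The iterative pattern described above can be illustrated with a small agent loop. This is a minimal sketch, not PaperQA2's code: the toy `CORPUS`, the function names, the `fallback_queries` parameter, and the rule-based policy (standing in for an LLM choosing the next tool) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy corpus standing in for a real paper index (illustrative data only).
CORPUS = {
    "paper-a": "protein logic gates compute boolean functions",
    "paper-b": "dna strand displacement implements neural networks",
    "paper-c": "transformer models for image classification",
}


@dataclass
class AgentState:
    question: str
    papers: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    answer: Optional[str] = None


def paper_search(state, query):
    """Add papers that share at least one word with the query."""
    words = set(query.lower().split())
    for paper_id, text in CORPUS.items():
        if words & set(text.split()) and paper_id not in state.papers:
            state.papers.append(paper_id)


def gather_evidence(state):
    """Record one cited passage per paper found so far."""
    for paper_id in state.papers:
        passage = f"{CORPUS[paper_id]} [{paper_id}]"
        if passage not in state.evidence:
            state.evidence.append(passage)


def run_agent(question, fallback_queries=(), min_evidence=2, max_calls=8):
    """Hypothetical policy: search, gather, and search again if evidence is thin."""
    state = AgentState(question)
    queries = [question, *fallback_queries]
    while state.answer is None and len(state.tool_calls) < max_calls:
        out_of_options = not queries and len(state.evidence) == len(state.papers)
        if len(state.evidence) >= min_evidence or out_of_options:
            state.tool_calls.append("generate_answer")
            state.answer = "; ".join(state.evidence) or "insufficient evidence"
        elif len(state.evidence) < len(state.papers):
            state.tool_calls.append("gather_evidence")
            gather_evidence(state)
        else:
            # Evidence is still thin: issue another search (the self-correction step).
            state.tool_calls.append("paper_search")
            paper_search(state, queries.pop(0))
    return state
```

Calling `run_agent("protein computing", fallback_queries=("dna neural networks",))` makes five tool calls, two of them searches. A single-pass pipeline, by contrast, would stop after the first search and gather step.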

Edison Scientific deployed PaperQA Nemotron as the document processing backbone for Kosmos, an autonomous AI scientist platform now serving more than 50,000 users across universities, national laboratories, and healthcare organizations. James Braza, AI Research Technical Staff at Edison Scientific, stated in February 2026: "PDF extraction was a much larger bottleneck than initially realized. Switching to Nemotron Parse was a game changer, it unlocked the whole multimodal pipeline."

What users say

Practitioners deploying PaperQA Nemotron in research environments consistently highlight the citation grounding as the platform's strongest differentiator. Teams report that answers link directly to specific passages in source papers, reducing the hallucination risk that makes general-purpose LLMs unreliable for scientific work.

The agentic workflow draws mixed reactions. Researchers processing large literature collections value the citation traversal and self-correction behavior. Teams with simpler extraction needs find the 4+ tool-call overhead unnecessary for straightforward document parsing tasks, and note that the Python 3.11+ requirement and 24 GB VRAM minimum for local Nemotron Parse deployment create infrastructure barriers for smaller institutions.

Users comparing PaperQA Nemotron to general IDP platforms note that the platform's vertical focus on scientific documents means it lacks the pre-built connectors and workflow templates that horizontal vendors offer for finance and legal use cases. For research automation specifically, practitioners report it outperforms general-purpose alternatives on literature synthesis tasks.

Production deployment and benchmarks

Edison Scientific's Kosmos deployment establishes the clearest production baseline for PaperQA Nemotron at scale. The platform supports 100+ concurrent production runs with parallelized page processing. A 300-page thesis parses in approximately 30 seconds. During benchmark runs with 34 concurrent questions, the infrastructure processed 20 to 60 pages per second, with 99.9% of requests experiencing queue times under 2 seconds.
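Parallelized page processing of this kind is commonly built on bounded concurrency. The sketch below shows one way to do it with `asyncio`, under stated assumptions: `parse_page` is a hypothetical stand-in for a call to a parsing endpoint, and the concurrency limit of 16 is an arbitrary illustrative value, not a Kosmos setting.

```python
import asyncio


async def parse_page(page_number):
    """Hypothetical stand-in for one request to a page-parsing endpoint."""
    await asyncio.sleep(0.001)  # simulate network and model latency
    return f"page {page_number} markdown"


async def parse_document(num_pages, max_concurrency=16):
    """Parse all pages with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(page_number):
        async with semaphore:
            return await parse_page(page_number)

    # asyncio.gather returns results in the order the tasks were passed in.
    return await asyncio.gather(*(bounded(n) for n in range(1, num_pages + 1)))


def pages_per_second(pages, seconds):
    """Throughput implied by a page count and a wall-clock duration."""
    return pages / seconds
```

For scale, `pages_per_second(300, 30)` gives 10 pages per second for the single 300-page thesis cited above, below the 20 to 60 pages per second reported for the concurrent benchmark runs.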

The LABBench2 benchmark, released in February 2026 to evaluate AI systems on practical biology research tasks, provides the most specific performance data available. Literature processed with Nemotron Parse achieved 34% accuracy on FigQA2 (figure understanding) compared to 19% for text-only rule-based parsers, a gain of 15 percentage points. The reported gains for table understanding and text understanding were 7% and 3%, respectively, versus rule-based extraction.
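The FigQA2 comparison (34% versus 19%) can be read either as an absolute gain in percentage points or as a relative improvement over the baseline. The short calculation below shows both readings; the function names are illustrative.

```python
def point_gain(new_pct, baseline_pct):
    """Absolute difference between two accuracies, in percentage points."""
    return new_pct - baseline_pct


def relative_gain(new_pct, baseline_pct):
    """Improvement expressed as a percentage of the baseline accuracy."""
    return (new_pct - baseline_pct) / baseline_pct * 100


figqa2_points = point_gain(34, 19)       # 15 percentage points
figqa2_relative = relative_gain(34, 19)  # roughly 78.9% relative to baseline
```

The two numbers answer different questions, so a "15%" improvement should always state which one it means.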

For financial document processing, Justt reported a 25% reduction in extraction error rate using Nemotron for chargeback analysis, which suggests that the model's accuracy gains extend beyond scientific documents to structured financial data.

Future House's WikiCrow research reports that PaperQA2 outperforms PhD and postdoc-level biology researchers on LitQA2 benchmarks. WikiCrow, a downstream application built on PaperQA2, produces Wikipedia-style summaries that blinded expert evaluators rated more accurate than actual Wikipedia articles. ContraCrow, another agent built on the platform, identified an average of 2.34 contradicted statements per paper across random biology papers.

"By integrating the NVIDIA Nemotron Parse model into its PaperQA pipeline, Edison can decompose research papers, index key concepts and ground responses in specific passages, improving both throughput and answer quality for scientists." (NVIDIA Nemotron Labs blog, February 2026)

How PaperQA Nemotron processes documents

The processing pipeline starts with document ingestion across PDFs, DOCX, TXT, and source code files. Nemotron Parse, a 900M-parameter vision-language model (VLM), handles the extraction layer. It classifies content across 13 structural document elements and produces markdown-formatted output with tables rendered in LaTeX, preserving the structure that standard PDF parsers destroy by merging columns and rows.
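To make the extraction output more concrete, here is a minimal sketch of how classified elements might be represented and assembled into markdown with LaTeX tables. The `ParsedElement` class, its field names, and the element labels are hypothetical and are not the Nemotron Parse output schema.

```python
from dataclasses import dataclass


@dataclass
class ParsedElement:
    kind: str     # illustrative labels such as "text", "equation", "table", "figure"
    page: int
    content: str  # markdown text, or LaTeX for tables and equations


def table_to_latex(rows):
    """Render rows of cells as a LaTeX tabular, one left-aligned column per cell.

    Cells are assumed to contain no LaTeX special characters; no escaping is done.
    """
    column_spec = "l" * len(rows[0])
    body = " \\\\\n".join(" & ".join(row) for row in rows)
    return f"\\begin{{tabular}}{{{column_spec}}}\n{body} \\\\\n\\end{{tabular}}"


def to_markdown(elements):
    """Join elements in page order; the stable sort keeps reading order within a page."""
    ordered = sorted(elements, key=lambda element: element.page)
    return "\n\n".join(element.content for element in ordered)
```

Keeping each table as an explicit row-and-column structure until rendering is what preserves the layout that a plain text dump of the PDF would lose.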

After extraction, the agentic RAG layer takes over. The system searches for relevant papers, gathers evidence passages, and generates answers with in-text citations. Citation traversal tools query both forward citations (papers that cite the source) and backward citations (papers the source cites) via Semantic Scholar and Crossref APIs. This bidirectional traversal increases paper recall and answer accuracy on complex research questions.
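Bidirectional traversal can be sketched on a small in-memory citation graph. The production system queries Semantic Scholar and Crossref; the dictionary below is a made-up stand-in so the example runs without network access, and the function names are illustrative.

```python
# Maps each paper to the papers it cites (illustrative data only).
CITES = {
    "A": ["B", "C"],
    "D": ["A"],
    "E": ["A", "B"],
}


def backward_citations(paper, cites):
    """Papers that the given paper cites."""
    return list(cites.get(paper, []))


def forward_citations(paper, cites):
    """Papers that cite the given paper."""
    return sorted(source for source, refs in cites.items() if paper in refs)


def traverse(seed, cites):
    """Candidate papers one hop away from the seed in either direction."""
    candidates = set(backward_citations(seed, cites)) | set(forward_citations(seed, cites))
    candidates.discard(seed)
    return sorted(candidates)
```

Starting from paper `A`, backward traversal yields `B` and `C`, forward traversal yields `D` and `E`, and the combined candidate set is larger than either direction alone, which is where the recall gain comes from.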

Metadata aggregation runs in parallel, pulling from four sources: Crossref, Semantic Scholar, Unpaywall, and OpenAlex. Citation counts, journal quality scores, and retraction status enrich the embedding space, so retrieval ranks papers by credibility, not just semantic similarity. LLM-based re-ranking with contextual summarization (RCS) then selects the most relevant passages before answer generation.
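The parallel aggregation step can be sketched with `asyncio.gather`, followed by a simple credibility-aware score. Everything here is an assumption for illustration: the four `fetch_*` functions are offline stubs, the field each source returns is invented, and the scoring formula is a hypothetical post-hoc weighting, not the platform's actual ranking mechanism.

```python
import asyncio
import math


# Offline stubs standing in for the four metadata services.
async def fetch_crossref(doi):
    return {"doi": doi, "retracted": False}


async def fetch_semantic_scholar(doi):
    return {"citation_count": 120}


async def fetch_unpaywall(doi):
    return {"open_access": True}


async def fetch_openalex(doi):
    return {"journal_quality": 2}


async def aggregate_metadata(doi):
    """Query all sources concurrently and merge whatever comes back."""
    results = await asyncio.gather(
        fetch_crossref(doi),
        fetch_semantic_scholar(doi),
        fetch_unpaywall(doi),
        fetch_openalex(doi),
        return_exceptions=True,  # one failing source should not sink the rest
    )
    merged = {}
    for result in results:
        if isinstance(result, dict):  # skip sources that raised
            merged.update(result)
    return merged


def credibility_score(similarity, metadata):
    """Hypothetical score: drop retracted papers, boost well-cited ones."""
    if metadata.get("retracted"):
        return 0.0
    citations = metadata.get("citation_count", 0)
    return similarity * (1.0 + 0.1 * math.log1p(citations))
```

With this weighting, two passages of equal semantic similarity are ordered by citation count, and a retracted paper scores zero regardless of similarity.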

The platform supports all LiteLLM-compatible models, including OpenAI, Claude, Gemini, and locally hosted models via llama.cpp and Ollama. Default configuration uses OpenAI gpt-4o-2024-11-20 for generation and text-embedding-3-small for embeddings, with hybrid sparse/dense embedding support and local sentence-transformer models as alternatives.
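A configuration along these lines might look like the following sketch. The default values come from the paragraph above; the `ollama/...` model names are examples only, and the assumption that these keys map onto keyword arguments named `llm` and `embedding` in the package's settings object should be checked against the PyPI package documentation.

```python
# Defaults as described in the text above.
DEFAULT_MODELS = {
    "llm": "gpt-4o-2024-11-20",
    "embedding": "text-embedding-3-small",
}


def with_overrides(base, **overrides):
    """Return a copy of the base configuration with selected keys replaced."""
    return {**base, **overrides}


# Example of pointing both models at a local Ollama server.
# LiteLLM routes on the "ollama/" prefix; the model names are illustrative.
LOCAL_MODELS = with_overrides(
    DEFAULT_MODELS,
    llm="ollama/llama3.2",
    embedding="ollama/mxbai-embed-large",
)
```

Keeping the provider in the model string means switching between hosted and local models is a configuration change rather than a code change.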

Technical architecture

Component	Specification
Document ingestion	PDF, DOCX, TXT, source code
OCR and parsing engine	Nemotron Parse (900M-parameter VLM) with failover
Structural elements classified	13 categories including equations, tables, figures
Context length	Up to 1M tokens via Nemotron 3 architecture
Default LLM	OpenAI gpt-4o-2024-11-20
Default embeddings	text-embedding-3-small
Vector database	Numpy (default), extensible to external backends
Metadata sources	Crossref, Semantic Scholar, Unpaywall, OpenAlex
Language support	90+ languages via Nemotron models
API compatibility	LiteLLM (OpenAI, Claude, Gemini, Ollama, llama.cpp)
Minimum Python version	3.11+
GPU requirement (local)	NVIDIA GPU with 24 GB VRAM minimum
Deployment	Cloud endpoints, local self-hosted, API

Nemotron 3 ships in three variants. Nano (3.2 billion active parameters) delivers 3.3x higher throughput than comparable models for cost-efficient inference. Super targets collaborative agents and high-volume workloads. Ultra prioritizes accuracy for complex multi-step reasoning tasks.

Use cases

Academic research acceleration

Researchers use PaperQA Nemotron to process large collections of scientific papers and synthesize answers to complex queries. The system handles questions like "Has anyone designed neural networks that compute with proteins or DNA?" by traversing citations across multiple sources and returning grounded answers with precise references. The ContraCrow application built on the same platform identifies contradictions across papers, a task that previously required manual expert review.

Autonomous AI scientist platforms

Edison Scientific's Kosmos deployment represents the most advanced production use case: an autonomous AI scientist that ingests research papers, decomposes them into indexed concepts, and grounds responses in specific passages. The platform serves universities, national laboratories, and healthcare organizations, processing theses and research papers at scale without human-in-the-loop extraction steps.

Literature review automation

The platform automates systematic literature reviews by processing hundreds of papers simultaneously, generating summaries with in-text citations, and identifying research gaps. Journal quality assessment and retraction checking help researchers evaluate source credibility before including papers in reviews.

Version history and recent changes

PaperQA switched from semantic versioning to calendar versioning (CalVer) in December 2025, with version 5 onward called PaperQA2 to reduce confusion between publication terminology and Git tags.

Version 5 (December 2025) introduced agentic workflows with dedicated tools for paper search, evidence gathering, and answer generation. Two model-based PDF readers were added: Docling and Nemotron Parse. All PDF readers now parse images and tables, report page numbers, and support variable DPI settings. Multimodal contextual summarization was added, passing media objects to the summary LLM during creation.

The February 2026 release (v2026.02.16) added automatic retry logic for HTTP 408 (Request Timeout) errors, per-page error recovery for partial document processing, improved memory management during document reading, and parallel processing support for PyMuPDF full-page mode.
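The retry and per-page recovery behavior can be illustrated as follows. This is a sketch of the general technique under stated assumptions, not the release's code: `HTTPStatusError`, `call_with_retry`, `parse_pages`, and the backoff parameters are hypothetical names and values.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(func, *args, max_attempts=3, base_delay=0.5):
    """Retry only on HTTP 408, with exponential backoff between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except HTTPStatusError as err:
            if err.status != 408 or attempt == max_attempts:
                raise  # other errors, or the final timeout, propagate
            time.sleep(base_delay * 2 ** (attempt - 1))


def parse_pages(page_numbers, parse_page, **retry_options):
    """Parse each page independently so one failure does not lose the document."""
    parsed, failed = {}, {}
    for number in page_numbers:
        try:
            parsed[number] = call_with_retry(parse_page, number, **retry_options)
        except Exception as err:
            failed[number] = str(err)  # record the failure and keep going
    return parsed, failed
```

Returning the failed pages alongside the parsed ones lets a caller decide whether a partially processed document is good enough or whether the failed pages should be re-queued.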

Open source components

Future House publishes the complete stack under open-source licenses. The GitHub repository contains the full PaperQA2 codebase. Nemotron Parse model weights are available on Hugging Face. NVIDIA releases Nemotron 3 Nano weights in FP8 and BF16 formats, training data from 2.5 trillion tokens of curated Common Crawl snapshots, pre-training and post-training software, and training recipes for model reproduction.

This open-source availability lets research institutions self-host the full stack, avoiding the data governance constraints that closed-source IDP platforms impose. Organizations with sensitive research data can run Nemotron Parse locally without sending documents to external APIs, provided they meet the 24 GB VRAM hardware requirement.

Market position

PaperQA Nemotron occupies a narrow but distinct segment of the IDP market. General-purpose platforms like ABBYY and Hyperscience optimize for finance, insurance, and healthcare document workflows with pre-built connectors and compliance certifications. PaperQA Nemotron instead optimizes for scientific document semantics: equations, citations, figures, and multi-paper synthesis.

The LABBench2 benchmark, released in February 2026, establishes a reproducible evaluation standard for scientific document AI that general IDP benchmarks do not cover. This creates a defensible evaluation moat for vendors targeting research automation, where FigQA2 and TableQA2 accuracy matter more than invoice processing throughput.

The agentic RAG architecture also signals a broader shift in document processing. Where traditional IDP pipelines extract structured fields from known document types, PaperQA Nemotron uses multi-step reasoning to answer open-ended questions across unstructured literature collections. That architectural difference makes direct comparison with horizontal IDP vendors less useful than comparison with research-specific tools.

Resources

GitHub repository
NVIDIA Nemotron 3 research
Nemotron 3 white paper (PDF)
PaperQA2 arXiv paper
Future House WikiCrow announcement
Future House engineering blog
Hugging Face model collection
Edison Scientific case study
PyPI package documentation