paper-qa-nemotron
Open-source scientific document processing platform combining PaperQA2's RAG capabilities with NVIDIA Nemotron 3 models for superhuman performance in research tasks.
Overview
PaperQA2 represents the state of the art in agentic RAG for scientific literature, engineered specifically for high-accuracy retrieval-augmented generation over PDFs, text files, Microsoft Office documents, and source code. The platform integrates with NVIDIA's Nemotron 3 family of models to deliver what researchers describe as superhuman performance on scientific document processing tasks.
The integration leverages Nemotron 3's hybrid Mixture-of-Experts Mamba-Transformer architecture, which provides best-in-class throughput with context lengths up to 1M tokens. Edison Scientific has deployed this integration to power their Kosmos AI Scientist platform, demonstrating production-scale capabilities for research document automation.
Unlike traditional document processing platforms, paper-qa-nemotron targets the scientific research community with specialized capabilities for academic literature analysis, citation tracking, and research question answering with grounded responses.
Production Deployment and Enterprise Validation
Enterprise Implementation at Edison Scientific: Edison Scientific integrated NVIDIA's Nemotron Parse model into its PaperQA pipeline to create paper-qa-nemotron, enabling "rapid extraction of structured information from research PDFs including equations, tables, and figures" for their Kosmos AI Scientist platform.
Measurable Performance Improvements: NVIDIA reports that customer Justt "leveraged Nemotron, enabling a 25% reduction in extraction error rate to increase the reliability of financial chargeback analysis", demonstrating quantifiable enterprise impact beyond research applications.
Production-Ready Infrastructure: NVIDIA published a comprehensive developer tutorial with production-ready code, specific model implementations (nvidia/llama-nemotron-embed-vl-1b-v2, nvidia/llama-nemotron-rerank-vl-1b-v2, nvidia/llama-3.3-nemotron-super-49b-v1.5), and hardware requirements of an NVIDIA GPU with at least 24 GB of VRAM.
Key Features
- Agentic RAG Architecture: Language agents iteratively refine queries and answers for complex research questions (see the usage sketch after this list)
- Nemotron-Parse Integration: API-based document parsing with per-page failover and timeout handling
- Scientific Metadata Awareness: Automatic fetching of citation counts, journal quality data, and retraction checks
- Reranking and Contextual Summarization (RCS): LLM-based evidence re-ranking and summarization with document metadata integration
- Multi-Format Support: Processes PDFs, text files, Microsoft Office documents, and source code
- Full-Text Search Engine: Local repository search with a NumPy vector database by default
- Template-Free Processing: No training required for new document types or domains
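A minimal usage sketch of the query workflow these features describe, assuming the fork keeps upstream PaperQA2's `Docs` interface (exact method and attribute names may differ in paper-qa-nemotron):

```python
from paperqa import Docs  # assumes the upstream PaperQA2 package layout

# Build a local collection; parsing, chunking, and embedding happen at add time.
docs = Docs()
for path in ["papers/review_2024.pdf", "papers/methods.txt"]:
    docs.add(path)

# Ask a grounded question; the returned object carries the answer text
# together with in-text citations back to the source chunks.
answer = docs.query("Which extraction method does the review recommend for noisy scans?")
print(answer.formatted_answer)
```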
Technical Architecture
Nemotron 3 Integration
The platform utilizes three Nemotron 3 model variants:
- Nano (3.2B active parameters): Cost-efficient inference with 3.3x higher throughput than comparable models
- Super: Optimized for collaborative agents and high-volume workloads
- Ultra: State-of-the-art accuracy for complex reasoning tasks
Processing Pipeline
| Component | Specification |
|---|---|
| Document Ingestion | PDF, DOCX, TXT, source code files |
| OCR Engine | Nemotron-parse API with failover to alternative parsers |
| Context Length | Up to 1M tokens via Nemotron 3 architecture |
| Embedding Models | OpenAI embeddings (default), customizable to other providers |
| Vector Database | Numpy (default), extensible to other backends |
| Language Support | 90+ languages through Nemotron models |
| API Framework | LiteLLM compatibility for model flexibility |
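A hedged configuration sketch connecting the pipeline above to concrete model choices. The `Settings` fields follow upstream paper-qa conventions and the model identifiers are the ones cited earlier in this document; treat the exact field names and LiteLLM routing strings as assumptions:

```python
from paperqa import Docs, Settings

# Model names are resolved through LiteLLM, so any provider it supports can be
# swapped in. The Nemotron identifiers below mirror those named earlier in this
# document and are assumptions about how they are exposed to LiteLLM.
settings = Settings(
    llm="nvidia/llama-3.3-nemotron-super-49b-v1.5",          # answer generation
    summary_llm="nvidia/llama-3.3-nemotron-super-49b-v1.5",  # evidence summarization (RCS)
    embedding="nvidia/llama-nemotron-embed-vl-1b-v2",        # vector embeddings
)

docs = Docs()
docs.add("papers/example.pdf", settings=settings)
answer = docs.query("What throughput does the paper report?", settings=settings)
```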
Addressing Document Processing Challenges
Structure Preservation: Paper-qa-nemotron addresses a critical limitation of conventional pipelines, where "standard PDF parsers merge columns and rows, destroying structure", and handles the equations, tables, and figures that traditional parsing methods "often mishandle".
Accuracy Optimization: "Converting tables to markdown format significantly reduces numeric hallucinations caused by plain text linearization", addressing a key challenge in scientific document processing where precision is critical.
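To make the point concrete, a small, self-contained sketch (not the project's actual parser code) that renders extracted table cells as a GitHub-style markdown table instead of flattening them into running text:

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render extracted table cells as a markdown table.

    Keeping rows and columns aligned, rather than flattening every cell into
    one run of text, helps the model attribute each number to the correct
    row and column.
    """
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Illustrative placeholder cells, e.g. as returned by a table-extraction step.
print(rows_to_markdown([
    ["Sample", "Dose (mg)", "Response"],
    ["A", "10", "0.42"],
    ["B", "20", "0.77"],
]))
```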
Scalable Processing: "The high efficiency of Nemotron Parse enables cost-efficient serving at scale, allowing Edison's team to unlock the whole multimodal pipeline" for production research workflows.
Use Cases
Academic Research Acceleration
Researchers use paper-qa-nemotron to process large collections of scientific papers, automatically extracting key findings and identifying contradictions across studies. The system handles complex queries like "Has anyone designed neural networks that compute with proteins or DNA?" by synthesizing evidence from multiple sources with precise citations.
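A sketch of how the cited query might be run end to end over a local paper collection, assuming the fork keeps upstream paper-qa's agentic `ask` entry point and `paper_directory` setting:

```python
from paperqa import Settings, ask  # assumes upstream paper-qa's agentic entry point

# The agent searches the local paper directory, gathers and re-ranks evidence,
# and iterates on the query before composing a cited answer.
response = ask(
    "Has anyone designed neural networks that compute with proteins or DNA?",
    settings=Settings(paper_directory="papers/"),
)
# The attribute holding the formatted answer varies across paper-qa versions;
# printing the response object shows the cited answer text.
print(response)
```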
Literature Review Automation
The platform automates systematic literature reviews by processing hundreds of papers simultaneously, generating summaries with in-text citations, and identifying research gaps. Citation tracking and journal quality assessment help researchers evaluate source credibility.
Scientific Question Answering
Beyond simple extraction, the agentic architecture enables multi-step reasoning for complex scientific questions, iteratively refining searches and synthesizing answers from diverse sources while maintaining citation accuracy.
Performance Metrics
Based on the 2024 research paper, PaperQA2 achieves superhuman performance across multiple scientific tasks:
- Question Answering: Exceeds human expert performance on domain-specific queries
- Summarization: Generates coherent summaries with accurate in-text citations
- Contradiction Detection: Identifies conflicting claims across research papers
- Processing Speed: Handles document collections at scale with automated metadata enrichment
Recent Developments
The February 2026 release (v2026.02.16) introduced significant improvements to the nemotron-parse integration:
- Enhanced Error Handling: Automatic retry logic for 408 timeout errors with non-destructive failover (see the sketch after this list)
- Per-Page Processing: Granular error recovery allowing partial document processing
- Memory Optimization: Improved memory management during document reading
- Multiprocessing Support: Parallel processing for PyMuPDF full-page mode
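The retry and per-page recovery behavior described above might look roughly like the following sketch. `parse_page_with_nemotron` is a hypothetical stand-in for the actual Nemotron Parse API client, and PyMuPDF text extraction serves as the local fallback:

```python
import time

import pymupdf  # PyMuPDF, used here as the local fallback parser


def parse_page_with_nemotron(page_png: bytes) -> str:
    """Hypothetical wrapper around the Nemotron Parse API for a single page."""
    raise TimeoutError("replace with a real API client")  # placeholder only


def parse_document(path: str, retries: int = 2) -> list[str]:
    """Parse a PDF page by page, retrying timeouts and falling back per page.

    A failure on one page never discards text already recovered from other
    pages, matching the non-destructive failover described in the release notes.
    """
    doc = pymupdf.open(path)
    pages: list[str] = []
    for page in doc:
        text = None
        for attempt in range(retries + 1):
            try:
                text = parse_page_with_nemotron(page.get_pixmap().tobytes("png"))
                break
            except TimeoutError:  # e.g. an HTTP 408 from the parsing endpoint
                time.sleep(2 ** attempt)  # simple exponential backoff
        if text is None:
            # Per-page fallback: keep plain-text extraction for this page only.
            text = page.get_text()
        pages.append(text)
    return pages
```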
Multi-Platform Availability
The solution is now available through multiple NVIDIA distribution channels, including build.nvidia.com, GitHub, and the NGC catalog, with both cloud endpoints and local deployment options to support diverse enterprise requirements.
Open Source Ecosystem
The underlying Nemotron 3 release includes comprehensive open-source components:
- Model Weights: Nemotron 3 Nano models in FP8 and BF16 formats
- Training Data: 2.5 trillion tokens from curated Common Crawl snapshots
- Code Repositories: Complete pre-training and post-training software
- Training Recipes: Detailed implementation guides for model reproduction
Resources
- GitHub Repository
- NVIDIA Nemotron 3 Research
- Nemotron 3 White Paper
- PaperQA2 Research Paper
- Hugging Face Models
- NVIDIA Developer Tutorial
Company Information
Future House, San Francisco, CA