paper-qa-nemotron
Open-source scientific document processing platform combining PaperQA2's RAG capabilities with NVIDIA Nemotron 3 models for superhuman performance in research tasks.
Overview
PaperQA2 represents the state of the art in agentic RAG for scientific literature, engineered specifically for high-accuracy retrieval-augmented generation over PDFs, text files, Microsoft Office documents, and source code. The platform integrates with NVIDIA's Nemotron 3 family of models to deliver what researchers describe as superhuman performance on scientific document processing tasks.
The integration leverages Nemotron 3's hybrid Mixture-of-Experts Mamba-Transformer architecture, which provides best-in-class throughput with context lengths up to 1M tokens. Edison Scientific has deployed this integration to power their Kosmos AI Scientist platform, demonstrating production-scale capabilities for research document automation.
Unlike traditional document processing platforms, paper-qa-nemotron targets the scientific research community with specialized capabilities for academic literature analysis, citation tracking, and research question answering with grounded responses.
Production Deployment and Enterprise Validation
Enterprise Implementation at Edison Scientific: Edison Scientific integrated NVIDIA's Nemotron Parse model into its PaperQA pipeline to create paper-qa-nemotron, enabling "rapid extraction of structured information from research PDFs including equations, tables, and figures" for their Kosmos AI Scientist platform.
Measurable Performance Improvements: NVIDIA reports that customer Justt "leveraged Nemotron, enabling a 25% reduction in extraction error rate to increase the reliability of financial chargeback analysis", demonstrating quantifiable enterprise impact beyond research applications.
Production-Ready Infrastructure: NVIDIA published a comprehensive developer tutorial with production-ready code, specific model implementations (nvidia/llama-nemotron-embed-vl-1b-v2, nvidia/llama-nemotron-rerank-vl-1b-v2, nvidia/llama-3.3-nemotron-super-49b-v1.5), and hardware requirements of an NVIDIA GPU with at least 24 GB of VRAM.
Key Features
- Agentic RAG Architecture: Language agents iteratively refine queries and answers for complex research questions (see the usage sketch after this list)
- Nemotron-Parse Integration: API-based document parsing with per-page failover and timeout handling
- Scientific Metadata Awareness: Automatic fetching of citation counts, journal quality data, and retraction checks
- Reranking and Contextual Summarization (RCS): LLM-based evidence re-ranking and summarization with document metadata integration
- Multi-Format Support: Processes PDFs, text files, Microsoft Office documents, and source code
- Full-Text Search Engine: Local repository search with a NumPy vector database by default
- Template-Free Processing: No training required for new document types or domains
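A minimal usage sketch of the query workflow these features describe, assuming the fork keeps upstream PaperQA2's `Docs` interface (exact method and attribute names may differ in paper-qa-nemotron):

```python
from paperqa import Docs  # assumes the upstream PaperQA2 package layout

# Build a local collection; parsing, chunking, and embedding happen at add time.
docs = Docs()
for path in ["papers/review_2024.pdf", "papers/methods.txt"]:
    docs.add(path)

# Ask a grounded question; the returned object carries the answer text
# together with in-text citations back to the source chunks.
answer = docs.query("Which extraction method does the review recommend for noisy scans?")
print(answer.formatted_answer)
```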
Technical Architecture
Nemotron 3 Integration
The platform utilizes three Nemotron 3 model variants:
- Nano (3.2B active parameters): Cost-efficient inference with 3.3x higher throughput than comparable models
- Super: Optimized for collaborative agents and high-volume workloads
- Ultra: State-of-the-art accuracy for complex reasoning tasks
Processing Pipeline
| Component | Specification |
|---|---|
| Document Ingestion | PDF, DOCX, TXT, source code files |
| OCR Engine | Nemotron-parse API with failover to alternative parsers |
| Context Length | Up to 1M tokens via Nemotron 3 architecture |
| Embedding Models | OpenAI embeddings (default), customizable to other providers |
| Vector Database | Numpy (default), extensible to other backends |
| Language Support | 90+ languages through Nemotron models |
| API Framework | LiteLLM compatibility for model flexibility |
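A hedged configuration sketch connecting the pipeline above to concrete model choices. The `Settings` fields follow upstream paper-qa conventions and the model identifiers are the ones cited earlier in this document; treat the exact field names and LiteLLM routing strings as assumptions:

```python
from paperqa import Docs, Settings

# Model names are resolved through LiteLLM, so any provider it supports can be
# swapped in. The Nemotron identifiers below mirror those named earlier in this
# document and are assumptions about how they are exposed to LiteLLM.
settings = Settings(
    llm="nvidia/llama-3.3-nemotron-super-49b-v1.5",          # answer generation
    summary_llm="nvidia/llama-3.3-nemotron-super-49b-v1.5",  # evidence summarization (RCS)
    embedding="nvidia/llama-nemotron-embed-vl-1b-v2",        # vector embeddings
)

docs = Docs()
docs.add("papers/example.pdf", settings=settings)
answer = docs.query("What throughput does the paper report?", settings=settings)
```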
Addressing Document Processing Challenges
Structure Preservation: Paper-qa-nemotron addresses a critical limitation of conventional pipelines, where "standard PDF parsers merge columns and rows, destroying structure", and handles the equations, tables, and figures that traditional parsing methods "often mishandle".
Accuracy Optimization: "Converting tables to markdown format significantly reduces numeric hallucinations caused by plain text linearization", addressing a key challenge in scientific document processing where precision is critical.
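To make the point concrete, a small, self-contained sketch (not the project's actual parser code) that renders extracted table cells as a GitHub-style markdown table instead of flattening them into running text:

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render extracted table cells as a markdown table.

    Keeping rows and columns aligned, rather than flattening every cell into
    one run of text, helps the model attribute each number to the correct
    row and column.
    """
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Illustrative placeholder cells, e.g. as returned by a table-extraction step.
print(rows_to_markdown([
    ["Sample", "Dose (mg)", "Response"],
    ["A", "10", "0.42"],
    ["B", "20", "0.77"],
]))
```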
Scalable Processing: "The high efficiency of Nemotron Parse enables cost-efficient serving at scale, allowing Edison's team to unlock the whole multimodal pipeline" for production research workflows.
Use Cases
Academic Research Acceleration
Researchers use paper-qa-nemotron to process large collections of scientific papers, automatically extracting key findings and identifying contradictions across studies. The system handles complex queries like "Has anyone designed neural networks that compute with proteins or DNA?" by synthesizing evidence from multiple sources with precise citations.
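A sketch of how the cited query might be run end to end over a local paper collection, assuming the fork keeps upstream paper-qa's agentic `ask` entry point and `paper_directory` setting:

```python
from paperqa import Settings, ask  # assumes upstream paper-qa's agentic entry point

# The agent searches the local paper directory, gathers and re-ranks evidence,
# and iterates on the query before composing a cited answer.
response = ask(
    "Has anyone designed neural networks that compute with proteins or DNA?",
    settings=Settings(paper_directory="papers/"),
)
# The attribute holding the formatted answer varies across paper-qa versions;
# printing the response object shows the cited answer text.
print(response)
```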
Literature Review Automation
The platform automates systematic literature reviews by processing hundreds of papers simultaneously, generating summaries with in-text citations, and identifying research gaps. Citation tracking and journal quality assessment help researchers evaluate source credibility.
Scientific Question Answering
Beyond simple extraction, the agentic architecture enables multi-step reasoning for complex scientific questions, iteratively refining searches and synthesizing answers from diverse sources while maintaining citation accuracy.
Performance Metrics
Based on the 2024 research paper, PaperQA2 achieves superhuman performance across multiple scientific tasks:
- Question Answering: Exceeds human expert performance on domain-specific queries
- Summarization: Generates coherent summaries with accurate in-text citations
- Contradiction Detection: Identifies conflicting claims across research papers
- Processing Speed: Handles document collections at scale with automated metadata enrichment
Recent Developments
The February 2026 release (v2026.02.16) introduced significant improvements to the nemotron-parse integration:
- Enhanced Error Handling: Automatic retry logic for 408 timeout errors with non-destructive failover (see the sketch after this list)
- Per-Page Processing: Granular error recovery allowing partial document processing
- Memory Optimization: Improved memory management during document reading
- Multiprocessing Support: Parallel processing for PyMuPDF full-page mode
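The retry and per-page recovery behavior described above might look roughly like the following sketch. `parse_page_with_nemotron` is a hypothetical stand-in for the actual Nemotron Parse API client, and PyMuPDF text extraction serves as the local fallback:

```python
import time

import pymupdf  # PyMuPDF, used here as the local fallback parser


def parse_page_with_nemotron(page_png: bytes) -> str:
    """Hypothetical wrapper around the Nemotron Parse API for a single page."""
    raise TimeoutError("replace with a real API client")  # placeholder only


def parse_document(path: str, retries: int = 2) -> list[str]:
    """Parse a PDF page by page, retrying timeouts and falling back per page.

    A failure on one page never discards text already recovered from other
    pages, matching the non-destructive failover described in the release notes.
    """
    doc = pymupdf.open(path)
    pages: list[str] = []
    for page in doc:
        text = None
        for attempt in range(retries + 1):
            try:
                text = parse_page_with_nemotron(page.get_pixmap().tobytes("png"))
                break
            except TimeoutError:  # e.g. an HTTP 408 from the parsing endpoint
                time.sleep(2 ** attempt)  # simple exponential backoff
        if text is None:
            # Per-page fallback: keep plain-text extraction for this page only.
            text = page.get_text()
        pages.append(text)
    return pages
```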
Multi-Platform Availability
The solution is now available through multiple NVIDIA distribution channels, including build.nvidia.com, GitHub, and the NGC catalog, with both cloud endpoints and local deployment options to support diverse enterprise requirements.
Open Source Ecosystem
The underlying Nemotron 3 release includes comprehensive open-source components:
- Model Weights: Nemotron 3 Nano models in FP8 and BF16 formats
- Training Data: 2.5 trillion tokens from curated Common Crawl snapshots
- Code Repositories: Complete pre-training and post-training software
- Training Recipes: Detailed implementation guides for model reproduction
Resources
- GitHub Repository
- NVIDIA Nemotron 3 Research
- Nemotron 3 White Paper
- PaperQA2 Research Paper
- Hugging Face Models
- NVIDIA Developer Tutorial
Company Information
Future House, San Francisco, CA