Yapit — Open-Source Text-to-Speech for Documents
Yapit is an open-source text-to-speech platform built specifically for reading documents aloud. It handles PDFs, academic papers, and web pages with intelligent processing of math, citations, and figures, elements that generic TTS tools typically mishandle.
Overview
Yapit is a self-hostable TTS application that accepts a URL or PDF and reads the content aloud. Unlike general TTS tools, it is designed for document fidelity: math equations are spoken as alt text rather than raw LaTeX, citation markers and figure labels are naturalized, and page headers and footers are stripped before synthesis.
The project is licensed under AGPL-3.0 and available on GitHub. It supports 170+ voices across 15 languages and can run the TTS engine entirely in the browser via Kokoro-82M on WebGPU. No server is needed for basic usage.
Yapit differs from commercial IDP vendors in scope. Where platforms like ABBYY or Nanonets focus on structured data extraction and workflow automation, Yapit focuses on accessibility and document consumption through audio output. It is relevant to organizations researching open-source document intelligence tooling or building accessible document workflows.
Document Processing Capabilities
Yapit's document pipeline covers:
- PDF ingestion: Layout analysis via DocLayout-YOLO detects figures, tables, and headers for accurate extraction
- Web page extraction: Clean content extraction via the defuddle library strips navigation, ads, and boilerplate
- Academic paper handling: Math rendered visually but spoken as alt text; citation markers and figure labels converted to natural speech patterns
- Markdown export: Append `/md` to any document URL to retrieve clean markdown; `/md-annotated` adds TTS annotations
- Customizable extraction: Extraction behavior driven by a configurable prompt, supporting any OpenAI-compatible vision API (OpenRouter, vLLM, Ollama, Google Gemini)
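The citation and figure-label naturalization described above can be pictured with a few simple substitutions. This is an illustrative sketch only, not Yapit's actual implementation; the function name and the specific rewrite rules are assumptions:

```python
import re

def naturalize_markers(text: str) -> str:
    """Illustrative sketch: rewrite citation markers and figure labels
    into phrases a TTS engine can speak naturally. The patterns below
    are examples of the kind of transformation, not Yapit's real rules."""
    # "[12]" -> "reference 12"
    text = re.sub(r"\[(\d+)\]", r"reference \1", text)
    # "Fig. 3" / "fig. 3" -> "Figure 3"
    text = re.sub(r"\bfig\.\s*(\d+)", r"Figure \1", text, flags=re.IGNORECASE)
    return text

print(naturalize_markers("As shown in Fig. 2, prior work [4] reports gains."))
# prints: As shown in Figure 2, prior work reference 4 reports gains.
```

A real pipeline would need many more rules (author-year citations, equation references, ranges like "[3-5]"), but the shape is the same: targeted rewrites applied before the text reaches the synthesizer.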
TTS Engine Options
| Engine | Mode | Notes |
|---|---|---|
| Kokoro-82M | Browser (WebGPU/CPU) | Runs locally, no server required |
| Kokoro-FastAPI | Self-hosted server | Docker worker, GPU/CPU |
| Inworld TTS | Hosted | Cloud API |
| OpenAI-compatible | Any | vLLM-Omni, AllTalk, Chatterbox TTS |
Voice auto-discovery is supported when the server exposes `GET /v1/audio/voices`.
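A client can use that endpoint to list available voices. A minimal sketch, assuming the server returns a JSON body with a top-level `voices` array (the exact schema may vary between OpenAI-compatible implementations):

```python
import json
from urllib.request import urlopen

def parse_voices(payload: dict) -> list[str]:
    """Extract voice names from a voices response. Assumes a top-level
    "voices" array; the field name may differ between servers."""
    return list(payload.get("voices", []))

def discover_voices(base_url: str) -> list[str]:
    """GET {base_url}/v1/audio/voices and parse the result."""
    with urlopen(f"{base_url.rstrip('/')}/v1/audio/voices") as resp:
        return parse_voices(json.load(resp))
```

Separating the HTTP fetch from the parsing keeps the schema assumption in one small function that is easy to adjust per server.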
Deployment
Yapit is designed for self-hosting via Docker Compose:
```shell
make self-host
```
Default mode is single-user with no login required. Multi-user mode with authentication (Stack Auth + ClickHouse) is available via `AUTH_ENABLED=true`. GPU workers for Kokoro TTS and YOLO figure detection are optional add-ons, with NVIDIA MPS support for multi-worker GPU sharing.
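As a sketch, assuming the self-host target reads `AUTH_ENABLED` from the environment (only the variable name is documented here):

```shell
# Default: single-user mode, no login
make self-host

# Multi-user mode: enables Stack Auth login and ClickHouse analytics
AUTH_ENABLED=true make self-host
```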
The tech stack uses a Python backend (managed via uv), a Node.js/Vite frontend, and Redis for job queuing.
Features
- 170+ voices across 15 languages
- Document outliner and Vim-style keyboard shortcuts
- Media key support and adjustable playback speed
- Dark mode and share-by-link
- MP3 audio export on roadmap
Technical Specifications
| Feature | Specification |
|---|---|
| License | AGPL-3.0 |
| Backend | Python (uv) |
| Frontend | Node.js / Vite |
| TTS Model | Kokoro-82M (WebGPU / CPU / GPU worker) |
| PDF Layout | DocLayout-YOLO |
| Web Extraction | defuddle |
| Auth | Stack Auth (optional) |
| Analytics | ClickHouse (optional, multi-user mode) |
| Deployment | Docker Compose |
Company Information
Yapit is an open-source project available at github.com/yapit-tts/yapit. No commercial entity or funding information is publicly disclosed. The project website is yapit.md.
For commercial IDP platforms with enterprise support, see open-source IDP vendors or compare options in the vendor finder.