Yapit is an open-source text-to-speech platform built specifically for reading documents aloud. It handles PDFs, academic papers, and web pages with intelligent processing of math, citations, and figures that generic TTS tools fail on.

Overview

Yapit is a self-hostable TTS application that accepts a URL or PDF and reads the content aloud. Unlike general TTS tools, it is designed for document fidelity: math equations are spoken as alt text rather than raw LaTeX, citation markers and figure labels are naturalized, and page headers and footers are stripped before synthesis.
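As an illustration of the naturalization step described above (a sketch of the general technique, not Yapit's actual implementation), a regex-based pass might turn citation markers and figure labels into speakable phrases:

```python
import re

def naturalize_for_tts(text: str) -> str:
    """Illustrative sketch: convert citation markers and figure labels
    into phrases a TTS engine can speak naturally. The exact rules
    Yapit applies are not documented here; these patterns are assumptions."""
    # "[12]" -> "citation 12"
    text = re.sub(r"\[(\d+)\]", r"citation \1", text)
    # "Fig. 3" / "Figure 3" -> "figure 3"
    text = re.sub(r"\bFig(?:ure)?\.?\s+(\d+)", r"figure \1", text)
    return text
```

A marker like "See [12] and Fig. 3." would come out as "See citation 12 and figure 3.", which a generic TTS engine can read without stumbling over bracket syntax.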

The project is licensed under AGPL-3.0 and available on GitHub. It supports 170+ voices across 15 languages and can run the TTS engine entirely in the browser via Kokoro-82M on WebGPU. No server is needed for basic usage.

Yapit differs from commercial IDP vendors in scope. Where platforms like ABBYY or Nanonets focus on structured data extraction and workflow automation, Yapit focuses on accessibility and document consumption through audio output. It is relevant to organizations researching open-source document intelligence tooling or building accessible document workflows.

Document Processing Capabilities

Yapit's document pipeline covers:

  • PDF ingestion: Layout analysis via DocLayout-YOLO detects figures, tables, and headers for accurate extraction
  • Web page extraction: Clean content extraction via the defuddle library strips navigation, ads, and boilerplate
  • Academic paper handling: Math rendered visually but spoken as alt text; citation markers and figure labels converted to natural speech patterns
  • Markdown export: Append /md to any document URL to retrieve clean markdown; /md-annotated adds TTS annotations
  • Customizable extraction: Extraction behavior driven by a configurable prompt, supporting any OpenAI-compatible vision API (OpenRouter, vLLM, Ollama, Google Gemini)
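The /md export above is easy to script against. A minimal sketch, assuming the suffix attaches directly to the document URL's path (the example URL is hypothetical):

```python
def markdown_export_url(doc_url: str, annotated: bool = False) -> str:
    """Build the markdown-export URL for a Yapit document URL.
    Assumes the /md and /md-annotated suffixes append directly to the
    document path, per the description above; a trailing slash is trimmed
    first so the suffix is not doubled."""
    suffix = "/md-annotated" if annotated else "/md"
    return doc_url.rstrip("/") + suffix
```

For a hypothetical document at https://example.com/doc/abc, this yields https://example.com/doc/abc/md, or .../md-annotated when TTS annotations are wanted.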

TTS Engine Options

| Engine | Mode | Notes |
| --- | --- | --- |
| Kokoro-82M | Browser (WebGPU/CPU) | Runs locally, no server required |
| Kokoro-FastAPI | Self-hosted server | Docker worker, GPU/CPU |
| Inworld TTS | Hosted | Cloud API |
| OpenAI-compatible | Any | vLLM-Omni, AllTalk, Chatterbox TTS |

Voice auto-discovery is supported when the server exposes GET /v1/audio/voices.
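A client can consume that discovery endpoint with ordinary JSON parsing. A sketch, assuming the response body carries a top-level "voices" array (the actual response shape is not specified here):

```python
import json

def parse_voice_list(payload: str) -> list[str]:
    """Extract voice IDs from a GET /v1/audio/voices response body.
    The {"voices": [...]} shape is an assumption, mirroring common
    voice-listing extensions on OpenAI-compatible TTS servers."""
    data = json.loads(payload)
    return [str(v) for v in data.get("voices", [])]
```

A server that does not expose the endpoint simply yields no auto-discovered voices; the function returns an empty list for a body with no "voices" key.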

Deployment

Yapit is designed for self-hosting via Docker Compose:

make self-host

Default mode is single-user with no login required. Multi-user mode with authentication (Stack Auth + ClickHouse) is available via AUTH_ENABLED=true. GPU workers for Kokoro TTS and YOLO figure detection are optional add-ons, with NVIDIA MPS support for multi-worker GPU sharing.

The tech stack uses a Python backend (managed via uv), a Node.js/Vite frontend, and Redis for job queuing.

Features

  • 170+ voices across 15 languages
  • Document outliner and Vim-style keyboard shortcuts
  • Media key support and adjustable playback speed
  • Dark mode and share-by-link
  • MP3 audio export (on the roadmap, not yet released)

Technical Specifications

| Feature | Specification |
| --- | --- |
| License | AGPL-3.0 |
| Backend | Python (uv) |
| Frontend | Node.js / Vite |
| TTS Model | Kokoro-82M (WebGPU / CPU / GPU worker) |
| PDF Layout | DocLayout-YOLO |
| Web Extraction | defuddle |
| Auth | Stack Auth (optional) |
| Analytics | ClickHouse (optional, multi-user mode) |
| Deployment | Docker Compose |

Company Information

Yapit is an open-source project available at github.com/yapit-tts/yapit. No commercial entity or funding information is publicly disclosed. The project website is yapit.md.

For commercial IDP platforms with enterprise support, see open-source IDP vendors or compare options in the vendor finder.