tagtog Annotation Tool

tagtog specializes in collaborative text annotation with OCR integration for NLP training datasets.

Overview

Tagtog Sp. z o.o. is a Polish annotation specialist that now operates under Primer AI as its parent company. It positions itself as an annotation layer that integrates with multiple OCR providers rather than competing in the OCR space directly. Its vendor-agnostic OCR approach, spanning Google Cloud Vision, AWS Textract, Tesseract, ABBYY FineReader, and Kadmos, lets organizations keep existing OCR investments while adding structured annotation workflows on top.

In July 2025, tagtog was featured among 33 key players alongside Amazon Mechanical Turk, Google, and Labelbox in a data annotation tools market projected to grow from $1.9 billion in 2024 to $6.2 billion by 2030. The platform has established particular credibility in biomedical domains, where it has been used to annotate gene mentions and specialized entities in research publications.

How tagtog Annotation Tool Processes Documents

tagtog's processing model separates OCR from annotation: scanned documents are first converted to text by a connected OCR provider, then passed into tagtog's annotation environment where human reviewers and machine learning models work in parallel. This split architecture means the platform's accuracy ceiling is set by the chosen OCR engine, while tagtog controls the annotation quality layer above it.
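The separation described above can be sketched as a two-stage pipeline. This is a minimal illustration and not tagtog's implementation: `OcrEngine`, `AnnotationTask`, `process_scan`, and `stub_engine` are hypothetical names, and a real deployment would call the chosen OCR provider's SDK where the stub is.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that turns scanned image bytes into text.
OcrEngine = Callable[[bytes], str]


@dataclass
class AnnotationTask:
    """Text awaiting annotation, tagged with the OCR engine that produced it."""
    text: str
    ocr_source: str
    annotations: List[dict] = field(default_factory=list)


def process_scan(image: bytes, engine: OcrEngine, engine_name: str) -> AnnotationTask:
    """Stage 1: the OCR engine converts the scan to text.
    Stage 2: the text is wrapped as a task for the annotation layer."""
    text = engine(image)
    return AnnotationTask(text=text, ocr_source=engine_name)


def stub_engine(image: bytes) -> str:
    """Stand-in for a real OCR provider (assumption): decodes bytes as UTF-8."""
    return image.decode("utf-8")


task = process_scan(b"BRCA1 is a tumor suppressor gene.", stub_engine, "stub")
print(task.ocr_source, "->", task.text)
```

Because the annotation stage only ever sees `text`, any recognition error made by the engine is carried into the task unchanged, which is why the OCR engine sets the accuracy ceiling.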

Between 2019 and 2022, tagtog built out the enterprise side of that annotation layer with three additions: Native PDF annotation with coordinate-based positioning, zoom/pan, and text search; Teams Management with floating seat licensing for on-premises enterprise deployments; and Automatic Adjudication, which resolves disagreements between annotators using configurable union, intersection, or majority-vote strategies. Together these additions shifted the platform from a research tool toward production annotation workflows that require audit trails and multi-reviewer governance.
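The three adjudication strategies named above can be illustrated with plain set operations. This is a sketch under stated assumptions, not tagtog's algorithm: annotations are modeled as `(start, end, label)` tuples, only exact matches count as the same annotation, and "majority" is taken to mean strictly more than half of the annotators. The function name `adjudicate` is hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumed representation: (start_offset, end_offset, entity_label).
Annotation = Tuple[int, int, str]


def adjudicate(annotator_sets: List[Set[Annotation]], strategy: str) -> Set[Annotation]:
    """Merge several annotators' outputs into one set using the given strategy."""
    if not annotator_sets:
        return set()
    if strategy == "union":
        # Keep anything that at least one annotator marked.
        return set().union(*annotator_sets)
    if strategy == "intersection":
        # Keep only what every annotator marked.
        return set.intersection(*annotator_sets)
    if strategy == "majority":
        # Keep what strictly more than half of the annotators marked (assumption).
        counts = Counter(a for s in annotator_sets for a in s)
        threshold = len(annotator_sets) / 2
        return {a for a, n in counts.items() if n > threshold}
    raise ValueError(f"unknown strategy: {strategy!r}")


alice = {(0, 5, "GENE"), (10, 14, "DISEASE")}
bob = {(0, 5, "GENE"), (20, 27, "PROTEIN")}
carol = {(0, 5, "GENE"), (10, 14, "DISEASE")}

for name in ("union", "intersection", "majority"):
    print(name, sorted(adjudicate([alice, bob, carol], name)))
```

With these three annotators, union keeps all three distinct annotations, intersection keeps only the gene mention everyone agreed on, and majority keeps the gene and the disease mention that two of three annotators marked.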

Machine learning assistance, with named entity recognition (NER) active by default for Cloud TEAM PRO subscriptions, reduces annotator workload by suggesting entities based on prior annotations. Inter-annotator agreement metrics surface where human reviewers diverge before adjudication runs.
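The source does not say which agreement metric tagtog reports. As one common choice for span annotation, the sketch below computes pairwise F1 over exact-match annotations; the names `pairwise_f1` and `agreement_matrix` and the `(start, end, label)` representation are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Assumed representation: (start_offset, end_offset, entity_label).
Annotation = Tuple[int, int, str]


def pairwise_f1(a: Set[Annotation], b: Set[Annotation]) -> float:
    """F1 between two annotators, counting only exact matches as agreement."""
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def agreement_matrix(by_annotator: Dict[str, Set[Annotation]]) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of annotators, keyed by their sorted names."""
    return {
        (x, y): pairwise_f1(by_annotator[x], by_annotator[y])
        for x, y in combinations(sorted(by_annotator), 2)
    }


scores = agreement_matrix({
    "alice": {(0, 5, "GENE"), (10, 14, "DISEASE")},
    "bob": {(0, 5, "GENE"), (20, 27, "PROTEIN")},
    "carol": {(0, 5, "GENE"), (10, 14, "DISEASE")},
})
for pair, score in scores.items():
    print(pair, score)
```

Low-scoring pairs point to documents or entity types where guidelines are being read differently, which is useful to know before an adjudication strategy merges the results.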

Use Cases

Biomedical Literature Mining

Research institutions use tagtog to annotate genes, proteins, and diseases in scientific publications. The platform's machine learning assistance suggests similar entities based on previous annotations, reducing the per-document effort for high-volume literature curation programs.

Enterprise Document Processing

Organizations processing scanned documents route images through a connected OCR provider — AWS Textract, ABBYY FineReader, or Tesseract — then pass the extracted text into tagtog for structured annotation workflows. Floating seat licensing and on-premises deployment support regulated environments where data cannot leave the organization's infrastructure.

NLP Training Data Preparation

Teams building NLP models use tagtog to create labeled datasets from unstructured text. Automatic adjudication resolves annotation conflicts through a configurable union, intersection, or majority-vote strategy, producing clean ground-truth data without manual reconciliation rounds.

Technical Specifications

Feature	Specification
Deployment Options	Cloud-hosted SaaS, On-premises installation
Document Formats	Native PDF, CSV/TSV, Markdown, source code files
OCR Integration	Google Cloud Vision, AWS Textract, Tesseract, ABBYY FineReader, Kadmos
Upload Capacity	10 MB (Cloud), 250 MB (On-premises)
Security	TLS 1.3 support, A+ SSL Labs rating
Licensing	Floating seats for enterprise
Free Tier	Unlimited for public data; 5,000 annotations/month cap on private data
Paid Tier	$0.03 per annotation beyond free tier
ML Assistance	NER active by default on Cloud TEAM PRO
Adjudication Strategies	Union, intersection, majority vote
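
Reading the Free Tier and Paid Tier rows together, a monthly bill can be estimated as follows. This is a rough sketch that assumes the $0.03 rate applies only to private-data annotations above the 5,000 monthly cap, which the table implies but does not state outright; the function names are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
FREE_PRIVATE_ANNOTATIONS = 5_000  # monthly cap on private data, from the table
CENTS_PER_EXTRA_ANNOTATION = 3    # $0.03 per annotation beyond the free tier


def monthly_cost_cents(private_annotations: int) -> int:
    """Estimated monthly charge in cents for private-data annotations (assumption)."""
    if private_annotations < 0:
        raise ValueError("annotation count cannot be negative")
    billable = max(0, private_annotations - FREE_PRIVATE_ANNOTATIONS)
    return billable * CENTS_PER_EXTRA_ANNOTATION


def format_usd(cents: int) -> str:
    """Render a non-negative cent amount as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


# 12,000 private annotations: 7,000 billable at $0.03 each.
print(format_usd(monthly_cost_cents(12_000)))  # prints $210.00
```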