tagtog is an annotation vendor specializing in collaborative text annotation with OCR integration for NLP training datasets.

tagtog

8.3/10 Gitnux overall score (March 2026)
#5 Ranked among 10 text annotation tools
205.74s Mean time to annotate 100 documents (document-level)
€49 Maximum per-user monthly price (Pro tier)

Overview

Tagtog Sp. z o.o. is a Polish annotation specialist and a subsidiary of Primer AI. It positions itself as an annotation layer that integrates with multiple OCR providers rather than competing in the OCR space directly. Its vendor-agnostic OCR approach spans Google Cloud Vision, AWS Textract, Tesseract, ABBYY FineReader, and Kadmos, letting organizations keep existing OCR investments while adding structured annotation workflows on top.

Gitnux ranked tagtog #5 among 10 text annotation tools in March 2026, scoring it 8.3/10 overall with 9.2/10 for features and 7.6/10 for ease of use. Wifitalents.com independently scored it 8.1/10 across features, ease of use, and value in a separate 10-tool comparison the same year. The platform has established particular credibility in biomedical domains, where it has been used to annotate gene mentions and specialized entities in research publications, and it provides native PubMed integration for annotating abstracts and titles directly.

In July 2025, tagtog was featured among 33 key players alongside Amazon Mechanical Turk, Google, and Labelbox in a data annotation tools market valued at $1.9 billion in 2024.

What Users Say

Practitioners consistently cite tagtog's feature depth as its primary strength. Label Your Data CEO Karyna positions it as "best for complex NLP projects and legal document annotation tools requiring overlapping spans or nested entities" and for "teams working primarily with PDFs needing native document context" (2026). The 9.2/10 feature score from Gitnux reflects this: the platform supports overlapping spans, nested entities, typed relations, and document-level classification in a single environment, which few competitors match at this price point.

The friction shows up at setup and scale. Gitnux notes a steep learning curve for advanced schema configuration and a UI that feels dated compared to newer tools. Labellerr.com describes the interface as "slightly confusing" initially, though practitioners report it becomes workable with use. The free tier's cap of 5,000 annotations per month on private data pushes teams doing large-scale labeling toward paid plans quickly, and the absence of local installation in the free version creates friction for organizations in regulated industries that cannot use cloud-hosted infrastructure.

How tagtog processes documents

tagtog's processing model separates OCR from annotation: scanned documents are first converted to text by a connected OCR provider, then passed into tagtog's annotation environment where human reviewers and machine learning models work in parallel. This split architecture means the platform's accuracy ceiling is set by the chosen OCR engine, while tagtog controls the annotation quality layer above it.
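A minimal sketch of that split, with a stub OCR engine standing in for a real provider such as Tesseract or Textract (all names and data shapes here are illustrative, not tagtog's actual API):

```python
from dataclasses import dataclass, field
from typing import Callable

# tagtog consumes plain text, so any OCR provider can sit behind this
# interface: bytes of a scanned page in, extracted text out.
OcrEngine = Callable[[bytes], str]

@dataclass
class AnnotationDocument:
    text: str
    entities: list = field(default_factory=list)  # filled later by annotators/models

def ingest(scan: bytes, ocr: OcrEngine) -> AnnotationDocument:
    # Stage 1 (OCR) feeds stage 2 (annotation): the annotation layer's
    # accuracy ceiling is whatever text the chosen engine produces.
    return AnnotationDocument(text=ocr(scan))

# Stub engine standing in for e.g. pytesseract.image_to_string.
fake_ocr = lambda scan: scan.decode("utf-8")
doc = ingest(b"Patient shows BRCA1 mutation.", fake_ocr)
```

Swapping `fake_ocr` for a different provider changes extraction quality without touching the annotation layer, which is the point of the vendor-agnostic design.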

Between 2019 and 2022, tagtog built out the enterprise side of that annotation layer in three meaningful steps: native PDF annotation with coordinate-based positioning, zoom/pan, and text search; Teams Management with floating seat licensing for on-premises enterprise deployments; and Automatic Adjudication, which resolves disagreements between annotators using configurable union, intersection, or majority-vote strategies. Together these additions shifted the platform from a research tool toward production annotation workflows requiring audit trails and multi-reviewer governance.
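The three documented adjudication strategies can be sketched over simple span sets (the function and the tuple representation are illustrative, not tagtog's implementation):

```python
from collections import Counter

def adjudicate(annotator_sets, strategy="majority"):
    """Resolve disagreements between annotators.

    annotator_sets: one set of (start, end, label) spans per annotator.
    Strategies mirror tagtog's documented options: union keeps everything,
    intersection keeps only unanimous spans, majority keeps spans marked
    by more than half of the annotators.
    """
    if strategy == "union":
        return set().union(*annotator_sets)
    if strategy == "intersection":
        return set.intersection(*annotator_sets)
    if strategy == "majority":
        counts = Counter(span for s in annotator_sets for span in s)
        quorum = len(annotator_sets) / 2
        return {span for span, n in counts.items() if n > quorum}
    raise ValueError(f"unknown strategy: {strategy}")

a = {(0, 5, "GENE"), (10, 14, "DISEASE")}
b = {(0, 5, "GENE")}
c = {(0, 5, "GENE"), (20, 25, "DRUG")}
adjudicate([a, b, c], "majority")  # only (0, 5, "GENE") appears in >half
```

Union maximizes recall, intersection maximizes precision, and majority vote sits between; which is appropriate depends on whether downstream training tolerates noisy positives or missed spans.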

Machine learning assistance in the form of named entity recognition (NER), activated by default for Cloud TEAM PRO subscriptions, reduces annotator workload by suggesting entities based on prior annotations. Inter-annotator agreement metrics surface where human reviewers diverge before adjudication runs. The platform also supports active learning loops, where the model continuously retrains on confirmed annotations and surfaces uncertain examples for human review first.
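The review-ordering half of that loop is standard uncertainty sampling; a generic sketch (illustrative, not tagtog's internal logic):

```python
def next_for_review(predictions, k=2):
    """Surface the model's least-confident predictions for human review first.

    predictions: list of (doc_id, confidence) pairs from the current model.
    Confirmed annotations would then feed the next retraining round.
    """
    ranked = sorted(predictions, key=lambda pair: pair[1])
    return [doc_id for doc_id, _conf in ranked[:k]]

preds = [("d1", 0.98), ("d2", 0.51), ("d3", 0.87), ("d4", 0.60)]
next_for_review(preds)  # ['d2', 'd4']: the examples nearest the decision boundary
```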

Speed trade-off: flexibility versus throughput

The most concrete limitation in tagtog's public record comes from peer-reviewed benchmarking published in BMC Bioinformatics. Researchers measured that tagtog requires 400 actions and 205.74 seconds (mean) to annotate 100 documents at document level. MedTAG completed the same task in 46.84 seconds; MyMiner required 56.68 seconds. For mention identification, tagtog needed 304.69 seconds versus MyMiner's 114.39 seconds and MedTAG's 159.34 seconds.

The speed gap traces directly to a design choice: tagtog's dropdown-based label selection requires annotators to specify whether each label is "true," "false," or "unknown," adding interaction steps that checkbox-based tools skip. This trade-off favors complex biomedical and legal annotation projects where label nuance matters more than throughput. For high-volume, simple classification tasks, the overhead is a real cost. The benchmark data dates to 2021 and predates tagtog's active learning and machine-assisted labeling improvements, so the current gap may be narrower for teams using automation.

Use cases

Biomedical literature mining

Research institutions use tagtog to annotate genes, proteins, and diseases in scientific publications. Native PubMed integration lets teams pull abstracts directly into the annotation environment. Machine learning assistance suggests similar entities based on previous annotations, reducing per-document effort for high-volume literature curation programs. The platform's customizable label states, including true/false/unknown distinctions, support the nuanced annotation schemas that biomedical NLP requires.

Enterprise document processing

Organizations processing scanned documents route images through a connected OCR provider such as AWS Textract, ABBYY FineReader, or Tesseract, then pass the extracted text into tagtog for structured annotation workflows. Floating seat licensing and on-premises deployment via Docker image support regulated environments where data cannot leave the organization's infrastructure. Upload limits reach 250MB for on-premises deployments versus 10MB on cloud.

NLP training data preparation

Teams building NLP models use tagtog to create labeled datasets from unstructured text. Supported annotation types include overlapping spans, nested entities, entity attributes, typed relations, and document-level classification, covering named entity recognition, entity linking, relation extraction, and document categorization. Export formats include JSON, CoNLL, and Brat, covering the most common downstream training pipeline requirements. Automatic adjudication resolves annotation conflicts through configurable voting strategies, producing clean ground-truth data without manual reconciliation rounds.
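As an illustration of the CoNLL target format, a naive converter from character-span entities to token-level BIO tags (whitespace tokenization is a simplification; a real exporter tracks offsets from the tokenizer the platform actually uses):

```python
def to_conll(text, entities):
    """Render (start, end, label) character spans as CoNLL-style BIO lines."""
    lines, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in entities:
            if start == s:
                tag = f"B-{label}"      # token opens the entity span
            elif s < start < e:
                tag = f"I-{label}"      # token continues the entity span
        lines.append(f"{token}\t{tag}")
    return "\n".join(lines)

print(to_conll("BRCA1 mutations cause cancer",
               [(0, 5, "GENE"), (22, 28, "DISEASE")]))
```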

Technical specifications

| Feature | Specification |
| --- | --- |
| Deployment options | Cloud-hosted SaaS; on-premises via Docker image |
| Document formats | Native PDF, plain text, URLs, HTML, CSV, TSV, Markdown, source code files |
| OCR integration | Google Cloud Vision, AWS Textract, Tesseract, ABBYY FineReader, Kadmos |
| Upload capacity | 10MB cloud; 250MB on-premises |
| Security | TLS 1.3; A+ SSL Labs rating |
| Licensing | Floating seats for enterprise |
| Export formats | JSON, CoNLL, Brat |
| Free tier | Unlimited for public data; 5,000 annotations/month cap on private data; no local installation |
| Paid tiers | €19/user/month (Basic) to €49/user/month (Pro); Enterprise custom pricing |
| ML assistance | NER active by default on Cloud TEAM PRO; active learning loops |
| Adjudication strategies | Union, intersection, majority vote |
| Biomedical integration | Native PubMed import |

Competitive context

tagtog occupies the mid-to-premium segment of collaborative annotation tools, competing against Label Studio, Prodigy, Kili, and SuperAnnotate for NLP teams building training datasets. Its acquisition by Primer AI signals consolidation in the document AI space, where annotation tooling increasingly integrates with broader document understanding platforms rather than remaining standalone.

The platform's strongest differentiator is annotation schema flexibility: overlapping spans, nested entities, and typed relations in a native PDF environment are not universally available at this price point. Its weakest point relative to competitors is throughput on simple tasks, where the dropdown interaction model adds measurable overhead. Teams choosing between tagtog and open-source alternatives like Doccano or BRAT are typically trading feature depth against deployment simplicity and cost. Teams choosing between tagtog and enterprise platforms like Kili or SuperAnnotate are trading annotation UI maturity against broader MLOps integration.

Industry-wide, LLM-assisted annotation workflows reduce annotation time 40-70% when humans verify low-confidence predictions. tagtog's active learning positioning aligns with this trend, but the depth of its LLM integration relative to competitors is not independently benchmarked as of early 2026.

Resources

  • tagtog Documentation
  • OCR Integration Tutorial
  • Platform Updates 2019-2022
  • Biomedical Annotation Research (PubMed Central)
  • BMC Bioinformatics Benchmark Study

Company information

Company: Tagtog Sp. z o.o.
Parent: Primer AI
Location: Warsaw, Poland
Website: https://www.tagtog.com