Dataiku: IDP Software Vendor

Universal AI platform with document intelligence capabilities, preparing for 2026 IPO with $350M ARR and governance-by-design approach.

Overview

Founded in 2013 in Paris, Dataiku sells an enterprise AI platform, the Universal AI Platform, that includes intelligent document processing. The company is preparing for a U.S. IPO in H1 2026, with Morgan Stanley and Citigroup as lead underwriters, at a $3.7 billion valuation. That is down from its $4.6 billion Series E valuation in 2021, reflecting recalibrated market conditions. Dataiku surpassed $350M ARR in October 2025, up from $300M+ in January 2025.

The company was recognized as a Leader in IDC's MarketScape for Worldwide Unified AI Governance Platforms 2026, its first major governance-specific analyst recognition. CEO Florian Douetteau framed the shift: "AI governance has shifted from a checkpoint to a foundation." In February 2026, three further developments landed in a single month: a #33 ranking on G2's Best Analytics Software Products list, a Most Innovative Agent Development Platform award from SiliconANGLE Media, and the launch of the 575 Lab open-source office. The G2 ranking carries customer-validated weight, since it is drawn from verified user reviews across a marketplace reaching all Fortune 500 companies, while the SiliconANGLE award is analyst-judged. Neither is an independent analyst report.

Former Salesforce President Alexandre Dayon joined the Board of Directors in January 2026, strengthening enterprise sales expertise ahead of the IPO.

How Dataiku Processes Documents

Dataiku's document processing runs through the Natif.ai IDP plugin, a modular pipeline that handles PDF, TIFF, and JPEG inputs using computer vision, deep learning, and NLP. The pipeline converts native and scanned content into structured data, with vision-language models (VLMs) extracting information from text, tables, and images in a single pass. Governance controls - audit trails, end-to-end traceability, and compliance checkpoints - are embedded directly in the workflow rather than applied as post-processing overlays.
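
The idea of governance embedded in the workflow, rather than bolted on afterward, can be illustrated with a minimal Python sketch. This is not the Natif.ai plugin's API or Dataiku's implementation: `Document`, `run_pipeline`, and the placeholder stages are hypothetical names, and the file-suffix spellings are assumptions. The sketch only shows a modular pipeline in which every stage writes its audit entry as part of the same step that produces data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# Input formats named in the text above; the suffix spellings are assumptions.
SUPPORTED_SUFFIXES = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg"}


@dataclass
class Document:
    path: str
    fields: dict = field(default_factory=dict)       # extracted structured data
    audit_trail: list = field(default_factory=list)  # one entry per stage run


# A stage takes a Document and returns the fields it extracted (hypothetical type).
Stage = Callable[[Document], dict]


def run_pipeline(doc: Document, stages: list[tuple[str, Stage]]) -> Document:
    """Run stages in order; each stage's audit entry is written in the same step."""
    suffix = Path(doc.path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported input format: {suffix or '(none)'}")
    for name, stage in stages:
        produced = stage(doc)
        doc.fields.update(produced)
        doc.audit_trail.append({
            "stage": name,
            "fields_written": sorted(produced),
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return doc


# Placeholder stages standing in for real OCR / VLM extraction.
def classify(doc: Document) -> dict:
    return {"doc_type": "invoice"}


def extract(doc: Document) -> dict:
    return {"total": "120.00", "currency": "EUR"}


result = run_pipeline(
    Document("scan_001.tiff"),
    [("classify", classify), ("extract", extract)],
)
```

Because the audit entry is appended inside the stage loop, no stage can contribute fields without leaving a trace, which is the property the post-processing approach cannot guarantee.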

The Agent Hub extends this into multi-step agentic workflows: a collaborative workspace where AI agents can be built, shared, and scaled with ROI measurement attached. The AI Factory Accelerator, powered by NVIDIA, targets enterprise-scale deployments and ships with governance integration built in.

In February 2026, Dataiku moved its governance infrastructure into open source through the 575 Lab, its dedicated Open Source Office. Two toolkits are generally available. Agent Explainability Tools trace decision-making across multi-step agentic workflows and surface agent reasoning for data scientists, compliance teams, and end users. Privacy-Preserving Proxies are designed for local deployment and protect sensitive data end-to-end when enterprises use closed-source models. Licensing terms and GitHub repository URLs were not disclosed in available sources. Dataiku simultaneously joined the Linux Foundation and the newly formed Agentic AI Foundation, which is a standards play, not a product feature. As Douetteau put it: "Enterprises need reusable building blocks that can become the standards for how agentic systems are controlled and inspected."
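
Since the sources do not describe how Privacy-Preserving Proxies work internally, the following is only a generic sketch of the proxy pattern the name suggests: sensitive values are swapped for placeholders locally before a prompt leaves the enterprise, and restored in the response. Everything here (`redact`, `restore`, `proxied_call`, the single email pattern, and `fake_model`) is a hypothetical illustration, not the 575 Lab toolkit.

```python
import re
from typing import Callable

# Illustrative pattern only; real deployments need far broader PII coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> tuple[str, dict]:
    """Replace each distinct email with a placeholder; return text and the mapping."""
    mapping: dict = {}

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        # Reuse the same placeholder when the same value appears again.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL.sub(substitute, text), mapping


def restore(text: str, mapping: dict) -> str:
    """Put the original values back into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def proxied_call(prompt: str, model: Callable[[str], str]) -> str:
    """Send only the redacted prompt to the model; restore values locally."""
    safe_prompt, mapping = redact(prompt)
    return restore(model(safe_prompt), mapping)


seen = []  # records what the (stand-in) external model actually received


def fake_model(prompt: str) -> str:
    seen.append(prompt)
    return f"Summary: {prompt}"


answer = proxied_call("Contact ana@example.com about the invoice.", fake_model)
```

In this sketch the mapping never leaves the local process, so the external model only ever sees placeholders.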

Use Cases

Enterprise AI Governance

Organizations use Dataiku's unified governance platform to close a traceability gap: 95% of data leaders cannot fully trace AI decisions end-to-end, even as 86% report that AI is embedded in daily operations. The platform builds governance controls (traceability, audit logs, compliance checkpoints) directly into AI development workflows rather than adding them afterward. The 575 Lab's Agent Explainability Tools extend this to agentic pipelines, making multi-step agent reasoning inspectable by compliance teams without requiring custom instrumentation.
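
To make "tracing a decision end-to-end" concrete, here is a small Python sketch of one way a multi-step agent run could be recorded so that a reviewer can walk from a final decision back to the steps that led to it. The `Step` and `Trace` types and the example agents are hypothetical; the sources do not document the data model used by Dataiku or the Agent Explainability Tools.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    step_id: int
    agent: str
    action: str
    rationale: str                    # human-readable reason recorded at decision time
    parent_id: Optional[int] = None   # the step whose output this step consumed


@dataclass
class Trace:
    steps: list = field(default_factory=list)

    def record(self, agent: str, action: str, rationale: str,
               parent_id: Optional[int] = None) -> int:
        """Append a step and return its id so later steps can link to it."""
        step = Step(len(self.steps), agent, action, rationale, parent_id)
        self.steps.append(step)
        return step.step_id

    def lineage(self, step_id: int) -> list:
        """Walk parent links from a step back to the root, returned root-first."""
        chain = []
        current: Optional[int] = step_id
        while current is not None:
            step = self.steps[current]
            chain.append(step)
            current = step.parent_id
        return list(reversed(chain))


trace = Trace()
root = trace.record("router", "route_document", "File matched the invoice classifier")
ext = trace.record("extractor", "extract_total", "Total found in table footer", parent_id=root)
trace.record("extractor", "extract_vendor", "Vendor name in header", parent_id=root)
chk = trace.record("validator", "approve", "Total is below the review threshold", parent_id=ext)

path = [f"{s.agent}:{s.action}" for s in trace.lineage(chk)]
```

Here `path` lists only the steps on the approval's own chain, leaving out the unrelated vendor extraction, which is the kind of view a compliance reviewer would need.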

Retail AI Transformation

Retailers use Dataiku's Retail Accelerator Pack for customer experience optimization and back-office automation. The pack includes seven prebuilt use cases covering entity extraction and LLM-enhanced predictions. Head of AI Architecture Jed Dougherty notes the tension: "The riskiest place to use GenAI in retail is also the most valuable one: the customer experience." The accelerator is designed to shorten deployment timelines for teams that cannot build from scratch.

Document Intelligence Workflows

Teams process document collections through the modular Natif.ai pipeline, converting native and scanned content to structured data with embedded governance controls for regulatory compliance and audit trails. The AWS Agentic AI and Healthcare Software Competency certifications extend this into healthcare-specific document workflows on AWS infrastructure, where compliance requirements are most stringent.

Technical Specifications

Feature	Specification
Core Platform	Dataiku Data Science Studio (DSS), Universal AI Platform
Document Processing	Natif.ai IDP plugin, modular pipeline
AI Governance	Native governance controls, end-to-end traceability, 575 Lab open-source toolkits
Agent Platform	Agent Hub with collaborative workspace and ROI measurement
Open Source	575 Lab: Agent Explainability Tools, Privacy-Preserving Proxies (GA; licensing terms not disclosed)
Cloud Partnerships	AWS Agentic AI and Healthcare Software Competency, NVIDIA AI Factory Accelerator
File Formats	PDF, TIFF, JPEG
Deployment	Cloud, on-premises
Accelerators	Retail (7 use cases), healthcare, manufacturing
Industry Foundations	Linux Foundation member, Agentic AI Foundation member

Category	Winner
Global Data Partner of the Year	Snowflake
Global Cloud Partner of the Year	AWS
Global Systems Integrator of the Year	Accenture
Global Reseller Partner of the Year	K.K. Ashisuto
Americas SI of the Year	Aimpoint Digital
EMEA SI of the Year	Eulidia
APJ SI of the Year	ST Engineering (Mission Software & Services)
Americas Innovator of the Year	v4c.ai
EMEA Innovator of the Year	Infomotion
APJ Innovator of the Year	Datasolution