Mistral OCR 4 Adds Audit Trail For Enterprise Documents
Mistral AI released OCR 4 with bounding boxes, block classification and confidence scores, pricing the document model from $4 per 1,000 pages for enterprise workflows.

OCR 4 Adds Structure To Document Extraction
Mistral AI has released OCR 4, a document intelligence model built to return structured document representations rather than only extracted text.
The model identifies bounding boxes, classifies block types and assigns confidence scores at page and word level, giving enterprise teams more evidence to audit what a system has pulled from a document.
The release is aimed at companies that need document automation inside regulated workflows.
OCR 4 supports 170 languages across 10 language groups and accepts PDF, DOC, PPT and OpenDocument formats.
Mistral also said the model can run as a single container on an organization's own infrastructure, a deployment option for companies that do not want sensitive documents routed through U.S.-jurisdiction cloud APIs.
The model is available through the Mistral API, Document AI in Mistral Studio, Amazon SageMaker and Microsoft Foundry.
Snowflake Parse Document support is coming soon.
Pricing starts at $4 per 1,000 pages and falls to $2 per 1,000 pages through a batch API discount.
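As a rough illustration of the pricing arithmetic, the sketch below estimates cost from a page count using the two listed rates. The function name and the assumption of strictly linear per-page billing (no minimums, tiers or rounding) are illustrative and are not Mistral's published billing logic.

```python
# Listed rates from the announcement, in US dollars per 1,000 pages.
STANDARD_RATE_PER_1K = 4.0
BATCH_RATE_PER_1K = 2.0


def estimate_cost(pages: int, batch: bool = False) -> float:
    """Estimate OCR cost in dollars, assuming simple linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_RATE_PER_1K if batch else STANDARD_RATE_PER_1K
    return pages / 1000 * rate


if __name__ == "__main__":
    monthly_pages = 250_000  # hypothetical workload
    print(estimate_cost(monthly_pages))              # 1000.0
    print(estimate_cost(monthly_pages, batch=True))  # 500.0
```

Under those assumptions, a hypothetical 250,000-page monthly workload would cost about $1,000 at the standard rate and about $500 through the batch API.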

Layout Data Becomes The Enterprise Feature
The technical change is the layout layer.
OCR 4 returns localized blocks with labels such as title, table, equation or signature.
That means a paragraph can be used for semantic search, a table can move into a structured-data pipeline and a signature can trigger a redaction process.
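The routing idea above can be sketched in a few lines. This is a minimal illustration, not the OCR 4 response schema: the `Block` class, its field names and the destination names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical layout block; not the actual OCR 4 response schema."""
    label: str  # e.g. "title", "table", "equation", "signature"
    text: str
    page: int


# Hypothetical destinations keyed by block label.
ROUTES = {
    "table": "structured_data",
    "signature": "redaction",
}
DEFAULT_ROUTE = "semantic_search"  # paragraphs, titles and anything unmapped


def route_blocks(blocks: list[Block]) -> dict[str, list[Block]]:
    """Group blocks by the downstream pipeline their label maps to."""
    routed: dict[str, list[Block]] = {}
    for block in blocks:
        destination = ROUTES.get(block.label, DEFAULT_ROUTE)
        routed.setdefault(destination, []).append(block)
    return routed


if __name__ == "__main__":
    sample = [
        Block("paragraph", "The supplier shall deliver within 30 days.", 1),
        Block("table", "Item | Qty | Price", 2),
        Block("signature", "[signature]", 3),
    ]
    for destination, items in route_blocks(sample).items():
        print(destination, [b.label for b in items])
```

A lookup table keeps the mapping in one place, so adding a new label or destination does not require touching the routing loop.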
Mistral said bounding boxes were its most-requested capability.
The reason is operational: compliance, legal and finance teams need to trace extracted facts back to a specific page location before they trust an AI workflow.
Without that location data, retrieval-augmented generation systems and agent workflows often need an extra layout-analysis step before the downstream model can use the document safely.
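To make the traceability point concrete, the sketch below looks up which block an extracted value came from and returns its page and bounding box. The `LocatedBlock` class, the `(x0, y0, x1, y1)` coordinate convention and the simple substring match are assumptions for illustration; a production system would likely need fuzzier matching and the model's real output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocatedBlock:
    """Hypothetical block with location data; field names are illustrative."""
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed convention
    text: str


def locate_fact(fact: str, blocks: list[LocatedBlock]) -> Optional[LocatedBlock]:
    """Return the first block whose text contains the fact, ignoring case."""
    needle = fact.casefold()
    for block in blocks:
        if needle in block.text.casefold():
            return block
    return None


if __name__ == "__main__":
    blocks = [
        LocatedBlock(1, (72.0, 90.0, 540.0, 120.0), "Master Services Agreement"),
        LocatedBlock(4, (72.0, 300.0, 540.0, 340.0), "Total due: EUR 12,400"),
    ]
    hit = locate_fact("eur 12,400", blocks)
    if hit is not None:
        print(f"page {hit.page}, bbox {hit.bbox}")
```

Storing the page and box alongside each extracted value is what lets a reviewer jump from a field in a downstream system back to the exact region of the source document.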
Confidence scores add another control point.
Organizations can route low-confidence regions to human reviewers while letting high-confidence extractions move through automated workflows.
That matters for scale because OCR is normally the first stage in a larger document pipeline, not the end product.
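The confidence-based routing described above can be sketched as a simple threshold split. The `Region` class, the 0.0 to 1.0 score scale and the 0.90 cutoff are assumptions chosen for the example; the right threshold would depend on an organization's own review rules.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical extracted region; field names are illustrative."""
    text: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def split_by_confidence(
    regions: list[Region], threshold: float = 0.90
) -> tuple[list[Region], list[Region]]:
    """Return (automated, needs_review) lists based on a confidence cutoff."""
    automated: list[Region] = []
    needs_review: list[Region] = []
    for region in regions:
        if region.confidence >= threshold:
            automated.append(region)
        else:
            needs_review.append(region)
    return automated, needs_review


if __name__ == "__main__":
    regions = [
        Region("Invoice No. 2024-0193", 0.99),
        Region("Amount: 1,2O0.00", 0.62),  # ambiguous character, low score
    ]
    automated, needs_review = split_by_confidence(regions)
    print(len(automated), "automated;", len(needs_review), "for human review")
```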

Benchmarks Still Need Production Proof
Mistral said human reviewers preferred OCR 4 over competing systems 72% of the time on average.
That comparison covered more than 600 real-world documents and more than 12 languages, with independent annotators judging the outputs.
The company also cited a top overall score of 85.20 on OlmOCRBench and a score of 93.07 on OmniDocBench.
Those figures support the launch, but enterprise buyers still need to test OCR 4 inside their own document sets.
Document quality, scanned images, tables, signatures, language mix and review rules can change whether a benchmark result becomes a production workflow.
The product also has to fit existing data-governance controls, because a model that reads contracts, invoices or identity documents can create audit and retention questions before it creates productivity gains.
The range of deployment options adds to what buyers have to evaluate.
Mistral is offering API access and studio tooling, while Amazon SageMaker and Microsoft Foundry give enterprises cloud procurement paths they may already use.
The single-container option is the stricter route for companies that want document processing closer to their own infrastructure.
OCR 4 gives Mistral a document-AI product with deployment options, audit data and clear pricing.
The unresolved enterprise issue is whether regulated customers can use those controls to reduce manual review without losing traceability when documents are complex or sensitive.
















