Nanonets OCR2: Turning Documents into Structured, LLM-Ready Data
How to use Nanonets OCR2 for free?
Most document AI models are good at extraction, but bad at understanding. They’ll find text, maybe a table, but ask them where the watermark ends and the signature begins, and they’ll hallucinate faster than a sleep-deprived intern.
Nanonets OCR 2 fixes that.
This isn’t another “OCR but with AI” release. It’s a proper multimodal vision-language model trained to see documents the way humans do: context, structure, meaning, and noise included.
Overview
Nanonets-OCR2 is built for two main things: converting document images into structured Markdown and answering document-related questions accurately (VQA). It extends the older Nanonets-OCR-s model with a stronger backbone, better content segmentation, and less confusion between document artifacts like headers, footers, watermarks, and body text.
It can handle checkboxes, signatures, tables, flowcharts, and multilingual documents. It even outputs Mermaid code for visual diagrams, something you rarely see in OCR systems.
What sets it apart is its honesty: if the answer isn’t present in the document, the model says Not mentioned. No made-up answers, no filler text.

You can test the family of models directly on Docstrange or Hugging Face (a quick-start sketch follows the list):
- Nanonets-OCR2+: the flagship model
- Nanonets-OCR2-3B: balanced between speed and accuracy
- Nanonets-OCR2-1.5B-exp: lightweight version for faster inference
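Here’s that quick-start: a minimal sketch of running the 3B checkpoint locally with Hugging Face transformers. It assumes the standard multimodal chat-template API that Qwen2.5-VL-derived models expose and uses a generic prompt; the model card ships an official prompt tuned to the tag format described below, so treat this as a starting point rather than the canonical usage.

```python
# Minimal sketch: run Nanonets-OCR2-3B via transformers.
# Assumes a recent transformers release with multimodal chat-template support.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "nanonets/Nanonets-OCR2-3B"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("sample_invoice.png")  # hypothetical input file
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        # Generic instruction; the official prompt on the model card is more specific.
        {"type": "text", "text": "Convert this document to markdown."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
markdown = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0]
print(markdown)
```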
Key Capabilities
Let’s skip the buzzwords and get into what it actually does.
1. LaTeX Equation Recognition
Converts printed or handwritten math into LaTeX, differentiating between inline and display modes. Page numbers are wrapped in <page_number> tags, useful for structured post-processing.
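For example (a hand-written illustration of the output shape, not an actual model transcript; the exact delimiters may differ), an equation-heavy page might come back as:

```latex
The kinetic energy is \( E_k = \tfrac{1}{2} m v^2 \), and in the relativistic case:

$$ E^2 = (pc)^2 + (m_0 c^2)^2 $$

<page_number>12</page_number>
```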
2. Intelligent Image Description
If an image has a caption, it uses that. If not, it generates one. The descriptions go inside <img> tags, with enough context for LLMs to understand what’s visually there: graphs, logos, QR codes, and so on.
3. Signature Detection & Isolation
Detects and separates signatures from surrounding text. If unreadable, it still marks <signature>signature</signature> so downstream systems know it’s signed, not skipped.
4. Watermark Extraction
Finds watermark text, even on noisy scans, and wraps it in <watermark> tags. Surprisingly robust on low-quality inputs.
5. Checkbox & Radio Handling
Translates checkboxes into Unicode symbols inside <checkbox> tags. That’s a big deal for form automation and compliance-heavy workflows.
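To make the form-automation point concrete, here’s a hedged sketch of downstream parsing. It assumes each field label is followed by a <checkbox> tag wrapping the Unicode state symbol (☐ unchecked, ☑ checked), which may differ from the exact format the model emits:

```python
import re

# Hypothetical model output for a form fragment (tag format assumed).
ocr_output = """
Opt in to email updates <checkbox>☑</checkbox>
Request paper statements <checkbox>☐</checkbox>
"""

# Map each "label <checkbox>symbol</checkbox>" pair to a boolean.
FIELD_RE = re.compile(r"(.+?)\s*<checkbox>([☐☑])</checkbox>")
form_state = {
    label.strip(): symbol == "☑"
    for label, symbol in FIELD_RE.findall(ocr_output)
}
print(form_state)
# {'Opt in to email updates': True, 'Request paper statements': False}
```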
6. Complex Table Extraction
Pulls out nested or multi-row tables into Markdown and HTML. Better than most open models, which tend to flatten cell hierarchies.
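Because the tables come back as HTML, they drop straight into analysis tooling. A minimal sketch, assuming the response contains a standard <table> element (pandas.read_html needs lxml or html5lib installed):

```python
import io

import pandas as pd

# Hypothetical HTML table as it might appear in the model's output.
html = """
<table>
  <tr><th>Item</th><th>Qty</th><th>Unit price</th></tr>
  <tr><td>Widget A</td><td>3</td><td>9.50</td></tr>
  <tr><td>Widget B</td><td>1</td><td>24.00</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> found in the markup.
df = pd.read_html(io.StringIO(html))[0]
print(df)
```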
7. Flowchart and Org Chart Parsing
Outputs Mermaid code, which means you can regenerate the original chart visually. A rare but useful feature.
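A hedged sketch of putting that to use: extract the Mermaid source from the response and save it, after which a renderer such as mermaid-cli (mmdc -i chart.mmd -o chart.svg) can redraw the diagram. The fenced-block wrapper is an assumption; the model may emit Mermaid in a different envelope.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker (three backticks)

# Hypothetical response containing a fenced Mermaid block (wrapper assumed).
response = (
    "Here is the org chart:\n"
    f"{FENCE}mermaid\n"
    "graph TD\n"
    "    CEO --> CTO\n"
    "    CEO --> CFO\n"
    f"{FENCE}\n"
)

# Grab the body of the first mermaid block, if any.
match = re.search(rf"{FENCE}mermaid\n(.*?)\n{FENCE}", response, re.DOTALL)
if match:
    with open("chart.mmd", "w", encoding="utf-8") as f:
        f.write(match.group(1))
    # Then render externally: mmdc -i chart.mmd -o chart.svg
```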
8. Multilingual Support
Trained on over a dozen languages: English, Chinese, French, Spanish, Japanese, Arabic, and more.
9. Visual Question Answering (VQA)
When queried, it either gives a direct answer or says Not mentioned. No guessing, no hallucination padding.
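Reusing the objects from the quick-start sketch above, VQA is just a different user message. A hedged helper (the prompt phrasing is an assumption; check the model card for any official VQA format):

```python
def ask_document(processor, model, image, question: str) -> str:
    """Ask one question about a document image; return the raw answer."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": question},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )[0].strip()

# With the quick-start objects loaded:
# answer = ask_document(processor, model, image, "What is the invoice due date?")
# A literal "Not mentioned" means the field is absent, not a guess.
```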
Comparison with dots.ocr
Against dots.ocr, Nanonets OCR 2 shows visible improvements across the board:
- Checkboxes: cleaner tag recognition, fewer false positives
- Flowcharts: generates syntactically valid Mermaid code
- Signatures: correctly separates pen marks from text noise
- Tables: maintains cell structure better in Markdown export
- Watermarks: consistent detection even in grayscale or faded documents
The difference is clear in outputs: dots.ocr treats visuals as background noise; Nanonets-OCR2 treats them as first-class information.
Image-to-Markdown Evaluations
The Nanonets team used Gemini-2.5-Pro as the evaluator. Existing OCR benchmarks like olmOCRBench or OmniDocBench don’t measure markdown quality well, so Nanonets built its own evaluation set.
Results:
- OCR2+ beats all previous internal models, and even GPT-5 (low-thinking mode), in markdown correctness.
- OCR2+ had a win rate of ~34% vs Gemini Flash and 29% vs OCR2-3B, showing that model size doesn’t directly correlate with markdown accuracy.
The team plans to open-source the evaluation scripts and predictions on GitHub, which is welcome: OCR benchmarks need more transparency.
VQA Evaluations
For document-level question answering, the team used the IDP Leaderboard datasets (ChartQA, DocVQA).
- ChartQA: OCR2+ scores 79.2, vs 76.2 for Qwen-72B.
- DocVQA: OCR2+ hits 85.15, slightly below Gemini-Flash’s 85.5 but higher than most mid-scale open models.
Given its smaller parameter count, that’s impressive.
Training Details
The model was trained on 3 million+ pages, covering everything from invoices and research papers to healthcare forms and tax receipts. Both synthetic and manually annotated data were used.
The pipeline:
- Pretrain on synthetic docs for broad generalization.
- Fine-tune on annotated data for realism.
Base model: Qwen2.5-VL-3B, fine-tuned for document-specific vision-language tasks.
Limitations:
- Struggles with very complex flowcharts or diagrams.
- Occasional hallucinations in dense tabular layouts.
Still, it’s one of the more balanced trade-offs between performance and interpretability in the current open OCR landscape.
Use Cases
Nanonets-OCR2 basically acts as a bridge between scanned mess and structured reasoning.
- Academia & Research: Converts PDFs with formulas and tables into Markdown + LaTeX.
- Legal & Finance: Detects signatures, extracts checkboxes, standardizes formatting.
- Healthcare: Reads medical forms and checkboxes accurately.
- Enterprise Knowledge Bases: Turns multi-page reports into LLM-digestible text blocks.
If your workflow involves feeding documents into an LLM, this model saves you hours of cleanup.
Open Source & Access
You can try Nanonets-OCR2 directly on Docstrange, or grab it from Hugging Face. The team encourages open discussion on both platforms, which is rare for enterprise OCR tools.
nanonets/Nanonets-OCR2-3B · Hugging Face
In a world where unstructured data still blocks automation, models like Nanonets-OCR2 make the connection simple: raw pixels in, structured reasoning out.
Docstrange – AI Document Data Extraction by Nanonets