We're excited to share Nanonets-OCR2, a state-of-the-art suite of models designed for advanced image-to-markdown conversion and Visual Question Answering (VQA).
Key Features:
- LaTeX Equation Recognition: Automatically converts mathematical equations and formulas into properly formatted LaTeX syntax. It distinguishes between inline ($...$) and display ($$...$$) equations.
- Intelligent Image Description: Describes images within documents using structured <img> tags, making them digestible for LLM processing. It can describe various image types, including logos, charts, graphs and so on, detailing their content, style, and context.
- Signature Detection & Isolation: Identifies and isolates signatures from other text, outputting them within a <signature> tag. This is crucial for processing legal and business documents.
- Watermark Extraction: Detects and extracts watermark text from documents, placing it within a <watermark> tag.
- Smart Checkbox Handling: Converts form checkboxes and radio buttons into standardized Unicode symbols (☐, ☑, ☒) for consistent and reliable processing.
- Complex Table Extraction: Accurately extracts complex tables from documents and converts them into both markdown and HTML table formats.
- Flow charts & Organisational charts: Extracts flow charts and organisational charts as Mermaid code.
- Handwritten Documents: The model is trained on handwritten documents across multiple languages.
- Multilingual: The model is trained on documents in multiple languages, including English, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Arabic, and many more.
- Visual Question Answering (VQA): The model is designed to provide the answer directly if it is present in the document; otherwise, it responds with "Not mentioned."
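Since the features above describe a tagged markdown output, here is a minimal Python sketch of how that output could be post-processed downstream. It is not part of the Nanonets tooling. The sample text is hand-written rather than real model output, the helper names (`extract_tag`, `checkbox_items`, `vqa_answer_or_none`) are hypothetical, and it assumes each tag is emitted as a matching open/close pair and that ☐, ☑ and ☒ mean unchecked, checked and crossed respectively.

```python
import re
from typing import List, Optional, Tuple

# Hand-written sample in the style described above (assumption: not real model output).
SAMPLE = """\
<watermark>CONFIDENTIAL</watermark>

☑ I agree to the terms
☐ Subscribe to newsletter
☒ Decline

<img>Company logo: blue circle with white text</img>
<signature>Jane Doe</signature>
"""

# Assumed meaning of each symbol; the post only lists the symbols themselves.
CHECKBOXES = {"☐": "unchecked", "☑": "checked", "☒": "crossed"}


def extract_tag(text: str, tag: str) -> List[str]:
    """Return the stripped contents of every <tag>...</tag> pair in text."""
    pattern = re.compile(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL)
    return [match.strip() for match in pattern.findall(text)]


def checkbox_items(text: str) -> List[Tuple[str, str]]:
    """Return (state, label) for each line that starts with a checkbox symbol."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if line and line[0] in CHECKBOXES:
            items.append((CHECKBOXES[line[0]], line[1:].strip()))
    return items


def vqa_answer_or_none(answer: str) -> Optional[str]:
    """Map the "Not mentioned." fallback to None; otherwise return the answer."""
    cleaned = answer.strip()
    if cleaned.rstrip(".").lower() == "not mentioned":
        return None
    return cleaned


if __name__ == "__main__":
    print(extract_tag(SAMPLE, "signature"))
    print(extract_tag(SAMPLE, "watermark"))
    print(checkbox_items(SAMPLE))
    print(vqa_answer_or_none("Not mentioned."))
```

Splitting the tagged regions out this way keeps signatures and watermarks from leaking into the body text that gets passed on to an LLM, and turning the VQA fallback into `None` makes "no answer" easy to test for in calling code.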
🤗 Hugging Face models
Document with complex checkboxes
Quarterly Report (please use Markdown (Financial Docs) for best results in the docstrange demo)
Feel free to try it out and share your feedback.