
Duckling

A modern, user-friendly graphical interface for Docling, the powerful document conversion library from IBM.


Overview

Duckling provides an intuitive web interface for converting documents using IBM's Docling library. Whether you need to extract text from PDFs, convert Word documents to Markdown, or perform OCR on scanned images, Duckling makes it simple.

Key Features

  • Drag-and-Drop Upload: drag documents onto the interface to upload and convert them
  • Batch Processing: convert multiple files at once with parallel processing
  • Multi-Format Support: PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, HTML, Markdown, images, and more
  • Multiple Export Formats: Markdown, HTML, JSON, DocTags, Document Tokens, RAG Chunks, or plain text
  • Image & Table Extraction: extract embedded images and tables, with CSV export for tables
  • RAG-Ready Chunking: generate document chunks optimized for retrieval-augmented generation (RAG) applications
  • Advanced OCR: multiple OCR backends, with GPU acceleration support
  • Conversion History: access previously converted documents at any time
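Batch processing runs several conversions at once. The sketch below shows one common way to do that in Python, using a thread pool from `concurrent.futures`. It is not Duckling's backend code: `convert_document` is a hypothetical stand-in for a real Docling conversion call, and `convert_batch` is an illustrative name. Each file's result or exception is stored separately, so one failed file does not abort the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List, Union


def convert_document(path: str) -> str:
    """Hypothetical stand-in for converting one file.

    A real backend would hand the file to Docling here; this stub only
    returns a placeholder Markdown heading so the sketch runs anywhere.
    """
    return f"# {Path(path).stem}\n"


def convert_batch(
    paths: List[str],
    convert: Callable[[str], str] = convert_document,
    max_workers: int = 4,
) -> Dict[str, Union[str, Exception]]:
    """Convert files concurrently, mapping each path to its output or its error."""
    results: Dict[str, Union[str, Exception]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every file up front, remembering which future belongs to which path.
        futures = {pool.submit(convert, p): p for p in paths}
        for future, path in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure instead of stopping the batch
                results[path] = exc
    return results
```

With the stub converter, `convert_batch(["a.pdf", "b.docx"])` returns `{"a.pdf": "# a\n", "b.docx": "# b\n"}`. Document conversion is largely CPU-bound, so a `ProcessPoolExecutor` may suit real workloads better; threads keep the sketch simple.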

Quick Start

Get started in minutes:

# Clone the repository
git clone https://github.com/davidgs/duckling.git
cd duckling

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py

# Frontend setup (in a new terminal, from the repository root)
cd frontend
npm install
npm run dev

# Alternatively, run everything with Docker (from the repository root)
docker-compose up --build

Access the application at http://localhost:3000

Supported Formats

Input Formats

| Format | Extensions | Description |
|---|---|---|
| PDF | .pdf | Portable Document Format |
| Word | .docx | Microsoft Word documents |
| PowerPoint | .pptx | Microsoft PowerPoint presentations |
| Excel | .xlsx | Microsoft Excel spreadsheets |
| HTML | .html, .htm | Web pages |
| Markdown | .md, .markdown | Markdown files |
| Images | .png, .jpg, .jpeg, .tiff, .gif, .webp, .bmp | Direct image OCR |
| AsciiDoc | .asciidoc, .adoc | Technical documentation |
| PubMed XML | .xml | Scientific articles |
| USPTO XML | .xml | Patent documents |
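Because PubMed XML and USPTO XML share the `.xml` extension, the extension alone cannot identify every input. The following sketch uses a hypothetical helper, `detect_input_format`, that maps extensions using the table above and falls back to inspecting the XML root element for `.xml` files. The root-element names are assumptions based on common USPTO and PubMed/JATS files, not rules taken from Duckling or Docling.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Extension -> format name, transcribed from the input formats table (.xml handled separately).
INPUT_FORMATS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".pptx": "PowerPoint",
    ".xlsx": "Excel",
    ".html": "HTML", ".htm": "HTML",
    ".md": "Markdown", ".markdown": "Markdown",
    ".asciidoc": "AsciiDoc", ".adoc": "AsciiDoc",
    **{ext: "Images" for ext in (".png", ".jpg", ".jpeg", ".tiff", ".gif", ".webp", ".bmp")},
}

# Assumed root element names used to tell the two .xml formats apart.
USPTO_ROOTS = {"us-patent-grant", "us-patent-application"}
PUBMED_ROOTS = {"article", "PubmedArticleSet", "pmc-articleset"}


def detect_input_format(filename: str, content: bytes = b"") -> Optional[str]:
    """Return the table's format name for a file, or None if it is unsupported."""
    ext = Path(filename).suffix.lower()
    if ext != ".xml":
        return INPUT_FORMATS.get(ext)
    # Both XML formats share .xml, so look at the document's root element instead.
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return None
    tag = root.tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix
    if tag in USPTO_ROOTS:
        return "USPTO XML"
    if tag in PUBMED_ROOTS:
        return "PubMed XML"
    return None
```

Note that `xml.etree.ElementTree` is not hardened against maliciously constructed XML; the Python documentation's "XML vulnerabilities" section describes the risks and points to `defusedxml` for untrusted uploads.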

Export Formats

| Format | Extension | Description |
|---|---|---|
| Markdown | .md | Formatted text with headers, lists, links |
| HTML | .html | Web-ready format with styling |
| JSON | .json | Full document structure |
| Plain Text | .txt | Simple text without formatting |
| DocTags | .doctags | Tagged document format |
| Document Tokens | .tokens.json | Token-level representation |
| RAG Chunks | .chunks.json | Chunks for RAG applications |
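RAG chunks split a document into passages small enough to embed and retrieve individually. Docling's own chunkers work from the document's structure, so the sketch below swaps in a deliberately simpler technique for illustration: fixed-size word windows with overlap. The function names and the `index`/`text` fields are hypothetical and do not describe the actual layout of Duckling's `.chunks.json` output.

```python
import json
from typing import Dict, List, Union

Chunk = Dict[str, Union[int, str]]


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[Chunk]:
    """Split text into windows of at most max_words words; consecutive windows share overlap words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks: List[Chunk] = []
    start = 0
    while start < len(words):
        window = words[start:start + max_words]
        chunks.append({"index": len(chunks), "text": " ".join(window)})
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def to_chunks_json(chunks: List[Chunk]) -> str:
    """Serialize chunks as a JSON array (hypothetical field layout)."""
    return json.dumps(chunks, indent=2)
```

For example, `chunk_words("one two three four five six seven", max_words=3, overlap=1)` yields the texts `"one two three"`, `"three four five"`, and `"five six seven"`. Overlap repeats the last `overlap` words of each chunk at the start of the next, which reduces the chance that a passage relevant to a query is cut in half at a boundary.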

Architecture

Browser → React Frontend → Flask Backend → Docling Engine → Storage

Documentation

Acknowledgments