Configuration Guide¶
Complete reference for all Duckling configuration options.
Environment Variables¶
Create a .env file in the backend directory:
# Flask Configuration
FLASK_ENV=development # development | production | testing
SECRET_KEY=your-secret-key # Required for production
DEBUG=True # Enable debug mode
# File Handling
MAX_CONTENT_LENGTH=104857600 # Max upload size in bytes (100MB default)
# Database (optional - defaults to SQLite)
DATABASE_URL=sqlite:///history.db
Production Environment¶
FLASK_ENV=production
SECRET_KEY=your-very-secure-random-key-here
DEBUG=False
MAX_CONTENT_LENGTH=209715200 # 200MB for production
Security Warning
Never use the default SECRET_KEY in production. Generate a secure random key.
OCR Settings¶
OCR (Optical Character Recognition) extracts text from images and scanned documents.
Configuration Options¶
| Setting | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Enable/disable OCR processing |
backend | string | "easyocr" | OCR engine to use |
language | string | "en" | Primary language for recognition |
force_full_page_ocr | boolean | false | OCR entire page vs detected regions |
use_gpu | boolean | false | Enable GPU acceleration (EasyOCR only) |
confidence_threshold | float | 0.5 | Minimum confidence for results (0-1) |
bitmap_area_threshold | float | 0.05 | Min area ratio for bitmap OCR (0-1) |
OCR Backends¶
Best for multi-language documents with accuracy requirements.
- GPU Support: Yes (CUDA)
- Languages: 80+
- Note: May have initialization issues on some systems
Classic, reliable OCR engine for simple documents.
- GPU Support: No
- Languages: 100+
- Requires: Tesseract installed on system
Native macOS OCR using Apple's Vision framework.
- GPU Support: Uses Apple Neural Engine
- Requires: macOS 10.15+
Supported Languages¶
| Code | Language | Code | Language |
|---|---|---|---|
en | English | ja | Japanese |
de | German | zh | Chinese (Simplified) |
fr | French | zh-tw | Chinese (Traditional) |
es | Spanish | ko | Korean |
it | Italian | ar | Arabic |
pt | Portuguese | hi | Hindi |
nl | Dutch | th | Thai |
pl | Polish | vi | Vietnamese |
ru | Russian | tr | Turkish |
Table Settings¶
Configure how tables are detected and extracted from documents.
Configuration Options¶
| Setting | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Enable table detection |
structure_extraction | boolean | true | Preserve table structure |
mode | string | "accurate" | Detection mode |
do_cell_matching | boolean | true | Match cell content to structure |
Detection Modes¶
- Higher precision table detection
- Better cell boundary recognition
- Slower processing
- Recommended for complex tables
Image Settings¶
Configure image extraction and processing.
Configuration Options¶
| Setting | Type | Default | Description |
|---|---|---|---|
extract | boolean | true | Extract embedded images |
classify | boolean | true | Classify and tag images |
generate_page_images | boolean | false | Create images of each page |
generate_picture_images | boolean | true | Extract pictures as files |
generate_table_images | boolean | true | Extract tables as images |
images_scale | float | 1.0 | Scale factor for images (0.1-4.0) |
Example Configurations¶
Performance Settings¶
Optimize processing speed and resource usage.
Configuration Options¶
| Setting | Type | Default | Description |
|---|---|---|---|
device | string | "auto" | Processing device |
num_threads | int | 4 | CPU threads (1-32) |
document_timeout | int/null | null | Max processing time in seconds |
Device Options¶
| Device | Description | Best For |
|---|---|---|
auto | Automatically select best device | General use |
cpu | Force CPU processing | Servers without GPU |
cuda | NVIDIA GPU acceleration | Linux/Windows with NVIDIA GPU |
mps | Apple Metal Performance Shaders | macOS with Apple Silicon |
Example Configurations¶
Chunking Settings¶
Configure document chunking for RAG applications.
Configuration Options¶
| Setting | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable document chunking |
max_tokens | int | 512 | Maximum tokens per chunk |
merge_peers | boolean | true | Merge undersized chunks |
Example Configurations¶
Output Settings¶
Configure default output format.
| Setting | Type | Default | Description |
|---|---|---|---|
default_format | string | "markdown" | Default export format |
Complete Configuration Example¶
{
"ocr": {
"enabled": true,
"backend": "easyocr",
"language": "en",
"force_full_page_ocr": false,
"use_gpu": false,
"confidence_threshold": 0.5,
"bitmap_area_threshold": 0.05
},
"tables": {
"enabled": true,
"structure_extraction": true,
"mode": "accurate",
"do_cell_matching": true
},
"images": {
"extract": true,
"classify": true,
"generate_page_images": false,
"generate_picture_images": true,
"generate_table_images": true,
"images_scale": 1.0
},
"performance": {
"device": "auto",
"num_threads": 4,
"document_timeout": null
},
"chunking": {
"enabled": false,
"max_tokens": 512,
"merge_peers": true
},
"output": {
"default_format": "markdown"
}
}
Configuration via API¶
Get Current Settings¶
Update Settings¶
curl -X PUT http://localhost:5001/api/settings \
-H "Content-Type: application/json" \
-d '{
"ocr": {"backend": "tesseract"},
"performance": {"num_threads": 8}
}'
Reset to Defaults¶
Troubleshooting¶
OCR Not Working¶
- EasyOCR initialization error: Switch to
ocrmac(macOS) ortesseract - GPU errors: Set
use_gpu: false - Low confidence results: Lower
confidence_threshold
Slow Processing¶
- Reduce
images_scaleto0.5 - Use
mode: "fast"for tables - Disable
generate_page_images - Increase
num_threads
Memory Issues¶
- Enable
document_timeout(e.g., 120 seconds) - Process fewer files in batch
- Reduce
images_scale - Disable chunking if not needed