Settings API
Endpoints for managing conversion settings.
Session-Based Storage
Settings are stored in the database per user session. Each session's settings are isolated from every other session's, which makes Duckling safe for multi-user deployments.
Get All Settings
Response
{
"ocr": {
"enabled": true,
"language": "en",
"force_full_page_ocr": false,
"backend": "easyocr",
"use_gpu": false,
"confidence_threshold": 0.5,
"bitmap_area_threshold": 0.05
},
"tables": {
"enabled": true,
"structure_extraction": true,
"mode": "accurate",
"do_cell_matching": true
},
"images": {
"extract": true,
"classify": true,
"generate_page_images": false,
"generate_picture_images": true,
"generate_table_images": true,
"images_scale": 1.0
},
"enrichment": {
"code_enrichment": false,
"formula_enrichment": false,
"picture_classification": false,
"picture_description": false
},
"output": {
"default_format": "markdown"
},
"performance": {
"device": "auto",
"num_threads": 4,
"document_timeout": null
},
"chunking": {
"enabled": false,
"max_tokens": 512,
"merge_peers": true
}
}
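A client usually needs only a few of these values. As a minimal sketch, the nested response can be flattened into dotted keys for lookup or logging. The helper name `flatten_settings` is hypothetical and not part of Duckling, and the JSON below is an abbreviated copy of the response above.

```python
import json

# Abbreviated copy of the settings response shown above.
RESPONSE = """
{
  "ocr": {"enabled": true, "language": "en", "backend": "easyocr"},
  "performance": {"device": "auto", "num_threads": 4, "document_timeout": null}
}
"""


def flatten_settings(settings, prefix=""):
    """Flatten nested settings groups into {"group.key": value} pairs."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into a settings group such as "ocr" or "performance".
            flat.update(flatten_settings(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


flat = flatten_settings(json.loads(RESPONSE))
print(flat["ocr.backend"])
print(flat["performance.document_timeout"])
```

JSON `null` becomes Python `None`, so a missing `document_timeout` can be checked with `is None`.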
Update Settings
Request Body
Response
Returns the updated settings object.
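This page does not show whether the update endpoint accepts a partial object. One cautious approach, sketched below, is to merge the changes into the settings previously fetched and send the complete object. The helper `merge_settings` is hypothetical, and the sample settings are abbreviated.

```python
import copy
import json


def merge_settings(current, changes):
    """Return a copy of `current` with `changes` applied group by group."""
    merged = copy.deepcopy(current)
    for group, values in changes.items():
        if isinstance(values, dict) and isinstance(merged.get(group), dict):
            # Update only the keys that were changed inside this group.
            merged[group].update(values)
        else:
            merged[group] = values
    return merged


# Abbreviated settings, as returned by the "Get All Settings" endpoint.
current = {
    "ocr": {"enabled": True, "language": "en", "backend": "easyocr"},
    "tables": {"enabled": True, "mode": "accurate"},
}
changes = {"ocr": {"language": "de"}}

# JSON text that could be sent as the request body (assumption: full object).
body = json.dumps(merge_settings(current, changes))
print(body)
```

Because the merge works on a deep copy, the original `current` dictionary is left unchanged and can be reused if the request fails.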
Reset Settings to Defaults
Response
Returns the default settings object.
Get Supported Formats
Response
{
"input_formats": [
{"id": "pdf", "name": "PDF Document", "extensions": [".pdf"], "icon": "document"},
{"id": "docx", "name": "Microsoft Word", "extensions": [".docx"], "icon": "document"},
{"id": "image", "name": "Image", "extensions": [".png", ".jpg", ".jpeg", ".tiff"], "icon": "image"}
],
"output_formats": [
{"id": "markdown", "name": "Markdown", "extension": ".md", "mime_type": "text/markdown"},
{"id": "html", "name": "HTML", "extension": ".html", "mime_type": "text/html"},
{"id": "json", "name": "JSON", "extension": ".json", "mime_type": "application/json"}
]
}
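The `input_formats` list can be used to check a file before uploading it. The sketch below matches a file name against the `extensions` arrays from the response above. The function name `input_format_for` is hypothetical, and lower-casing the extension is an assumption about how matching should behave.

```python
from pathlib import Path

# The input_formats portion of the response shown above.
FORMATS = {
    "input_formats": [
        {"id": "pdf", "name": "PDF Document", "extensions": [".pdf"], "icon": "document"},
        {"id": "docx", "name": "Microsoft Word", "extensions": [".docx"], "icon": "document"},
        {"id": "image", "name": "Image", "extensions": [".png", ".jpg", ".jpeg", ".tiff"], "icon": "image"},
    ]
}


def input_format_for(filename, formats):
    """Return the id of the input format matching the file extension, or None."""
    suffix = Path(filename).suffix.lower()
    for fmt in formats["input_formats"]:
        if suffix in fmt["extensions"]:
            return fmt["id"]
    return None


print(input_format_for("scan.JPG", FORMATS))   # image
print(input_format_for("notes.txt", FORMATS))  # None
```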
OCR Settings
Get OCR Settings
Update OCR Settings
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| auto_install | boolean | If true, automatically install pip-installable backends |
Request/Response
{
"ocr": {
"enabled": true,
"language": "en",
"force_full_page_ocr": false,
"backend": "easyocr",
"use_gpu": false,
"confidence_threshold": 0.5,
"bitmap_area_threshold": 0.05
},
"available_languages": [
{"code": "en", "name": "English"},
{"code": "de", "name": "German"},
{"code": "fr", "name": "French"}
],
"available_backends": [
{"id": "easyocr", "name": "EasyOCR", "description": "General-purpose OCR with GPU support"},
{"id": "tesseract", "name": "Tesseract", "description": "Classic OCR engine"},
{"id": "ocrmac", "name": "macOS Vision", "description": "Native macOS OCR (Mac only)"},
{"id": "rapidocr", "name": "RapidOCR", "description": "Fast OCR with ONNX runtime"}
]
}
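The `available_languages` and `available_backends` lists make it possible to validate an update on the client before sending it. The following is a minimal sketch: the helper `validate_ocr_update` is hypothetical, and it only checks the two fields for which the response lists allowed values.

```python
# Abbreviated copy of the OCR settings response shown above.
OCR_RESPONSE = {
    "available_languages": [
        {"code": "en", "name": "English"},
        {"code": "de", "name": "German"},
        {"code": "fr", "name": "French"},
    ],
    "available_backends": [
        {"id": "easyocr", "name": "EasyOCR"},
        {"id": "tesseract", "name": "Tesseract"},
        {"id": "ocrmac", "name": "macOS Vision"},
        {"id": "rapidocr", "name": "RapidOCR"},
    ],
}


def validate_ocr_update(update, ocr_response):
    """Return a list of problems; an empty list means the update looks valid."""
    languages = {lang["code"] for lang in ocr_response["available_languages"]}
    backends = {backend["id"] for backend in ocr_response["available_backends"]}
    problems = []
    if "language" in update and update["language"] not in languages:
        problems.append(f"unknown language: {update['language']}")
    if "backend" in update and update["backend"] not in backends:
        problems.append(f"unknown backend: {update['backend']}")
    return problems


print(validate_ocr_update({"language": "de", "backend": "tesseract"}, OCR_RESPONSE))
print(validate_ocr_update({"language": "xx"}, OCR_RESPONSE))
```

The server remains the authority on what is valid; a check like this only catches obvious mistakes early.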
OCR Backend Management
Get All Backend Status
Returns installation status for all OCR backends.
Response
{
"backends": [
{
"id": "easyocr",
"name": "EasyOCR",
"description": "General-purpose OCR with GPU support",
"installed": true,
"available": true,
"error": null,
"pip_installable": true,
"requires_system_install": false,
"platform": null,
"note": "First run will download language models (~100MB per language)"
},
{
"id": "tesseract",
"name": "Tesseract",
"description": "Classic OCR engine",
"installed": false,
"available": false,
"error": "Package not installed",
"pip_installable": true,
"requires_system_install": true,
"platform": null,
"note": "Requires Tesseract to be installed on your system"
}
],
"current_platform": "darwin"
}
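A client can use the status flags to decide which backends it may offer to install. The sketch below separates backends that pip alone can install from those that also need a system package. Both helper names are hypothetical, and the `rapidocr` entry is invented sample data rather than part of the response above.

```python
# Abbreviated status data; the first two entries follow the response above.
STATUS = {
    "backends": [
        {"id": "easyocr", "installed": True, "available": True,
         "pip_installable": True, "requires_system_install": False},
        {"id": "tesseract", "installed": False, "available": False,
         "pip_installable": True, "requires_system_install": True},
        # Hypothetical entry, added only to exercise the filter.
        {"id": "rapidocr", "installed": False, "available": False,
         "pip_installable": True, "requires_system_install": False},
    ],
    "current_platform": "darwin",
}


def installable_now(status):
    """IDs of missing backends that can be installed with pip alone."""
    return [
        b["id"]
        for b in status["backends"]
        if not b["installed"]
        and b["pip_installable"]
        and not b["requires_system_install"]
    ]


def needs_system_install(status):
    """IDs of missing backends that require a system-level installation."""
    return [
        b["id"]
        for b in status["backends"]
        if not b["installed"] and b["requires_system_install"]
    ]


print(installable_now(STATUS))       # ['rapidocr']
print(needs_system_install(STATUS))  # ['tesseract']
```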
Check Specific Backend
Response
{
"backend": "easyocr",
"installed": true,
"available": true,
"error": null,
"pip_installable": true,
"requires_system_install": false,
"note": "First run will download language models"
}
Install Backend
Installs a pip-installable OCR backend.
Response (Success)
{
"message": "Successfully installed easyocr",
"success": true,
"installed": true,
"available": true,
"note": "First run will download language models"
}
Response (Already Installed)
Response (Requires System Install)
{
"message": "Failed to install tesseract",
"success": false,
"error": "tesseract requires system-level installation",
"requires_system_install": true
}
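The install responses can be told apart by `success` and `requires_system_install`. As a rough sketch, a client might map them to a next step as follows. The function `next_step` and its return strings are hypothetical; only the field names come from the responses above.

```python
def next_step(install_response):
    """Map an install response to a short, human-readable next step."""
    if install_response.get("success"):
        if install_response.get("available"):
            return "ready"
        return "installed, but not yet available"
    if install_response.get("requires_system_install"):
        return "install the engine at system level, then retry"
    return f"failed: {install_response.get('error', 'unknown error')}"


# The two response bodies shown above.
success = {
    "message": "Successfully installed easyocr",
    "success": True,
    "installed": True,
    "available": True,
    "note": "First run will download language models",
}
system_required = {
    "message": "Failed to install tesseract",
    "success": False,
    "error": "tesseract requires system-level installation",
    "requires_system_install": True,
}

print(next_step(success))
print(next_step(system_required))
```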
Table Settings
Get Table Settings
Update Table Settings
Request/Response
{
"tables": {
"enabled": true,
"structure_extraction": true,
"mode": "accurate",
"do_cell_matching": true
}
}
Image Settings
Get Image Settings
Update Image Settings
Request/Response
{
"images": {
"extract": true,
"classify": true,
"generate_page_images": false,
"generate_picture_images": true,
"generate_table_images": true,
"images_scale": 1.0
}
}
Enrichment Settings
Get Enrichment Settings
Response
{
"enrichment": {
"code_enrichment": false,
"formula_enrichment": false,
"picture_classification": false,
"picture_description": false
},
"options": {
"code_enrichment": {
"description": "Enhance code blocks with language detection and syntax highlighting",
"default": false,
"note": "May increase processing time"
},
"formula_enrichment": {
"description": "Extract LaTeX representations from mathematical formulas",
"default": false,
"note": "Enables better formula rendering in exports"
},
"picture_classification": {
"description": "Classify images by type (figure, chart, diagram, photo, etc.)",
"default": false,
"note": "Adds semantic tags to extracted images"
},
"picture_description": {
"description": "Generate descriptive captions for images using AI vision models",
"default": false,
"note": "Requires additional model download, significantly increases processing time"
}
}
}
Update Enrichment Settings
Request
Response
{
"message": "Enrichment settings updated",
"enrichment": {
"code_enrichment": true,
"formula_enrichment": true,
"picture_classification": false,
"picture_description": false
}
}
| Field | Type | Description |
|---|---|---|
| code_enrichment | boolean | Enhance code blocks with language detection |
| formula_enrichment | boolean | Extract LaTeX from mathematical formulas |
| picture_classification | boolean | Classify images by semantic type |
| picture_description | boolean | Generate AI captions for images |
Processing Time
Enabling formula_enrichment and especially picture_description can significantly increase document processing time.
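Before sending an enrichment update, a client can check the field names and flag the options that are likely to slow processing. The sketch below is one way to do that. The helper `build_enrichment_update` is hypothetical, and this page does not show whether the request expects these fields at the top level or wrapped in an `enrichment` key, so the function returns only the flags themselves.

```python
# Field names taken from the table above.
ENRICHMENT_FIELDS = {
    "code_enrichment",
    "formula_enrichment",
    "picture_classification",
    "picture_description",
}
# Options that the note above singles out as slow.
SLOW_FIELDS = ("formula_enrichment", "picture_description")


def build_enrichment_update(**flags):
    """Validate enrichment flags; return (flags, enabled_slow_options)."""
    unknown = sorted(set(flags) - ENRICHMENT_FIELDS)
    if unknown:
        raise ValueError(f"unknown enrichment fields: {unknown}")
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean")
    slow = [name for name in SLOW_FIELDS if flags.get(name)]
    return flags, slow


flags, slow = build_enrichment_update(code_enrichment=True, formula_enrichment=True)
print(flags)
print(slow)
```

A user interface could use the second return value to ask for confirmation before enabling the slower options.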