Settings API¶
Endpoints for managing conversion settings.
Get All Settings¶
Response¶
{
  "ocr": {
    "enabled": true,
    "language": "en",
    "force_full_page_ocr": false,
    "backend": "easyocr",
    "use_gpu": false,
    "confidence_threshold": 0.5,
    "bitmap_area_threshold": 0.05
  },
  "tables": {
    "enabled": true,
    "structure_extraction": true,
    "mode": "accurate",
    "do_cell_matching": true
  },
  "images": {
    "extract": true,
    "classify": true,
    "generate_page_images": false,
    "generate_picture_images": true,
    "generate_table_images": true,
    "images_scale": 1.0
  },
  "enrichment": {
    "code_enrichment": false,
    "formula_enrichment": false,
    "picture_classification": false,
    "picture_description": false
  },
  "output": {
    "default_format": "markdown"
  },
  "performance": {
    "device": "auto",
    "num_threads": 4,
    "document_timeout": null
  },
  "chunking": {
    "enabled": false,
    "max_tokens": 512,
    "merge_peers": true
  }
}
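As a rough illustration of consuming this response, the sketch below fetches the settings object and lists every boolean option that is currently on. The host and route (`BASE_URL`, `SETTINGS_PATH`) are hypothetical placeholders, since this page does not state them; `fetch_settings` and `enabled_features` are illustrative names, not part of the API.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical placeholders: the host and route are NOT given in this page.
BASE_URL = "http://localhost:8000"
SETTINGS_PATH = "/api/settings"


def fetch_settings(base_url: str = BASE_URL, path: str = SETTINGS_PATH,
                   timeout: float = 10.0) -> dict:
    """GET the full settings object and decode it from JSON."""
    request = Request(base_url + path, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def enabled_features(settings: dict) -> list:
    """Return dotted names ("section.key") of every setting whose value is true."""
    found = []
    for section, values in settings.items():
        if not isinstance(values, dict):
            continue  # skip anything that is not a settings group
        for key, value in values.items():
            if value is True:  # ignore numbers, strings, and null
                found.append(f"{section}.{key}")
    return sorted(found)
```

`enabled_features` works on any decoded settings object, so it can be tried against the example response above without a running server.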
Update Settings¶
Request Body¶
Response¶
Returns the updated settings object.
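This page does not say whether the update endpoint accepts partial bodies. One cautious approach, sketched here under the assumption that a complete settings object is always acceptable, is to read the current settings, merge the changes locally, and send the merged result. `merge_settings` is a hypothetical client-side helper.

```python
from copy import deepcopy


def merge_settings(current: dict, changes: dict) -> dict:
    """Return a copy of `current` with `changes` applied group by group.

    Nested dictionaries (for example "ocr" or "tables") are merged key by key;
    any other value in `changes` replaces the existing one.
    """
    merged = deepcopy(current)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

For example, `merge_settings(settings, {"ocr": {"language": "de"}})` changes only the OCR language and leaves every other group untouched; the input dictionaries are not modified.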
Reset Settings to Defaults¶
Response¶
Returns the default settings object.
Get Supported Formats¶
Response¶
{
  "input_formats": [
    {"id": "pdf", "name": "PDF Document", "extensions": [".pdf"], "icon": "document"},
    {"id": "docx", "name": "Microsoft Word", "extensions": [".docx"], "icon": "document"},
    {"id": "image", "name": "Image", "extensions": [".png", ".jpg", ".jpeg", ".tiff"], "icon": "image"}
  ],
  "output_formats": [
    {"id": "markdown", "name": "Markdown", "extension": ".md", "mime_type": "text/markdown"},
    {"id": "html", "name": "HTML", "extension": ".html", "mime_type": "text/html"},
    {"id": "json", "name": "JSON", "extension": ".json", "mime_type": "application/json"}
  ]
}
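A client could use this response to decide whether a file is worth uploading. The sketch below looks up the input format `id` for a filename by its extension; `input_format_for` is an illustrative helper name, and the case-insensitive match is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional


def input_format_for(filename: str, formats: dict) -> Optional[str]:
    """Return the id of the input format matching the file's extension, or None."""
    suffix = Path(filename).suffix.lower()  # ".JPG" is treated like ".jpg"
    for fmt in formats.get("input_formats", []):
        if suffix in fmt.get("extensions", []):
            return fmt["id"]
    return None
```

Given the example response above, `input_format_for("scan.JPG", formats)` would yield `"image"`, while a `.txt` file would yield `None`.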
OCR Settings¶
Get OCR Settings¶
Update OCR Settings¶
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| auto_install | boolean | If true, automatically install pip-installable backends |
Response/Request¶
{
  "ocr": {
    "enabled": true,
    "language": "en",
    "force_full_page_ocr": false,
    "backend": "easyocr",
    "use_gpu": false,
    "confidence_threshold": 0.5,
    "bitmap_area_threshold": 0.05
  },
  "available_languages": [
    {"code": "en", "name": "English"},
    {"code": "de", "name": "German"},
    {"code": "fr", "name": "French"}
  ],
  "available_backends": [
    {"id": "easyocr", "name": "EasyOCR", "description": "General-purpose OCR with GPU support"},
    {"id": "tesseract", "name": "Tesseract", "description": "Classic OCR engine"},
    {"id": "ocrmac", "name": "macOS Vision", "description": "Native macOS OCR (Mac only)"},
    {"id": "rapidocr", "name": "RapidOCR", "description": "Fast OCR with ONNX runtime"}
  ]
}
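Because the response lists the available languages and backends, a client can check a planned change before sending it. The following is a minimal sketch of such a check; `check_ocr_update` is a hypothetical helper, and the server may apply further validation of its own.

```python
def check_ocr_update(ocr_response: dict, language: str, backend: str) -> list:
    """Return a list of problems with the requested language and backend.

    An empty list means both values appear in the lists the server advertised.
    """
    languages = {item["code"] for item in ocr_response.get("available_languages", [])}
    backends = {item["id"] for item in ocr_response.get("available_backends", [])}

    problems = []
    if language not in languages:
        problems.append(f"unsupported language: {language}")
    if backend not in backends:
        problems.append(f"unknown backend: {backend}")
    return problems
```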
OCR Backend Management¶
Get All Backend Status¶
Returns installation status for all OCR backends.
Response¶
{
  "backends": [
    {
      "id": "easyocr",
      "name": "EasyOCR",
      "description": "General-purpose OCR with GPU support",
      "installed": true,
      "available": true,
      "error": null,
      "pip_installable": true,
      "requires_system_install": false,
      "platform": null,
      "note": "First run will download language models (~100MB per language)"
    },
    {
      "id": "tesseract",
      "name": "Tesseract",
      "description": "Classic OCR engine",
      "installed": false,
      "available": false,
      "error": "Package not installed",
      "pip_installable": true,
      "requires_system_install": true,
      "platform": null,
      "note": "Requires Tesseract to be installed on your system"
    }
  ],
  "current_platform": "darwin"
}
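The status flags can be combined to decide what to do with each backend. The sketch below groups backends into those that are ready, those that need a system-level install first, and those that could be installed with pip alone. The grouping rules are an interpretation of the example response, not documented behaviour, and the `platform` field is ignored here because its semantics are not described.

```python
def classify_backends(status: dict) -> dict:
    """Group backend ids by the action a client would likely take next."""
    groups = {"ready": [], "system_install": [], "pip_install": [], "unavailable": []}
    for backend in status.get("backends", []):
        if backend.get("installed") and backend.get("available"):
            groups["ready"].append(backend["id"])
        elif backend.get("requires_system_install"):
            # pip alone is not enough, even when pip_installable is true
            groups["system_install"].append(backend["id"])
        elif backend.get("pip_installable"):
            groups["pip_install"].append(backend["id"])
        else:
            groups["unavailable"].append(backend["id"])
    return groups
```

Applied to the example response above, EasyOCR lands in `ready` and Tesseract in `system_install`.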
Check Specific Backend¶
Response¶
{
  "backend": "easyocr",
  "installed": true,
  "available": true,
  "error": null,
  "pip_installable": true,
  "requires_system_install": false,
  "note": "First run will download language models"
}
Install Backend¶
Installs a pip-installable OCR backend.
Response (Success)¶
{
  "message": "Successfully installed easyocr",
  "success": true,
  "installed": true,
  "available": true,
  "note": "First run will download language models"
}
Response (Already Installed)¶
Response (Requires System Install)¶
{
  "message": "Failed to install tesseract",
  "success": false,
  "error": "tesseract requires system-level installation",
  "requires_system_install": true
}
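A caller will usually want to branch on the outcome of an install request. This sketch maps the two documented response shapes to a short outcome label; `install_outcome` and its labels are illustrative. The "Already Installed" response body is not shown on this page, so it falls into whichever branch its fields happen to match.

```python
def install_outcome(result: dict) -> str:
    """Map an install response to 'installed', 'needs_system_install', or 'failed'."""
    if result.get("success") and result.get("installed"):
        return "installed"
    if result.get("requires_system_install"):
        # pip cannot finish the job; the engine must be installed on the system first
        return "needs_system_install"
    return "failed"
```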
Table Settings¶
Get Table Settings¶
Update Table Settings¶
Request/Response¶
{
  "tables": {
    "enabled": true,
    "structure_extraction": true,
    "mode": "accurate",
    "do_cell_matching": true
  }
}
Image Settings¶
Get Image Settings¶
Update Image Settings¶
Request/Response¶
{
  "images": {
    "extract": true,
    "classify": true,
    "generate_page_images": false,
    "generate_picture_images": true,
    "generate_table_images": true,
    "images_scale": 1.0
  }
}
Enrichment Settings¶
Get Enrichment Settings¶
Response¶
{
  "enrichment": {
    "code_enrichment": false,
    "formula_enrichment": false,
    "picture_classification": false,
    "picture_description": false
  },
  "options": {
    "code_enrichment": {
      "description": "Enhance code blocks with language detection and syntax highlighting",
      "default": false,
      "note": "May increase processing time"
    },
    "formula_enrichment": {
      "description": "Extract LaTeX representations from mathematical formulas",
      "default": false,
      "note": "Enables better formula rendering in exports"
    },
    "picture_classification": {
      "description": "Classify images by type (figure, chart, diagram, photo, etc.)",
      "default": false,
      "note": "Adds semantic tags to extracted images"
    },
    "picture_description": {
      "description": "Generate descriptive captions for images using AI vision models",
      "default": false,
      "note": "Requires additional model download, significantly increases processing time"
    }
  }
}
Update Enrichment Settings¶
Request¶
Response¶
{
  "message": "Enrichment settings updated",
  "enrichment": {
    "code_enrichment": true,
    "formula_enrichment": true,
    "picture_classification": false,
    "picture_description": false
  }
}
| Field | Type | Description |
|---|---|---|
| code_enrichment | boolean | Enhance code blocks with language detection |
| formula_enrichment | boolean | Extract LaTeX from mathematical formulas |
| picture_classification | boolean | Classify images by semantic type |
| picture_description | boolean | Generate AI captions for images |
Processing Time
Enabling formula_enrichment and especially picture_description can significantly increase document processing time.
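To make the processing-time warning actionable, a client could validate an enrichment update and report which of the requested options are the slow ones. The sketch below assumes the request body is a flat object containing the four fields from the table; the actual request shape is not shown on this page, and `build_enrichment_update` is a hypothetical helper.

```python
ENRICHMENT_FIELDS = (
    "code_enrichment",
    "formula_enrichment",
    "picture_classification",
    "picture_description",
)
# Options singled out in the processing-time note above.
SLOW_FIELDS = ("formula_enrichment", "picture_description")


def build_enrichment_update(**flags):
    """Validate enrichment flags; return (payload, names of slow options enabled)."""
    payload = {}
    for name, value in flags.items():
        if name not in ENRICHMENT_FIELDS:
            raise ValueError(f"unknown enrichment field: {name}")
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean")
        payload[name] = value
    slow = [name for name in SLOW_FIELDS if payload.get(name)]
    return payload, slow
```

A caller might show the second return value to the user before submitting, for example when `picture_description=True` is requested.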