Konvertierungs-API

Endpoints zum Hochladen und Konvertieren von Dokumenten.

Einzelnes Dokument hochladen und konvertieren

POST /api/convert
Content-Type: multipart/form-data

Parameter

Name	Typ	Erforderlich	Beschreibung
`file`	Datei	Ja	Zu konvertierendes Dokument
`settings`	JSON-String	Nein	Überschreibung der Konvertierungseinstellungen

Beispiel-Anfrage

curl -X POST http://localhost:5001/api/convert \
  -F "file=@document.pdf" \
  -F 'settings={"ocr":{"enabled":true,"language":"en"}}'

Antwort (202 Akzeptiert)

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "document.pdf",
  "input_format": "pdf",
  "status": "processing",
  "message": "Conversion started"
}

Mehrere Dokumente stapelweise konvertieren

POST /api/convert/batch
Content-Type: multipart/form-data

Parameter

Name	Typ	Erforderlich	Beschreibung
`files`	Datei[]	Ja	Zu konvertierende Dokumente
`settings`	JSON-String	Nein	Überschreibung der Konvertierungseinstellungen

Beispiel-Anfrage

curl -X POST http://localhost:5001/api/convert/batch \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.pdf" \
  -F "files=@image.png"

Antwort (202 Akzeptiert)

{
  "jobs": [
    {
      "job_id": "550e8400-e29b-41d4-a716-446655440001",
      "filename": "doc1.pdf",
      "input_format": "pdf",
      "status": "processing"
    },
    {
      "job_id": "550e8400-e29b-41d4-a716-446655440002",
      "filename": "doc2.pdf",
      "input_format": "pdf",
      "status": "processing"
    },
    {
      "job_id": "550e8400-e29b-41d4-a716-446655440003",
      "filename": "image.png",
      "input_format": "image",
      "status": "processing"
    }
  ],
  "total": 3,
  "message": "Started 3 conversions"
}

Dokument von URL konvertieren

POST /api/convert/url
Content-Type: application/json

Parameter

Name	Typ	Erforderlich	Beschreibung
`url`	string	Ja	URL des zu konvertierenden Dokuments
`settings`	object	Nein	Überschreibung der Konvertierungseinstellungen

Beispiel-Anfrage

curl -X POST http://localhost:5001/api/convert/url \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/document.pdf",
    "settings": {"ocr": {"enabled": true}}
  }'

Antwort (202 Akzeptiert)

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "document.pdf",
  "source_url": "https://example.com/document.pdf",
  "input_format": "pdf",
  "status": "processing",
  "message": "Conversion started"
}

Mehrere Dokumente von URLs stapelweise konvertieren

POST /api/convert/url/batch
Content-Type: application/json

Parameter

Name	Typ	Erforderlich	Beschreibung
`urls`	string[]	Ja	Array von zu konvertierenden URLs
`settings`	object	Nein	Überschreibung der Konvertierungseinstellungen

Beispiel-Anfrage

curl -X POST http://localhost:5001/api/convert/url/batch \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/doc1.pdf",
      "https://example.com/doc2.docx",
      "https://example.com/page.html"
    ]
  }'

Antwort (202 Akzeptiert)

{
  "jobs": [
    {
      "job_id": "550e8400-e29b-41d4-a716-446655440001",
      "url": "https://example.com/doc1.pdf",
      "filename": "doc1.pdf",
      "input_format": "pdf",
      "status": "processing"
    },
    {
      "job_id": "550e8400-e29b-41d4-a716-446655440002",
      "url": "https://example.com/doc2.docx",
      "filename": "doc2.docx",
      "input_format": "docx",
      "status": "processing"
    },
    {
      "url": "https://example.com/invalid",
      "status": "rejected",
      "error": "File type not allowed"
    }
  ],
  "total": 3,
  "message": "Started 2 conversions"
}

Konvertierungsstatus abrufen

GET /api/convert/{job_id}/status

Antwort (Verarbeitung)

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": 45,
  "message": "Analyzing document with OCR (easyocr, en)..."
}

Antwort (Abgeschlossen)

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "progress": 100,
  "message": "Conversion completed successfully",
  "confidence": 0.92,
  "formats_available": ["markdown", "html", "json", "text", "doctags"],
  "images_count": 3,
  "tables_count": 2,
  "chunks_count": 0,
  "preview": "# Document Title\n\nFirst paragraph..."
}

Antwort (Fehlgeschlagen)

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "progress": 0,
  "message": "Conversion failed: Invalid PDF format",
  "error": "Invalid PDF format"
}

Konvertierungsergebnis abrufen

GET /api/convert/{job_id}/result

Antwort

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "confidence": 0.92,
  "formats_available": ["markdown", "html", "json", "text", "doctags", "document_tokens"],
  "result": {
    "markdown_preview": "# Document Title\n\nContent preview...",
    "formats_available": ["markdown", "html", "json", "text", "doctags"],
    "page_count": 5,
    "images_count": 3,
    "tables_count": 2,
    "chunks_count": 0,
    "warnings": []
  },
  "images_count": 3,
  "tables_count": 2,
  "chunks_count": 0,
  "completed_at": "2024-01-15T10:30:00Z"
}

Extrahierte Bilder abrufen

GET /api/convert/{job_id}/images

Antwort

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "images": [
    {
      "id": 1,
      "filename": "image_1.png",
      "path": "/outputs/job_id/images/image_1.png",
      "caption": "Figure 1: Architecture diagram",
      "label": "figure"
    },
    {
      "id": 2,
      "filename": "image_2.png",
      "path": "/outputs/job_id/images/image_2.png",
      "caption": "",
      "label": "picture"
    }
  ],
  "count": 2
}

Extrahierte Bilder herunterladen

GET /api/convert/{job_id}/images/{image_id}

Antwort: Binäre Bilddatei (PNG)

Extrahierte Tabellen abrufen

GET /api/convert/{job_id}/tables

Antwort

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "tables": [
    {
      "id": 1,
      "label": "table",
      "caption": "Table 1: Sales Data",
      "rows": [
        ["Product", "Q1", "Q2", "Q3", "Q4"],
        ["Widget A", "100", "150", "200", "175"]
      ],
      "csv_path": "/outputs/job_id/tables/table_1.csv",
      "image_path": "/outputs/job_id/tables/table_1.png"
    }
  ],
  "count": 1
}

Tabelle als CSV herunterladen

GET /api/convert/{job_id}/tables/{table_id}/csv

Antwort: CSV-Datei

Tabelle als Bild herunterladen

GET /api/convert/{job_id}/tables/{table_id}/image

Antwort: Binäre Bilddatei (PNG)

Dokument-Segmente abrufen

GET /api/convert/{job_id}/chunks

Antwort

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunks": [
    {
      "id": 1,
      "text": "This is the first chunk of text from the document...",
      "meta": {
        "headings": ["Introduction"],
        "page": 1
      }
    },
    {
      "id": 2,
      "text": "Second chunk continues the content...",
      "meta": {
        "headings": ["Introduction", "Background"],
        "page": 1
      }
    }
  ],
  "count": 2
}

Dokument exportieren

GET /api/export/{job_id}/{format}

Unterstützte Formate

markdown
html
json
text
doctags
document_tokens
chunks

Antwort: Datei-Download mit entsprechendem MIME-Typ

Job löschen

DELETE /api/convert/{job_id}

Antwort

{
  "message": "Job 550e8400-e29b-41d4-a716-446655440000 deleted",
  "job_id": "550e8400-e29b-41d4-a716-446655440000"
}