Parse4ai Docs
API Reference

Document Parser API

The Document Parser API parses documents that you supply by URL and returns the extracted content as a downloadable archive. Supported formats are .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, and .jpeg.

Quick Start

1. Get an API Key

First, create an API key in the dashboard:

  1. Log into your account
  2. Navigate to API Keys Management
  3. Click “Create New Key”
  4. Copy the generated key (it is shown only once)

2. Authentication

Include the following header in every request:

Authorization: Bearer YOUR_API_KEY
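
The header can be built once and reused for every call. The following is a minimal sketch; the helper name build_headers and the PARSE_API_KEY environment variable are illustrative assumptions and not part of the API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the request headers for a given API key."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }


# PARSE_API_KEY is a hypothetical variable name; a placeholder is used if it is unset.
headers = build_headers(os.environ.get("PARSE_API_KEY") or "YOUR_API_KEY")
```

Reading the key from the environment keeps it out of source control, which matters because the dashboard shows the key only once.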

API Endpoints

Base URL

Production: https://mineru.net/api/v4
Test: https://mineru.net/api/v4

Single File Parsing

Create Parsing Task

Create a parsing task for a single file URL.

Endpoint: POST /extract/task

Interface Description

  • Obtain a Token (the API key created in Quick Start) before calling the API.
  • Maximum file size is 200 MB; maximum length is 600 pages.
  • Direct file uploads are not supported; provide an accessible URL instead.
  • Include Authorization: Bearer <Token> in the request header.

Request Parameters

Parameter | Type | Required | Example | Description
url | string | Yes | https://cdn-mineru.openxlab.org.cn/demo/example.pdf | File URL; supports .pdf, .doc(x), .ppt(x), .png, .jpg, .jpeg
is_ocr | bool | No | false | Enable OCR (pipeline only); default false
enable_formula | bool | No | true | Enable formula recognition (pipeline only)
enable_table | bool | No | true | Enable table recognition (pipeline only)
language | string | No | ch | Document language (pipeline only); see the supported list
data_id | string | No | doc_123 | Business identifier, max 128 characters
callback | string | No | https://example.com/callback | Result webhook URL (HTTP/HTTPS); results are sent as a JSON POST
seed | string | No | abc123 | Required when callback is set; used to verify the checksum
extra_formats | string[] | No | ["docx","html"] | Additional export formats (docx, html, latex)
page_ranges | string | No | 1-10,15 | Comma-separated pages; supports ranges such as 2--2
model_version | string | No | vlm | pipeline (default) or vlm
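
Several of the rules in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; build_task_payload is a hypothetical helper, and the server remains the authority on what it accepts.

```python
ALLOWED_EXTRA_FORMATS = {"docx", "html", "latex"}
ALLOWED_MODEL_VERSIONS = {"pipeline", "vlm"}


def build_task_payload(url, *, model_version="pipeline", data_id=None,
                       callback=None, seed=None, extra_formats=None,
                       page_ranges=None, **pipeline_options):
    """Assemble a body for POST /extract/task, checking the documented rules."""
    if model_version not in ALLOWED_MODEL_VERSIONS:
        raise ValueError(f"unknown model_version: {model_version!r}")
    if data_id is not None and len(data_id) > 128:
        raise ValueError("data_id is limited to 128 characters")
    if callback is not None and not seed:
        raise ValueError("seed is required when callback is set")
    unknown = set(extra_formats or []) - ALLOWED_EXTRA_FORMATS
    if unknown:
        raise ValueError(f"unsupported extra_formats: {sorted(unknown)}")

    payload = {"url": url, "model_version": model_version}
    optional = {
        "data_id": data_id,
        "callback": callback,
        "seed": seed,
        "extra_formats": extra_formats,
        "page_ranges": page_ranges,
    }
    # is_ocr, enable_formula, enable_table and language pass through unchanged.
    optional.update(pipeline_options)
    # Omit anything the caller did not set so the server applies its defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dictionary can be passed directly as the json argument of requests.post.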

Python Example

import requests

token = "YOUR_API_TOKEN"
url = "https://mineru.net/api/v4/extract/task"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
payload = {
    "url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf",
    "model_version": "vlm"
}

res = requests.post(url, headers=headers, json=payload)
print(res.status_code)
print(res.json())
print(res.json()["data"])

cURL Example

curl --location --request POST 'https://mineru.net/api/v4/extract/task' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
  "url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf",
  "model_version": "vlm"
}'

Request Body Example

{
  "url": "https://static.openxlab.org.cn/opendatalab/pdf/demo.pdf",
  "model_version": "vlm",
  "data_id": "abcd"
}

Response

Field | Type | Example | Description
code | int | 0 | API status code; 0 means success
msg | string | ok | Response message
trace_id | string | c876cd60b202f2396de1f9e39a1b0172 | Request ID
data.task_id | string | a90e6ab6-44f3-4554-b459-b62fe4c6b436 | Task ID

{
  "code": 0,
  "data": {
    "task_id": "a90e6ab6-44f3-4554-b4***"
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
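
Because success is signalled by code being 0 in the body, it helps to check that field before reading data.task_id. The following is a minimal sketch; ApiError and extract_task_id are hypothetical names introduced for illustration.

```python
class ApiError(Exception):
    """Raised when the response body reports a non-zero code."""

    def __init__(self, code, msg, trace_id=None):
        super().__init__(f"API error {code}: {msg} (trace_id={trace_id})")
        self.code = code
        self.msg = msg
        self.trace_id = trace_id


def extract_task_id(body: dict) -> str:
    """Return data.task_id from a create-task response, or raise ApiError."""
    if body.get("code") != 0:
        raise ApiError(body.get("code"), body.get("msg"), body.get("trace_id"))
    return body["data"]["task_id"]
```

With the requests example above, this would be called as extract_task_id(res.json()). Keeping trace_id on the exception makes it easier to report a failed request.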

Get Task Results

Endpoint: GET /extract/task/{task_id}

Description

  • Query the status and result of a task by its task_id.
  • Include the Authorization header.

Python Example

import requests

token = "YOUR_API_TOKEN"
task_id = "a90e6ab6-44f3-4554-b459-b62fe4c6b436"
url = f"https://mineru.net/api/v4/extract/task/{task_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

res = requests.get(url, headers=headers)
print(res.status_code)
print(res.json())
print(res.json()["data"])

cURL Example

curl --location --request GET 'https://mineru.net/api/v4/extract/task/{task_id}' \
--header 'Authorization: Bearer YOUR_API_KEY'

Response Fields

Field | Type | Example | Description
code | int | 0 | API status code
msg | string | ok | Response message
trace_id | string | c876cd60b202f2396de1f9e39a1b0172 | Request ID
data.task_id | string | abc** | Task ID
data.data_id | string | abc** | Copy of the data_id you sent
data.state | string | done | One of pending, running, converting, done, failed
data.full_zip_url | string | https://cdn-mineru...zip | Download URL of the extracted files
data.err_msg | string | ... | Error reason when the task failed
data.extract_progress.* | object | | Progress info while the task is running

{
  "code": 0,
  "data": {
    "task_id": "47726b6e-46ca-4bb9-******",
    "state": "running",
    "err_msg": "",
    "extract_progress": {
      "extracted_pages": 1,
      "total_pages": 2,
      "start_time": "2025-01-20 11:43:20"
    }
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
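
The fields that matter depend on data.state: full_zip_url when done, err_msg when failed, and extract_progress when running. The following is a minimal sketch of turning a result body into a one-line summary; describe_task is a hypothetical helper name.

```python
def describe_task(body: dict) -> str:
    """Summarise a GET /extract/task/{task_id} response in one line."""
    data = body["data"]
    state = data["state"]
    if state == "done":
        return f"done: {data.get('full_zip_url')}"
    if state == "failed":
        return f"failed: {data.get('err_msg')}"
    progress = data.get("extract_progress")
    if state == "running" and progress and progress.get("total_pages"):
        extracted = progress["extracted_pages"]
        total = progress["total_pages"]
        percent = 100 * extracted // total
        return f"running: {extracted}/{total} pages ({percent}%)"
    # pending, converting, or running without progress details
    return state
```

For the sample response above, this returns "running: 1/2 pages (50%)".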

Batch File Parsing

Handle multiple documents with the same flow and response format as the single-file API.

URL Batch Submission

Submit multiple URL tasks in one request (up to 200).

Endpoint: POST http://8.148.69.123:8088/api/v1/extract/task/batch

The request body follows the single-task API, except that the files are listed in a files[] array whose entries carry url, data_id, is_ocr, page_ranges, etc. The response contains a batch_id.
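
Since one request accepts at most 200 files, a longer list has to be split across several requests. The following is a minimal sketch; build_batch_payloads is a hypothetical helper, and placing model_version at the top level of the body follows the request examples in this section.

```python
MAX_FILES_PER_BATCH = 200  # documented limit per batch request


def build_batch_payloads(files, model_version="pipeline"):
    """Split file entries into request bodies of at most 200 files each.

    Each entry is a dict such as {"url": ..., "data_id": ...}.
    """
    files = list(files)
    return [
        {
            "files": files[start:start + MAX_FILES_PER_BATCH],
            "model_version": model_version,
        }
        for start in range(0, len(files), MAX_FILES_PER_BATCH)
    ]
```

Each returned body is submitted separately and yields its own batch_id.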

Python Example

import requests

token = "YOUR_API_TOKEN"
url = "http://8.148.69.123:8088/api/v1/extract/task/batch"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
payload = {
    "files": [
        {"url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf", "data_id": "abcd"}
    ],
    "model_version": "vlm"
}

res = requests.post(url, headers=headers, json=payload)
print(res.status_code)
print(res.json())

cURL Example

curl --location --request POST 'http://8.148.69.123:8088/api/v1/extract/task/batch' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
  "files": [
    {"url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf", "data_id": "abcd"}
  ],
  "model_version": "vlm"
}'

Batch Task Results

Query the progress/result of a batch.

Endpoint: GET http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}

Python Example

import requests

token = "YOUR_API_TOKEN"
batch_id = "2bb2f0ec-a336-4a0a-b61a-241afaf9cc87"
url = f"http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

res = requests.get(url, headers=headers)
print(res.status_code)
print(res.json())

cURL Example

curl --location --request GET 'http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}' \
--header 'Authorization: Bearer YOUR_API_KEY'

Response Fields

Field | Type | Description
data.batch_id | string | Batch ID
data.extract_result[].file_name | string | File name
data.extract_result[].state | string | One of waiting-file, pending, running, converting, done, failed
data.extract_result[].full_zip_url | string | Download link when done
data.extract_result[].err_msg | string | Failure reason
data.extract_result[].data_id | string | Business identifier
data.extract_result[].extract_progress | object | Running progress

{
  "code": 0,
  "data": {
    "batch_id": "2bb2f0ec-a336-4a0a-b61a-241afaf9cc87",
    "extract_result": [
      {
        "file_name": "example.pdf",
        "state": "done",
        "err_msg": "",
        "full_zip_url": "https://cdn-mineru.openxlab.org.cn/pdf/018e53ad-d4f1-475d-b380-36bf24db9914.zip"
      },
      {
        "file_name": "demo.pdf",
        "state": "running",
        "err_msg": "",
        "extract_progress": {
          "extracted_pages": 1,
          "total_pages": 2,
          "start_time": "2025-01-20 11:43:20"
        }
      }
    ]
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
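
A batch response mixes files in different states, so it is usually grouped before acting on it. The following is a minimal sketch; summarize_batch is a hypothetical helper name.

```python
from collections import Counter


def summarize_batch(body: dict):
    """Return (state counts, download URLs by file, errors by file)."""
    results = body["data"]["extract_result"]
    counts = Counter(item["state"] for item in results)
    downloads = {
        item["file_name"]: item["full_zip_url"]
        for item in results
        if item["state"] == "done" and item.get("full_zip_url")
    }
    failures = {
        item["file_name"]: item.get("err_msg", "")
        for item in results
        if item["state"] == "failed"
    }
    return dict(counts), downloads, failures
```

For the sample response above, the counts are {"done": 1, "running": 1}, downloads holds the archive URL for example.pdf, and failures is empty.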

Task Status

Status | Description
pending | Queued
running | Parsing in progress
converting | Format conversion in progress
done | Completed
failed | Failed
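
Only done and failed are final; the other states mean the task should be queried again later. The following is a minimal polling sketch. wait_for_task is a hypothetical helper, and the 5-second interval and 10-minute timeout are arbitrary assumptions rather than documented values.

```python
import time

TERMINAL_STATES = {"done", "failed"}


def wait_for_task(fetch_state, interval=5.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch_state() until it returns a terminal state or time runs out.

    fetch_state is any zero-argument callable returning the current state string.
    """
    deadline = clock() + timeout
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"task still {state!r} after {timeout} seconds")
        sleep(interval)
```

With the single-task query shown earlier, fetch_state could be lambda: requests.get(url, headers=headers).json()["data"]["state"]. Passing the fetcher in keeps the loop independent of the HTTP client.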

Common Error Codes

Code | Description | Suggested Fix
A0202 | Token error | Check Token / Bearer prefix
A0211 | Token expired | Request a new Token
-500 | Parameter error | Verify payload and headers
-10001 | Service exception | Retry later
-10002 | Request parameter error | Validate payload format
-60001 | Failed to generate upload URL | Retry
-60002 | Unable to detect file format | Ensure file name/extension is valid
-60003 | File read failed | Re-upload
-60004 | Empty file | Upload a valid file
-60005 | File size exceeds 200 MB | Compress or split
-60006 | Page count exceeds 600 | Split document
-60007 | Model service unavailable | Retry/contact support
-60008 | File read timeout | Ensure URL accessibility
-60009 | Submission queue full | Retry later
-60010 | Parsing failed | Retry later
-60011 | File not found | Ensure upload completed
-60012 | Task not found | Verify task_id
-60013 | No permission | Only submitter can query
-60014 | Cannot delete running task | Wait for completion
-60015 | Conversion failed | Convert to PDF manually
-60016 | Export format conversion failed | Try other formats
-60017 | Page quota exceeded | Upgrade plan
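
The suggested fixes fall into a few groups that a client can act on automatically. The following is a minimal sketch; the grouping is an interpretation of the "Suggested Fix" column (codes whose fix is to retry are treated as retryable), and classify_error is a hypothetical helper.

```python
# Codes whose suggested fix is "Retry" or "Retry later".
RETRYABLE_CODES = {"-10001", "-60001", "-60007", "-60009", "-60010"}
# Codes that indicate a problem with the Token.
TOKEN_CODES = {"A0202", "A0211"}


def classify_error(code) -> str:
    """Map a response code to 'ok', 'reauthenticate', 'retry' or 'manual'."""
    key = str(code)  # codes may arrive as ints (-60009) or strings ("A0202")
    if key == "0":
        return "ok"
    if key in TOKEN_CODES:
        return "reauthenticate"
    if key in RETRYABLE_CODES:
        return "retry"
    # Everything else needs a change to the request, file, or plan.
    return "manual"
```

A caller might retry with a delay on "retry", refresh the Token on "reauthenticate", and surface the message to the user on "manual".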

Limits

  • Max single file size: 200 MB
  • Max pages per file: 600
  • Supported formats: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg
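
These limits can be checked before submitting a task to avoid a round trip that would fail. The following is a minimal sketch; check_limits is a hypothetical helper, 1 MB is assumed to be 1024 * 1024 bytes, and the extension check assumes the format is visible in the URL path, which may not hold for every URL.

```python
import posixpath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx",
                        ".png", ".jpg", ".jpeg"}
MAX_FILE_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_PAGES = 600


def check_limits(url, size_bytes=None, page_count=None):
    """Return a list of problems; an empty list means no limit is violated."""
    problems = []
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_FILE_BYTES:
        problems.append("file size exceeds 200 MB")
    if page_count is not None and page_count > MAX_PAGES:
        problems.append("page count exceeds 600")
    return problems
```

The size and page count are optional because they are only known if the caller has access to the file itself.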