Document Parser API
Welcome to the Document Parser API documentation. The service parses PDF, Word, PowerPoint, and image files submitted by URL and returns the extracted content as a downloadable archive.
Quick Start
1. Get an API Key
First, create an API key in the dashboard:
- Log into your account
- Navigate to API Keys Management
- Click “Create New Key”
- Copy the generated key (it is shown only once)
2. Authentication
Include the following header in every request:
Authorization: Bearer YOUR_API_KEY
API Endpoints
Base URL
Production: https://mineru.net/api/v4
Test: https://mineru.net/api/v4
Single File Parsing
Create Parsing Task
Create a parsing task for a single file URL.
Endpoint: POST /extract/task
Interface Description
- Apply for a Token before calling the API.
- Max file size 200 MB, max 600 pages.
- Direct file uploads are not supported; provide an accessible URL.
- Include Authorization: Bearer <Token> in the header.
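The header requirement above can be wrapped in a small helper so that every request is built the same way. This is a minimal sketch: the function name build_headers and the environment variable MINERU_API_TOKEN are illustrative assumptions, not part of the API.

```python
import os


def build_headers(token):
    """Return JSON request headers carrying the Bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("API token is empty")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


# MINERU_API_TOKEN is a hypothetical variable name chosen for this sketch.
headers = build_headers(os.environ.get("MINERU_API_TOKEN", "YOUR_API_TOKEN"))
```

Reading the token from the environment keeps it out of source control, which matters because the key is shown only once when it is created.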
Request Parameters
| Parameter | Type | Required | Example | Description |
|---|---|---|---|---|
| url | string | Yes | https://cdn-mineru.openxlab.org.cn/demo/example.pdf | File URL, supports .pdf, .doc(x), .ppt(x), .png, .jpg, .jpeg |
| is_ocr | bool | No | false | Enable OCR (pipeline only), default false |
| enable_formula | bool | No | true | Enable formula recognition (pipeline only) |
| enable_table | bool | No | true | Enable table recognition (pipeline only) |
| language | string | No | ch | Document language (pipeline only). See supported list |
| data_id | string | No | doc_123 | Business identifier, max 128 chars |
| callback | string | No | https://example.com/callback | Result webhook URL (HTTP/HTTPS, POST JSON) |
| seed | string | No | abc123 | Required when callback is set; used to verify checksum |
| extra_formats | string[] | No | ["docx","html"] | Additional export formats (docx/html/latex) |
| page_ranges | string | No | 1-10,15 | Comma-separated pages, support ranges like 2--2 |
| model_version | string | No | vlm | pipeline (default) or vlm |
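The constraints in the table above (data_id length, seed required with callback, allowed values for model_version and extra_formats) can be checked before a request is sent. The sketch below assumes a hypothetical helper named build_task_payload; the server remains the authority on validation, so this only catches obvious mistakes early.

```python
ALLOWED_MODELS = ("pipeline", "vlm")
ALLOWED_EXTRA_FORMATS = {"docx", "html", "latex"}


def build_task_payload(url, model_version="pipeline", data_id=None,
                       callback=None, seed=None, page_ranges=None,
                       extra_formats=None):
    """Build the JSON body for POST /extract/task, omitting unset fields."""
    if model_version not in ALLOWED_MODELS:
        raise ValueError("model_version must be 'pipeline' or 'vlm'")
    if data_id is not None and len(data_id) > 128:
        raise ValueError("data_id is limited to 128 characters")
    if callback is not None and not seed:
        raise ValueError("seed is required when callback is set")
    if extra_formats and not set(extra_formats) <= ALLOWED_EXTRA_FORMATS:
        raise ValueError("extra_formats may only contain docx, html, latex")

    payload = {"url": url, "model_version": model_version}
    optional = {
        "data_id": data_id,
        "callback": callback,
        "seed": seed,
        "page_ranges": page_ranges,
        "extra_formats": extra_formats,
    }
    # Only send the optional fields the caller actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, build_task_payload("https://cdn-mineru.openxlab.org.cn/demo/example.pdf", model_version="vlm", data_id="doc_123") yields a body with exactly the three keys url, model_version, and data_id.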
Python Example
import requests
token = "YOUR_API_TOKEN"
url = "https://mineru.net/api/v4/extract/task"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
payload = {
    "url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf",
    "model_version": "vlm",
}
# Submit the parsing task; the new task_id is returned under "data".
res = requests.post(url, headers=headers, json=payload)
print(res.status_code)
print(res.json())
print(res.json()["data"])
cURL Example
curl --location --request POST 'https://mineru.net/api/v4/extract/task' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf",
    "model_version": "vlm"
  }'
Request Body Example
{
  "url": "https://static.openxlab.org.cn/opendatalab/pdf/demo.pdf",
  "model_version": "vlm",
  "data_id": "abcd"
}
Response
| Field | Type | Example | Description |
|---|---|---|---|
| code | int | 0 | API status code, 0 means success |
| msg | string | ok | Response message |
| trace_id | string | c876cd60b202f2396de1f9e39a1b0172 | Request ID |
| data.task_id | string | a90e6ab6-44f3-4554-b459-b62fe4c6b436 | Task ID |
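Because code is 0 only on success, a caller will usually want to check it before reading data.task_id. The following is a minimal sketch; extract_task_id is a hypothetical helper name, and raising RuntimeError is just one possible way to surface a failure.

```python
def extract_task_id(body):
    """Return data.task_id from a create-task response, or raise on error."""
    if body.get("code") != 0:
        raise RuntimeError(
            f"API error {body.get('code')}: {body.get('msg')} "
            f"(trace_id={body.get('trace_id')})"
        )
    return body["data"]["task_id"]
```

Including trace_id in the error message makes it easier to reference the failing request when contacting support.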
{
  "code": 0,
  "data": {
    "task_id": "a90e6ab6-44f3-4554-b4***"
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
Get Task Results
Endpoint: GET /extract/task/{task_id}
Description
- Query task status and result by task_id.
- Include the Authorization header.
Python Example
import requests
token = "YOUR_API_TOKEN"
task_id = "a90e6ab6-44f3-4554-b459-b62fe4c6b436"
url = f"https://mineru.net/api/v4/extract/task/{task_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
# Query the task; "data" holds the state, progress, and result URL.
res = requests.get(url, headers=headers)
print(res.status_code)
print(res.json())
print(res.json()["data"])
cURL Example
curl --location --request GET 'https://mineru.net/api/v4/extract/task/{task_id}' \
  --header 'Authorization: Bearer YOUR_API_KEY'
Response Fields
| Field | Type | Example | Description |
|---|---|---|---|
| code | int | 0 | API status code |
| msg | string | ok | Response message |
| trace_id | string | c876cd60b202f2396de1f9e39a1b0172 | Request ID |
| data.task_id | string | abc** | Task ID |
| data.data_id | string | abc** | Copy of the data_id you sent |
| data.state | string | done | pending, running, converting, done, failed |
| data.full_zip_url | string | https://cdn-mineru...zip | Download URL of extracted files |
| data.err_msg | string | ... | Error reason when failed |
| data.extract_progress.* | object | | Progress info when running |
{
  "code": 0,
  "data": {
    "task_id": "47726b6e-46ca-4bb9-******",
    "state": "running",
    "err_msg": "",
    "extract_progress": {
      "extracted_pages": 1,
      "total_pages": 2,
      "start_time": "2025-01-20 11:43:20"
    }
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
Batch File Parsing
Handle multiple documents with the same flow and response format as the single-file API.
URL Batch Submission
Submit multiple URL tasks in one request (up to 200).
Endpoint: POST http://8.148.69.123:8088/api/v1/extract/task/batch
Request structure is the same as single-task API but uses files[] array with url, data_id, is_ocr, page_ranges, etc. Response contains batch_id.
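Since one batch request accepts up to 200 URLs, a longer list has to be split across several requests. The sketch below shows one way to do that; build_batch_payloads is a hypothetical helper, and placing model_version next to files follows the request example in this section.

```python
def build_batch_payloads(files, model_version="pipeline", batch_size=200):
    """Split file entries into request bodies of at most batch_size items.

    Each entry in files is a dict such as {"url": ..., "data_id": ...}.
    """
    if not 1 <= batch_size <= 200:
        raise ValueError("batch_size must be between 1 and 200")
    payloads = []
    for start in range(0, len(files), batch_size):
        payloads.append({
            "files": files[start:start + batch_size],
            "model_version": model_version,
        })
    return payloads
```

Each returned payload can be posted separately, and each response carries its own batch_id, so the caller needs to keep track of several batch IDs when the input exceeds 200 files.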
Python Example
import requests
token = "YOUR_API_TOKEN"
url = "http://8.148.69.123:8088/api/v1/extract/task/batch"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
payload = {
    "files": [
        {"url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf", "data_id": "abcd"}
    ],
    "model_version": "vlm",
}
# Submit the batch; the response contains the batch_id used to query results.
res = requests.post(url, headers=headers, json=payload)
print(res.status_code)
print(res.json())
cURL Example
curl --location --request POST 'http://8.148.69.123:8088/api/v1/extract/task/batch' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "files": [
      {"url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf", "data_id": "abcd"}
    ],
    "model_version": "vlm"
  }'
Batch Task Results
Query the progress/result of a batch.
Endpoint: GET http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}
Python Example
import requests
token = "YOUR_API_TOKEN"
batch_id = "2bb2f0ec-a336-4a0a-b61a-241afaf9cc87"
url = f"http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
# Query the batch; each file's state is listed under data.extract_result.
res = requests.get(url, headers=headers)
print(res.status_code)
print(res.json())
cURL Example
curl --location --request GET 'http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}' \
  --header 'Authorization: Bearer YOUR_API_KEY'
Response Fields
| Field | Type | Description |
|---|---|---|
| data.batch_id | string | Batch ID |
| data.extract_result[].file_name | string | File name |
| data.extract_result[].state | string | waiting-file, pending, running, converting, done, failed |
| data.extract_result[].full_zip_url | string | Download link when done |
| data.extract_result[].err_msg | string | Failure reason |
| data.extract_result[].data_id | string | Business identifier |
| data.extract_result[].extract_progress | object | Running progress |
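A batch response mixes files in different states, so it is often convenient to reduce it to a per-state count plus the download links and failure reasons. This is a sketch under the field names listed above; summarize_batch is a hypothetical helper name.

```python
from collections import Counter


def summarize_batch(body):
    """Return (state counts, download URLs by file, error messages by file)."""
    results = body["data"]["extract_result"]
    counts = Counter(item["state"] for item in results)
    downloads = {
        item["file_name"]: item["full_zip_url"]
        for item in results
        if item["state"] == "done"
    }
    failures = {
        item["file_name"]: item.get("err_msg", "")
        for item in results
        if item["state"] == "failed"
    }
    return counts, downloads, failures
```

A batch can be treated as finished once every entry is either done or failed, which with this helper means counts["done"] + counts["failed"] equals the number of results.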
{
  "code": 0,
  "data": {
    "batch_id": "2bb2f0ec-a336-4a0a-b61a-241afaf9cc87",
    "extract_result": [
      {
        "file_name": "example.pdf",
        "state": "done",
        "err_msg": "",
        "full_zip_url": "https://cdn-mineru.openxlab.org.cn/pdf/018e53ad-d4f1-475d-b380-36bf24db9914.zip"
      },
      {
        "file_name": "demo.pdf",
        "state": "running",
        "err_msg": "",
        "extract_progress": {
          "extracted_pages": 1,
          "total_pages": 2,
          "start_time": "2025-01-20 11:43:20"
        }
      }
    ]
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
Task Status
| Status | Description |
|---|---|
| pending | Queued |
| running | Parsing in progress |
| converting | Format converting |
| done | Completed |
| failed | Failed |
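The states above suggest a simple polling loop: keep querying until the task reaches done or failed. The sketch below uses only the standard library; the 5-second interval and 10-minute timeout are arbitrary assumptions, not documented limits, and fetch_task and wait_for_task are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "https://mineru.net/api/v4"
TERMINAL_STATES = {"done", "failed"}


def fetch_task(task_id, token):
    """GET /extract/task/{task_id} and return the response's data object."""
    req = urllib.request.Request(
        f"{BASE_URL}/extract/task/{task_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as res:
        return json.load(res)["data"]


def wait_for_task(task_id, token, fetch=fetch_task, interval=5.0, timeout=600.0):
    """Poll until the task state is done or failed, then return its data."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(task_id, token)
        if data["state"] in TERMINAL_STATES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {data['state']} after {timeout}s")
        time.sleep(interval)
```

On return, a caller would read full_zip_url when the state is done and err_msg when it is failed. The fetch parameter exists so the loop can be exercised with a stub instead of real HTTP calls.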
Common Error Codes
| Code | Description | Suggested Fix |
|---|---|---|
| A0202 | Token error | Check Token / Bearer prefix |
| A0211 | Token expired | Request a new Token |
| -500 | Parameter error | Verify payload and headers |
| -10001 | Service exception | Retry later |
| -10002 | Request parameter error | Validate payload format |
| -60001 | Failed to generate upload URL | Retry |
| -60002 | Unable to detect file format | Ensure file name/extension is valid |
| -60003 | File read failed | Re-upload |
| -60004 | Empty file | Upload a valid file |
| -60005 | File size exceeds 200 MB | Compress or split |
| -60006 | Page count exceeds 600 | Split document |
| -60007 | Model service unavailable | Retry/contact support |
| -60008 | File read timeout | Ensure URL accessibility |
| -60009 | Submission queue full | Retry later |
| -60010 | Parsing failed | Retry later |
| -60011 | File not found | Ensure upload completed |
| -60012 | Task not found | Verify task_id |
| -60013 | No permission | Only submitter can query |
| -60014 | Cannot delete running task | Wait for completion |
| -60015 | Conversion failed | Convert to PDF manually |
| -60016 | Export format conversion failed | Try other formats |
| -60017 | Page quota exceeded | Upgrade plan |
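The suggested fixes in the table fall into a few groups: token problems, errors worth retrying, and errors that need a change to the request or file. A client might encode that grouping as follows. The grouping itself is an interpretation of the "Suggested Fix" column, not something the API defines, and classify_error is a hypothetical helper name.

```python
# Codes whose suggested fix is "Retry" or "Retry later" in the table.
RETRYABLE_CODES = {-10001, -60001, -60007, -60009, -60010}
# Token-related codes; these are strings rather than integers.
AUTH_CODES = {"A0202", "A0211"}


def classify_error(code):
    """Map an API status code to 'ok', 'auth', 'retry', or 'fatal'."""
    if code == 0:
        return "ok"
    if code in AUTH_CODES:
        return "auth"
    if code in RETRYABLE_CODES:
        return "retry"
    # Everything else needs a change to the request, file, or plan.
    return "fatal"
```

A retry loop would then back off and resubmit only when classify_error returns "retry", and stop immediately for "auth" or "fatal".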
Limits
- Max single file size: 200 MB
- Max pages per file: 600
- Supported formats: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg
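These limits can be checked on the client before a task is submitted, which avoids spending a request on a file that will be rejected. The sketch below assumes that 200 MB means 200 × 1024 × 1024 bytes (the documentation does not say which definition applies) and that the caller already knows the file size and page count; preflight is a hypothetical helper name.

```python
import os
from urllib.parse import urlparse

MAX_BYTES = 200 * 1024 * 1024  # assumption: MB interpreted as MiB
MAX_PAGES = 600
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".ppt", ".pptx", ".png", ".jpg", ".jpeg",
}


def preflight(url, size_bytes=None, page_count=None):
    """Return a list of problems found; an empty list means no issue detected."""
    problems = []
    # Use only the URL path so query strings do not hide the extension.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append("file larger than 200 MB")
    if page_count is not None and page_count > MAX_PAGES:
        problems.append("more than 600 pages")
    return problems
```

An empty result does not guarantee acceptance: the server may still reject a file, for example with -60002 when it cannot detect the format.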
