Document Parser API

The Document Parser API parses documents that you submit by URL. It supports .pdf, .doc(x), .ppt(x), .png, .jpg, and .jpeg files.

Quick Start

1. Get an API Key

First, create an API key in the dashboard:

Log into your account
Navigate to API Keys Management
Click “Create New Key”
Copy the generated key (it is shown only once)

2. Authentication

Include the following header in every request:

Authorization: Bearer YOUR_API_KEY
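
As a minimal sketch, the header can be built once and reused for every request. The helper name build_headers is hypothetical and not part of the API; the Content-Type value matches the request examples in this document.

```python
def build_headers(api_key: str) -> dict:
    """Return JSON request headers carrying the key as a Bearer token."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    # Tolerate callers that already included the "Bearer " prefix.
    if key.lower().startswith("bearer "):
        key = key[7:].strip()
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Pass the returned dictionary as the headers argument of requests.post or requests.get.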

API Endpoints

Base URL

Production: https://mineru.net/api/v4
Test: https://mineru.net/api/v4

Single File Parsing

Create Parsing Task

Create a parsing task for a single file URL.

Endpoint: POST /extract/task

Interface Description

Obtain an API key (token) before calling the API.
The maximum file size is 200 MB and the maximum page count is 600.
Direct file uploads are not supported; provide a publicly accessible URL instead.
Include Authorization: Bearer <Token> in the request header.

Request Parameters

Parameter	Type	Required	Example	Description
url	string	Yes	https://cdn-mineru.openxlab.org.cn/demo/example.pdf	File URL, supports .pdf, .doc(x), .ppt(x), .png, .jpg, .jpeg
is_ocr	bool	No	false	Enable OCR (pipeline only), default false
enable_formula	bool	No	true	Enable formula recognition (pipeline only)
enable_table	bool	No	true	Enable table recognition (pipeline only)
language	string	No	ch	Document language (pipeline only). See supported list
data_id	string	No	doc_123	Business identifier, max 128 chars
callback	string	No	https://example.com/callback	Result webhook URL (HTTP/HTTPS, POST JSON)
seed	string	No	abc123	Required when callback is set; used to verify checksum
extra_formats	string[]	No	["docx","html"]	Additional export formats (docx/html/latex)
page_ranges	string	No	1-10,15	Comma-separated pages or ranges, e.g. `2,4-6`; `2--2` selects page 2 through the second-to-last page
model_version	string	No	vlm	`pipeline` (default) or `vlm`
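
The constraints in the table above can be checked on the client before a request is sent. The following is a rough sketch only: build_task_payload is a hypothetical helper, and the checks mirror the table (data_id length, seed required with callback, allowed extra_formats and model_version values) rather than any confirmed server-side validation.

```python
ALLOWED_EXTRA_FORMATS = {"docx", "html", "latex"}
ALLOWED_MODEL_VERSIONS = {"pipeline", "vlm"}


def build_task_payload(url, *, model_version="pipeline", data_id=None,
                       callback=None, seed=None, extra_formats=None,
                       page_ranges=None):
    """Build a request body for POST /extract/task, validating documented limits."""
    if model_version not in ALLOWED_MODEL_VERSIONS:
        raise ValueError(f"unknown model_version: {model_version!r}")
    payload = {"url": url, "model_version": model_version}

    if data_id is not None:
        if len(data_id) > 128:
            raise ValueError("data_id must be at most 128 characters")
        payload["data_id"] = data_id

    if callback is not None:
        if not callback.startswith(("http://", "https://")):
            raise ValueError("callback must be an HTTP or HTTPS URL")
        if not seed:
            raise ValueError("seed is required when callback is set")
        payload["callback"] = callback
        payload["seed"] = seed

    if extra_formats:
        unknown = set(extra_formats) - ALLOWED_EXTRA_FORMATS
        if unknown:
            raise ValueError(f"unsupported extra_formats: {sorted(unknown)}")
        payload["extra_formats"] = list(extra_formats)

    if page_ranges:
        payload["page_ranges"] = page_ranges
    return payload
```

The pipeline-only options (is_ocr, enable_formula, enable_table, language) are left out of this sketch; they can be added to the returned dictionary in the same way.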

Python Example

import requests

token = "YOUR_API_TOKEN"
url = "https://mineru.net/api/v4/extract/task"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
payload = {
    "url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf",
    "model_version": "vlm"
}

res = requests.post(url, headers=headers, json=payload, timeout=30)
print(res.status_code)
body = res.json()
print(body)
# "data" may be absent on an error response, so avoid a KeyError.
print(body.get("data"))

cURL Example

curl --location --request POST 'https://mineru.net/api/v4/extract/task' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
  "url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf",
  "model_version": "vlm"
}'

Request Body Example

{
  "url": "https://static.openxlab.org.cn/opendatalab/pdf/demo.pdf",
  "model_version": "vlm",
  "data_id": "abcd"
}

Response

Field	Type	Example	Description
code	int	0	API status code, 0 means success
msg	string	ok	Response message
trace_id	string	c876cd60b202f2396de1f9e39a1b0172	Request ID
data.task_id	string	a90e6ab6-44f3-4554-b459-b62fe4c6b436	Task ID

{
  "code": 0,
  "data": {
    "task_id": "a90e6ab6-44f3-4554-b4***"
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
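
Since code 0 means success, a caller would typically check code before reading data.task_id. The sketch below assumes the response body has already been decoded from JSON; ApiError and extract_task_id are hypothetical names introduced for illustration.

```python
class ApiError(Exception):
    """Raised when the response body reports a non-zero status code."""

    def __init__(self, code, msg, trace_id=None):
        super().__init__(f"API error {code}: {msg} (trace_id={trace_id})")
        self.code = code
        self.msg = msg
        self.trace_id = trace_id


def extract_task_id(body: dict) -> str:
    """Return data.task_id from a create-task response, or raise ApiError."""
    if body.get("code") != 0:
        raise ApiError(body.get("code"), body.get("msg"), body.get("trace_id"))
    return body["data"]["task_id"]
```

Keeping trace_id on the exception makes it easier to quote the request ID when reporting a problem.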

Get Task Results

Endpoint: GET /extract/task/{task_id}

Description

Query the status and result of a task by its task_id.
Include the Authorization header in the request.

Python Example

import requests

token = "YOUR_API_TOKEN"
task_id = "a90e6ab6-44f3-4554-b459-b62fe4c6b436"
url = f"https://mineru.net/api/v4/extract/task/{task_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

res = requests.get(url, headers=headers, timeout=30)
print(res.status_code)
body = res.json()
print(body)
# "data" may be absent on an error response, so avoid a KeyError.
print(body.get("data"))

cURL Example

curl --location --request GET 'https://mineru.net/api/v4/extract/task/{task_id}' \
--header 'Authorization: Bearer YOUR_API_KEY'

Response Fields

Field	Type	Example	Description
code	int	0	API status code
msg	string	ok	Response message
trace_id	string	c876cd60b202f2396de1f9e39a1b0172	Request ID
data.task_id	string	abc**	Task ID
data.data_id	string	abc**	Copy of the `data_id` you sent
data.state	string	done	`pending`, `running`, `converting`, `done`, `failed`
data.full_zip_url	string	https://cdn-mineru...zip	Download URL of extracted files
data.err_msg	string	...	Error reason when failed
data.extract_progress.*	object		Progress info when running

{
  "code": 0,
  "data": {
    "task_id": "47726b6e-46ca-4bb9-******",
    "state": "running",
    "err_msg": "",
    "extract_progress": {
      "extracted_pages": 1,
      "total_pages": 2,
      "start_time": "2025-01-20 11:43:20"
    }
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
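
The extract_progress object shown above can be turned into a short status line. This is an illustrative sketch: describe_progress is a hypothetical helper, and it takes the data object of the response as input.

```python
def describe_progress(data: dict) -> str:
    """Summarize the data object of a task-result response in one line."""
    state = data.get("state", "unknown")
    progress = data.get("extract_progress") or {}
    total = progress.get("total_pages")
    done = progress.get("extracted_pages", 0)

    if state == "running" and total:
        percent = 100 * done / total
        return f"running: {done}/{total} pages ({percent:.0f}%)"
    if state == "failed":
        return f"failed: {data.get('err_msg') or 'no error message'}"
    return state
```

For the sample response above, the function returns "running: 1/2 pages (50%)".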

Batch File Parsing

Batch parsing follows the same submit-then-query flow as the single-file API. Each file in a batch reports the same result fields as a single task.

URL Batch Submission

Submit multiple URL tasks in one request (up to 200).

Endpoint: POST http://8.148.69.123:8088/api/v1/extract/task/batch

The request body follows the single-task API, except that the per-file fields (url, data_id, is_ocr, page_ranges, and so on) go inside a files[] array. The response contains a batch_id.

Python Example

import requests

token = "YOUR_API_TOKEN"
url = "http://8.148.69.123:8088/api/v1/extract/task/batch"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
payload = {
    "files": [
        {"url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf", "data_id": "abcd"}
    ],
    "model_version": "vlm"
}

res = requests.post(url, headers=headers, json=payload)
print(res.status_code)
print(res.json())

cURL Example

curl --location --request POST 'http://8.148.69.123:8088/api/v1/extract/task/batch' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
  "files": [
    {"url": "https://cdn-mineru.openxlab.org.cn/demo/example.pdf", "data_id": "abcd"}
  ],
  "model_version": "vlm"
}'
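
Because one request accepts up to 200 URLs, a longer list has to be split across several requests. A minimal sketch, assuming each entry is already a dictionary in the files[] shape shown above; build_batch_payloads is a hypothetical helper name.

```python
MAX_FILES_PER_BATCH = 200  # documented per-request limit


def build_batch_payloads(files, model_version="pipeline",
                         batch_size=MAX_FILES_PER_BATCH):
    """Split a list of file entries into request bodies of at most batch_size files."""
    if not 1 <= batch_size <= MAX_FILES_PER_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_FILES_PER_BATCH}")
    return [
        {"files": files[start:start + batch_size], "model_version": model_version}
        for start in range(0, len(files), batch_size)
    ]
```

Each returned dictionary can be sent as the JSON body of one batch submission, and each submission yields its own batch_id.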

Batch Task Results

Query the progress/result of a batch.

Endpoint: GET http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}

Python Example

import requests

token = "YOUR_API_TOKEN"
batch_id = "2bb2f0ec-a336-4a0a-b61a-241afaf9cc87"
url = f"http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

res = requests.get(url, headers=headers)
print(res.status_code)
print(res.json())

cURL Example

curl --location --request GET 'http://8.148.69.123:8088/api/v1/extract-results/batch/{batch_id}' \
--header 'Authorization: Bearer YOUR_API_KEY'

Response Fields

Field	Type	Description
data.batch_id	string	Batch ID
data.extract_result[].file_name	string	File name
data.extract_result[].state	string	`waiting-file`, `pending`, `running`, `converting`, `done`, `failed`
data.extract_result[].full_zip_url	string	Download link when done
data.extract_result[].err_msg	string	Failure reason
data.extract_result[].data_id	string	Business identifier
data.extract_result[].extract_progress	object	Running progress

{
  "code": 0,
  "data": {
    "batch_id": "2bb2f0ec-a336-4a0a-b61a-241afaf9cc87",
    "extract_result": [
      {
        "file_name": "example.pdf",
        "state": "done",
        "err_msg": "",
        "full_zip_url": "https://cdn-mineru.openxlab.org.cn/pdf/018e53ad-d4f1-475d-b380-36bf24db9914.zip"
      },
      {
        "file_name": "demo.pdf",
        "state": "running",
        "err_msg": "",
        "extract_progress": {
          "extracted_pages": 1,
          "total_pages": 2,
          "start_time": "2025-01-20 11:43:20"
        }
      }
    ]
  },
  "msg": "ok",
  "trace_id": "c876cd60b202f2396de1f9e39a1b0172"
}
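
A batch response mixes files in different states, so it is often convenient to reduce it to counts, download links, and failures. The sketch below works on the decoded response body; summarize_batch is a hypothetical helper, not part of the API.

```python
from collections import Counter


def summarize_batch(body: dict) -> dict:
    """Group the extract_result entries of a batch response by outcome."""
    results = body["data"]["extract_result"]
    counts = Counter(item["state"] for item in results)
    # Files that finished and have a download link.
    downloads = {
        item["file_name"]: item["full_zip_url"]
        for item in results
        if item["state"] == "done" and item.get("full_zip_url")
    }
    # Files that failed, with the reported reason.
    failures = {
        item["file_name"]: item.get("err_msg", "")
        for item in results
        if item["state"] == "failed"
    }
    return {"counts": dict(counts), "downloads": downloads, "failures": failures}
```

The summary is keyed by file_name, so it assumes file names are unique within a batch; key by data_id instead if they are not.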

Task Status

Status	Description
waiting-file	Waiting for the file to be uploaded (batch results only)
pending	Queued
running	Parsing in progress
converting	Format converting
done	Completed
failed	Failed
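
Of the states above, done and failed are final, so a client can poll until one of them appears. The loop below is a sketch under assumptions: wait_for_task is a hypothetical helper, fetch_state stands for any caller-supplied function that returns the data object of a task-result response, and the 5-second interval and 10-minute timeout are arbitrary defaults, not documented values.

```python
import time

TERMINAL_STATES = {"done", "failed"}


def wait_for_task(fetch_state, interval=5.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch_state() until the task reaches a terminal state or times out."""
    deadline = clock() + timeout
    while True:
        data = fetch_state()
        if data["state"] in TERMINAL_STATES:
            return data
        if clock() >= deadline:
            raise TimeoutError(f"task still {data['state']!r} after {timeout} seconds")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; in normal use the defaults apply.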

Common Error Codes

Code	Description	Suggested Fix
A0202	Token error	Check Token / Bearer prefix
A0211	Token expired	Request a new Token
-500	Parameter error	Verify payload and headers
-10001	Service exception	Retry later
-10002	Request parameter error	Validate payload format
-60001	Failed to generate upload URL	Retry
-60002	Unable to detect file format	Ensure file name/extension is valid
-60003	File read failed	Re-upload
-60004	Empty file	Upload a valid file
-60005	File size exceeds 200 MB	Compress or split
-60006	Page count exceeds 600	Split document
-60007	Model service unavailable	Retry/contact support
-60008	File read timeout	Ensure URL accessibility
-60009	Submission queue full	Retry later
-60010	Parsing failed	Retry later
-60011	File not found	Ensure upload completed
-60012	Task not found	Verify `task_id`
-60013	No permission	Only submitter can query
-60014	Cannot delete running task	Wait for completion
-60015	Conversion failed	Convert to PDF manually
-60016	Export format conversion failed	Try other formats
-60017	Page quota exceeded	Upgrade plan
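
The suggested fixes above fall into a few groups: refresh the token, retry later, or change the request. One way to encode that grouping is sketched below. classify_error and the category names are hypothetical, and the retryable set is simply the codes whose suggested fix mentions retrying.

```python
AUTH_CODES = {"A0202", "A0211"}
# Codes whose suggested fix in the table is to retry.
RETRYABLE_CODES = {-10001, -60001, -60007, -60009, -60010}


def classify_error(code) -> str:
    """Map a response code to "ok", "auth", "retry", or "fail"."""
    # Numeric codes may arrive as strings; normalize them to int.
    if isinstance(code, str) and code.lstrip("-").isdigit():
        code = int(code)
    if code == 0:
        return "ok"
    if code in AUTH_CODES:
        return "auth"
    if code in RETRYABLE_CODES:
        return "retry"
    return "fail"
```

A client might back off and resubmit on "retry", request a new token on "auth", and surface everything else to the user.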

Limits

Max single file size: 200 MB
Max pages per file: 600
Supported formats: .pdf, .doc, .docx, .ppt, .pptx, .png, .jpg, .jpeg
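
These limits can be checked before submitting a task to avoid a round trip that ends in -60002, -60005, or -60006. The sketch assumes the file extension is visible in the URL path and that 1 MB means 1024 * 1024 bytes; check_limits is a hypothetical helper, and size_bytes and page_count are optional values the caller supplies if known.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx",
                        ".png", ".jpg", ".jpeg"}
MAX_FILE_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_PAGES = 600


def check_limits(url, size_bytes=None, page_count=None):
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    # Take the extension from the URL path, ignoring any query string.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 200 MB")
    if page_count is not None and page_count > MAX_PAGES:
        problems.append("more than 600 pages")
    return problems
```

A URL without a recognizable extension is flagged here even though the service may still detect the format, so treat the result as a warning rather than a hard stop.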
