Streamline Document Conversion with cuuupid/marker Cognitive Actions

Converting documents into an editable, text-based format is a recurring chore in content and data pipelines. The cuuupid/marker API offers developers a Cognitive Action that uses Optical Character Recognition (OCR) to convert scanned or digital documents into Markdown. It accepts several document types, including PDF and EPUB, which makes it a practical tool for streamlining document processing.
Prerequisites
Before diving into the integration of the Cognitive Actions, you will need a few essentials:
- API Key: To access the cuuupid/marker API, you must have your unique API key. This key will be used to authenticate your requests.
- Basic Setup: Familiarity with making HTTP requests and handling JSON data is beneficial for utilizing these actions effectively.
Authentication typically involves including your API key in the request headers, allowing secure access to the Cognitive Actions platform.
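Rather than hard-coding the key in source files, you may prefer to load it from an environment variable. The following is a minimal sketch: the variable name COGNITIVE_ACTIONS_API_KEY and the helper build_auth_headers are hypothetical, and the Bearer scheme simply mirrors the Python example later in this article rather than a confirmed platform requirement.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    Hypothetical helper: the variable name is a local convention, and the
    Bearer scheme mirrors this article's example, not documented API rules.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of the script also makes it easier to use different keys for development and production without editing code.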
Cognitive Actions Overview
Convert Documents to Markdown
The Convert Documents to Markdown action converts various document formats into Markdown. It uses OCR to extract text from documents and supports multiple languages and input formats, including PDF and EPUB.
- Category: Document Processing
Input
The input schema for this action accepts the following fields:
- dpi (integer): The resolution, in dots per inch, used for OCR. Default is 400.
- document (string): The URI of the input file to be processed. Supported formats include PDF, EPUB, MOBI, XPS, and FB2.
- language (string): The language used for OCR. Default is "English". Other options include Spanish, Portuguese, French, German, and Russian.
- enableEditor (boolean): Enables editing mode for adjusting OCR results. Default is false.
- maximumPages (integer): The maximum number of pages to process from the input document.
- parallelFactor (integer): The number of parallel processes used for OCR. Default is 1.
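Because several fields have defaults, it can help to assemble the payload in one place and catch obvious mistakes before making a request. The sketch below is hypothetical (build_marker_inputs is not part of the API). It checks only the languages and file types listed above, so if the service accepts more than this article names, widen the sets accordingly.

```python
from typing import Optional

# Languages and file extensions as listed in this article's input schema.
# The service may accept more; these checks are client-side only.
SUPPORTED_LANGUAGES = {"English", "Spanish", "Portuguese", "French", "German", "Russian"}
SUPPORTED_EXTENSIONS = {".pdf", ".epub", ".mobi", ".xps", ".fb2"}


def build_marker_inputs(
    document: str,
    dpi: int = 400,
    language: str = "English",
    enable_editor: bool = False,
    maximum_pages: Optional[int] = None,
    parallel_factor: int = 1,
) -> dict:
    """Assemble an input payload for Convert Documents to Markdown (hypothetical helper)."""
    # Ignore any query string before checking the file extension.
    path = document.split("?", 1)[0].lower()
    if not any(path.endswith(ext) for ext in SUPPORTED_EXTENSIONS):
        raise ValueError(f"Unsupported document type: {document}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported OCR language: {language}")
    if dpi <= 0 or parallel_factor < 1:
        raise ValueError("dpi must be positive and parallel_factor at least 1")

    inputs = {
        "dpi": dpi,
        "document": document,
        "language": language,
        "enableEditor": enable_editor,
        "parallelFactor": parallel_factor,
    }
    # maximumPages has no documented default, so send it only when set.
    if maximum_pages is not None:
        if maximum_pages < 1:
            raise ValueError("maximum_pages must be at least 1")
        inputs["maximumPages"] = maximum_pages
    return inputs
```

Called with the Tell-Tale Heart URL and parallel_factor=10, it produces the same dictionary as the example input that follows, which can then be used as the payload in the Python example.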
Example Input:
{
  "dpi": 400,
  "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
  "language": "English",
  "enableEditor": false,
  "parallelFactor": 10
}
Output
The output from this action typically includes:
- markdown (string): A URI pointing to the generated Markdown file.
- metadata (object): Additional information about the conversion, including:
  - toc: The table of contents.
  - pages: The total number of pages processed.
  - filetype: The type of the input file.
  - language: The language used for OCR.
  - ocr_stats: Counts of pages sent to OCR (ocr_pages) and of successful and failed OCR attempts (ocr_success, ocr_failed).
  - block_stats: Counts of detected blocks such as code, tables, equations, and headers/footers.
  - postprocess_stats: Statistics from any post-processing steps (the example below shows an empty edit entry).
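The metadata is easiest to act on once condensed. The following sketch, with hypothetical helper names, reads the fields shown in the example output and turns the ocr_stats counts into a failure rate. It assumes ocr_success and ocr_failed count OCR attempts, which the example output (where all three counters are 0) does not confirm.

```python
from typing import Optional


def ocr_failure_rate(ocr_stats: dict) -> Optional[float]:
    """Return the fraction of OCR attempts that failed, or None if OCR never ran."""
    succeeded = ocr_stats.get("ocr_success", 0)
    failed = ocr_stats.get("ocr_failed", 0)
    attempts = succeeded + failed
    if attempts == 0:
        return None  # no OCR attempts recorded; avoid dividing by zero
    return failed / attempts


def summarize_metadata(metadata: dict) -> str:
    """Condense a conversion's metadata object into one line.

    Field names follow the example output; missing fields default to 0.
    """
    ocr = metadata.get("ocr_stats", {})
    blocks = metadata.get("block_stats", {})
    equations = blocks.get("equations", {}).get("equations", 0)
    return (
        f"{metadata.get('pages', 0)} page(s) of {metadata.get('filetype', 'unknown')}, "
        f"{ocr.get('ocr_pages', 0)} OCRed ({ocr.get('ocr_failed', 0)} failed), "
        f"{blocks.get('table', 0)} table(s), {blocks.get('code', 0)} code block(s), "
        f"{equations} equation(s)"
    )
```

Returning None instead of 0.0 when nothing was OCRed keeps "no failures" distinct from "OCR was never attempted".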
Example Output:
{
  "markdown": "https://assets.cognitiveactions.com/invocations/8f7ec0c1-bec4-416f-a56b-e38512b75376/709ae884-a126-426e-88e5-9f585c18ba03.md",
  "metadata": {
    "toc": [],
    "pages": 4,
    "filetype": "pdf",
    "language": "English",
    "ocr_stats": {
      "ocr_pages": 0,
      "ocr_failed": 0,
      "ocr_success": 0
    },
    "block_stats": {
      "code": 0,
      "table": 0,
      "equations": {
        "equations": 0,
        "successful_ocr": 0,
        "unsuccessful_ocr": 0
      },
      "header_footer": 0
    },
    "postprocess_stats": {
      "edit": {}
    }
  }
}
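The markdown field is a URI rather than the converted text itself, so retrieving the content takes a second request. The sketch below assumes the URI can be fetched with a plain GET and no authentication, and that the file is UTF-8 encoded; neither is confirmed here. The helper names are hypothetical, and the fetch parameter exists so the download step can be swapped out (for example, to add headers) or stubbed in tests.

```python
import urllib.request
from typing import Callable


def _download(url: str) -> bytes:
    """Fetch a URL with a plain GET; assumes no authentication is required."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_markdown(result: dict, fetch: Callable[[str], bytes] = _download) -> str:
    """Retrieve the converted Markdown text referenced by an action result.

    Assumes the file is UTF-8 encoded.
    """
    return fetch(result["markdown"]).decode("utf-8")


def local_filename(result: dict) -> str:
    """Derive a local file name from the last path segment of the Markdown URI."""
    name = result["markdown"].split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]
    return name or "output.md"
```

Assuming result is a dictionary shaped like the example output above, Path(local_filename(result)).write_text(fetch_markdown(result), encoding="utf-8") (with Path from pathlib) saves the converted document as 709ae884-a126-426e-88e5-9f585c18ba03.md in the current directory.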
Conceptual Usage Example (Python)
Here’s a conceptual example of how you might call this action using Python:
import json
import requests
# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "75a65338-e2fb-43dd-9d1e-e7c10b7adbf6"  # Action ID for Convert Documents to Markdown
# Construct the input payload based on the action's input schema
payload = {
    "dpi": 400,
    "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
    "language": "English",
    "enableEditor": False,
    "parallelFactor": 10,
}
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300,  # OCR can be slow; adjust, but avoid waiting forever
    )
    response.raise_for_status()  # Raise an exception for 4xx or 5xx status codes
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers both json and simplejson decode errors
            print(f"Response body: {e.response.text}")
In this snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The endpoint URL and the request body structure are placeholders, so adjust them to match your platform. The payload variable follows the action's input schema, so the service receives the fields it expects.
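Document conversion can take a while, and remote services sometimes fail transiently, so you may want to retry the request. The following is a generic sketch, not behavior documented for this API: the helper with_retries is hypothetical, and treating 429 and common 5xx responses as retryable is an assumption. It relies on the fact that the exceptions raised by requests subclass OSError and carry a response attribute, so the helper itself does not need to import requests.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Statuses that usually signal a transient problem (rate limiting or server
# errors). Whether this API returns them is an assumption.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def _status_of(exc: BaseException) -> Optional[int]:
    """Return the HTTP status attached to an exception, if any."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None)


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except OSError as exc:  # requests.RequestException subclasses OSError
            status = _status_of(exc)
            # Retry transient HTTP statuses, and network errors that have no
            # response. JSON decode errors (also ValueError) are not retried.
            retryable = status in RETRYABLE_STATUS or (
                status is None and not isinstance(exc, ValueError)
            )
            if not retryable or attempt == attempts - 1:
                raise
        # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter.
        delay = base_delay * 2 ** attempt
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")  # every loop path returns or raises
```

To use it, move the requests.post(...) and response.raise_for_status() calls from the example above into a small function and pass that function to with_retries. The raise_for_status() call matters: it is what turns a 503 response into an exception the wrapper can inspect.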
Conclusion
The Convert Documents to Markdown action from the cuuupid/marker API gives developers a straightforward way to automate document conversion. By combining OCR with support for formats such as PDF, EPUB, and MOBI, it turns documents into Markdown that is easier to search, edit, and publish. Whether you are building a content management system or processing academic papers, integrating this Cognitive Action can remove a manual conversion step from your workflow. Start experimenting with it today to see the benefits firsthand!