Streamline Document Processing with cuuupid/marker Cognitive Actions

Converting documents into machine-readable formats is crucial for accessibility and usability. The cuuupid/marker Cognitive Actions transform scanned or electronic documents into markdown using Optical Character Recognition (OCR). The API supports multiple OCR languages, so developers can add document processing to their applications quickly.
Prerequisites
Before integrating the Cognitive Actions, make sure you have the following in place:
- API Key: You'll need an API key to authenticate your requests.
- Setup: You should be comfortable making HTTP requests in your chosen programming language.
Authentication typically involves passing the API key in the headers of each HTTP request, which lets you access the Cognitive Actions securely.
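One way to keep the key out of source code is to read it from an environment variable and build the headers in one place. The sketch below assumes the Bearer scheme used in the Python example in this article; the environment variable name COGNITIVE_ACTIONS_API_KEY and both helper functions are hypothetical names chosen for illustration.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return HTTP headers that carry the API key as a Bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty; set COGNITIVE_ACTIONS_API_KEY first")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }


def headers_from_env(var_name: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Read the key from an environment variable (the variable name is an assumption)."""
    return build_headers(os.environ.get(var_name, ""))
```

Failing early on an empty key gives a clearer error than an authentication failure returned by the server.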
Cognitive Actions Overview
Convert Documents to Markdown Fast
The Convert Documents to Markdown Fast action transforms scanned or electronic documents into markdown format at high speed. It uses OCR and supports multiple languages, which makes it suitable for a wide range of document processing tasks.
- Category: Document OCR
Input
The action takes a JSON object with the fields shown below (sample values):
{
  "dpi": 400,
  "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
  "language": "English",
  "enableEditor": false,
  "parallelProcessingFactor": 10,
  "maximumPages": 5
}
- dpi (integer, default: 400): The resolution in dots per inch used for OCR.
- document (string): The URL of the input file for OCR. Supported formats include PDF, EPUB, MOBI, XPS, and FB2.
- language (string, default: English): The language for OCR processing; options include English, Spanish, Portuguese, French, German, and Russian.
- enableEditor (boolean, default: false): Whether to enable editor mode.
- maximumPages (integer): The maximum number of pages to parse from the document.
- parallelProcessingFactor (integer, default: 1): The number of threads for parallel OCR processing.
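The field list above can be turned into a small helper that fills in the documented defaults and rejects obviously invalid values before a request is sent. This is a minimal sketch: build_payload is a hypothetical helper, not part of any SDK, and the checks (positive integers, a language from the list above) are assumptions about what the service accepts.

```python
SUPPORTED_LANGUAGES = {"English", "Spanish", "Portuguese", "French", "German", "Russian"}


def _require_positive_int(name, value):
    """Raise ValueError unless value is an int greater than zero (bools excluded)."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")


def build_payload(document, language="English", dpi=400, enable_editor=False,
                  parallel_factor=1, maximum_pages=None):
    """Build the input object for the action using the documented defaults."""
    if not document:
        raise ValueError("document must be a non-empty URL string")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    _require_positive_int("dpi", dpi)
    _require_positive_int("parallelProcessingFactor", parallel_factor)

    payload = {
        "document": document,
        "language": language,
        "dpi": dpi,
        "enableEditor": bool(enable_editor),
        "parallelProcessingFactor": parallel_factor,
    }
    # maximumPages has no documented default, so it is only sent when given.
    if maximum_pages is not None:
        _require_positive_int("maximumPages", maximum_pages)
        payload["maximumPages"] = maximum_pages
    return payload
```

For instance, build_payload("https://example.com/report.pdf", maximum_pages=5) would produce a payload with the default language, dpi, and parallel factor plus a five-page limit.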
Example Input
{
  "dpi": 400,
  "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
  "language": "English",
  "enableEditor": false,
  "parallelProcessingFactor": 10
}
Output
Upon successful execution, the action returns a response structured as follows:
{
  "markdown": "https://assets.cognitiveactions.com/invocations/b3d57835-7ec1-4832-843c-6f7fe6af7f81/f8c878d3-9dce-42aa-98a2-ac1c0336a90f.md",
  "metadata": {
    "toc": [],
    "pages": 4,
    "filetype": "pdf",
    "language": "English",
    "ocr_stats": {
      "ocr_pages": 0,
      "ocr_failed": 0,
      "ocr_success": 0
    },
    "block_stats": {
      "code": 0,
      "table": 0,
      "equations": {
        "equations": 0,
        "successful_ocr": 0,
        "unsuccessful_ocr": 0
      },
      "header_footer": 0
    },
    "postprocess_stats": {
      "edit": {}
    }
  }
}
- markdown (string): A URL pointing to the generated markdown file.
- metadata (object): Additional information, including OCR statistics and block statistics related to the processing.
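Once a response arrives, the fields of interest can be pulled out defensively, since some keys may be absent. The sketch below is illustrative only: summarize_result and download_markdown are hypothetical helpers, the key names are taken from the sample response above, and the download step assumes the markdown file is UTF-8 text reachable by a plain HTTP(S) GET.

```python
from urllib.request import urlopen


def summarize_result(result: dict) -> dict:
    """Collect a few headline values from the action's response."""
    metadata = result.get("metadata") or {}
    ocr_stats = metadata.get("ocr_stats") or {}
    block_stats = metadata.get("block_stats") or {}
    return {
        "markdown_url": result.get("markdown"),
        "pages": metadata.get("pages"),
        "filetype": metadata.get("filetype"),
        "ocr_pages": ocr_stats.get("ocr_pages", 0),
        "ocr_failed": ocr_stats.get("ocr_failed", 0),
        "tables": block_stats.get("table", 0),
    }


def download_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch the generated markdown file; urlopen raises on HTTP error statuses."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")  # UTF-8 is an assumption
```

A caller might first check that the summary's markdown_url is not None and only then pass it to download_markdown.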
Conceptual Usage Example (Python)
Below is a conceptual Python code snippet demonstrating how you might invoke the Convert Documents to Markdown Fast action:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "6c712781-0123-4e4d-8291-c2a520fcd6ce"  # Action ID for Convert Documents to Markdown Fast

# Construct the input payload based on the action's requirements
payload = {
    "dpi": 400,
    "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
    "language": "English",
    "enableEditor": False,
    "parallelProcessingFactor": 10,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid waiting indefinitely if the server does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
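In this example, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The input payload follows the action's input fields, and any request error is caught and reported along with the response status and body when they are available. The endpoint URL and the request body structure are hypothetical.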
Conclusion
The cuuupid/marker Cognitive Actions provide a powerful way to enhance document processing capabilities in your applications. By utilizing the Convert Documents to Markdown Fast action, developers can seamlessly transform documents into a more accessible format while leveraging advanced OCR technology. Consider exploring additional use cases or integrating this action into your application to improve document handling and accessibility. Happy coding!