Enhance Document Images with Cognitive Actions from DocEnTR

The quality of a document image directly affects how readable and usable it is. The DocEnTR API provides Cognitive Actions that enhance and binarize degraded document images so they are clearer and more legible. This post shows how to use the Enhance Document Images action to add image enhancement to your applications.
Prerequisites
Before you get started with the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform to authenticate your requests.
- Basic understanding of making HTTP requests and handling JSON data.
To authenticate your calls, you will typically pass your API key in the headers of your requests.
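As a minimal sketch of that header setup, the helper below reads the key from an environment variable instead of hard-coding it. The function name `build_auth_headers` and the environment variable name are hypothetical, and the Bearer scheme is an assumption carried over from the usage example later in this post; check your platform's documentation for the exact scheme it expects.

```python
import os


def build_auth_headers(api_key=None):
    """Return request headers carrying the API key.

    Assumption: the platform accepts a Bearer token in the Authorization header.
    """
    # Fall back to an environment variable (hypothetical name) when no key is passed.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided; set COGNITIVE_ACTIONS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control, and the early `ValueError` surfaces a missing key before any request is sent.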
Cognitive Actions Overview
Enhance Document Images
The Enhance Document Images action utilizes the DocEnTR model to significantly improve the quality and readability of degraded document images. This action is particularly useful for applications that deal with scanned documents or images where clarity is essential.
Input
The input to this action requires a JSON object with the following schema:
{
  "image": "string",
  "modelSize": "string"
}
- image (required): A valid URI pointing to the input image file. For example:
"https://replicate.delivery/mgxm/7bb1c92e-0b30-4107-9a95-5b8f0040a80e/14.png"
- modelSize (optional): Specifies the model size to use. Options are "base" and "large", with "base" as the default.
Example Input
{
  "image": "https://replicate.delivery/mgxm/7bb1c92e-0b30-4107-9a95-5b8f0040a80e/14.png",
  "modelSize": "base"
}
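The input rules above can be checked on the client before a request is sent. The following is a sketch only: the helper name `build_enhance_input` is hypothetical, and restricting `image` to absolute http(s) URIs is an assumption, since the schema only says "a valid URI".

```python
from urllib.parse import urlparse

# The two model sizes listed in the input schema.
ALLOWED_MODEL_SIZES = {"base", "large"}


def build_enhance_input(image, model_size="base"):
    """Build the input object for the action, checking both fields first."""
    parsed = urlparse(image)
    # Assumption: only absolute http(s) URIs are accepted for `image`.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"image must be an absolute http(s) URI, got {image!r}")
    if model_size not in ALLOWED_MODEL_SIZES:
        raise ValueError(f"modelSize must be one of {sorted(ALLOWED_MODEL_SIZES)}")
    return {"image": image, "modelSize": model_size}
```

Calling `build_enhance_input("https://example.com/scan.png")` yields a dictionary with `modelSize` set to `"base"`, matching the documented default, while an unsupported size such as `"huge"` raises a `ValueError` locally instead of costing a round trip to the API.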
Output
Upon successful execution, this action returns a URL pointing to the enhanced image. For example:
https://assets.cognitiveactions.com/invocations/a64e4df3-6843-4167-9b8f-37cc7ae55fdc/8858d207-5941-45e9-9b42-4eff99aae060.png
This URL will lead you to the processed image, where you can view the improvements made.
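If you want to keep the result rather than just view it, the returned URL can be saved to disk. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the output URL is readable without additional authentication headers, which the source does not confirm.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="enhanced.png"):
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_enhanced_image(url, dest_dir="."):
    """Fetch the enhanced image and write it into dest_dir; return the saved path."""
    path = os.path.join(dest_dir, filename_from_url(url))
    # Assumption: the URL is publicly readable without extra auth headers.
    with urllib.request.urlopen(url, timeout=30) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path
```

For the example URL above, `filename_from_url` returns `8858d207-5941-45e9-9b42-4eff99aae060.png`, so each invocation's output gets a distinct local file name.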
Conceptual Usage Example (Python)
Here’s how you might structure a call to the Enhance Document Images action using Python:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "17fbe452-ee84-4791-bbb3-f552e60c1f8f"  # Action ID for Enhance Document Images

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/mgxm/7bb1c92e-0b30-4107-9a95-5b8f0040a80e/14.png",
    "modelSize": "base",
}
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON, so fall back to the raw text
            print(f"Response body: {e.response.text}")
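In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id identifies the Enhance Document Images action, and the payload carries the input fields described above. The endpoint URL and the request body structure shown here are hypothetical, so adjust them to match your platform's documentation. On success, the script prints the JSON response; on failure, it prints the error along with the response status and body when they are available.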
Conclusion
The Enhance Document Images action from the DocEnTR API improves the quality of degraded document images in your applications. Integrating it gives your users clearer, more readable documents.
Consider exploring additional use cases, such as automating document processing workflows or incorporating image enhancement into data entry systems.