Unlocking Document OCR Capabilities with CudaNexus Surya Actions

Extracting text from documents is a core task in many applications. The CudaNexus OCR Surya API provides Cognitive Actions that perform Optical Character Recognition (OCR) on documents. It supports over 90 languages and is built for accuracy and speed. Because the actions are pre-built, you can add text extraction from printed documents to an application without building an OCR pipeline yourself.
Prerequisites
Before diving into the integration of the CudaNexus OCR Surya actions, ensure you have the following:
- An API key for accessing the CudaNexus Cognitive Actions platform.
- Basic knowledge of JSON structure and Python programming.
- Familiarity with making HTTP requests.
Authentication typically works by passing your API key in the request headers. The Python example in this article sends it as a Bearer token in the Authorization header.
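To avoid repeating the headers on every call, you can attach them once to a requests.Session. The sketch below assumes two things. First, the platform accepts a Bearer token, as in the conceptual example in this article. Second, the key lives in an environment variable named COGNITIVE_ACTIONS_API_KEY; that variable name is hypothetical and chosen only for this sketch.

```python
import os
from typing import Optional

import requests


def make_session(api_key: Optional[str] = None) -> requests.Session:
    """Return a requests.Session that attaches the API key to every request.

    Assumption: the platform expects a Bearer token, as in the conceptual
    Python example in this article.
    """
    # COGNITIVE_ACTIONS_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError("Set COGNITIVE_ACTIONS_API_KEY or pass api_key explicitly.")
    session = requests.Session()
    # Headers set here are sent with every request made through this session.
    session.headers.update({
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    })
    return session
```

Reading the key from the environment keeps it out of source control. Once the session exists, calls such as session.post(url, json=...) carry the headers automatically.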
Cognitive Actions Overview
Perform Document OCR with Surya
Description:
The Perform Document OCR with Surya action uses the Surya engine to run OCR on images and PDF documents. It is designed for fast, accurate text extraction from printed documents in the languages you specify.
Category: Document OCR
Input
The input for this action requires the following fields:
- image (required): URI of the image or PDF to process. The file must be supplied as a URI; direct file uploads are not supported.
- action (optional): Specifies the processing action. Options are "Run Text Detection" or "Run OCR". Defaults to "Run Text Detection".
- pageNumber (optional): The page number to process in the document, with a default of 1.
- languagesInput (optional): Input languages for processing, specified as a comma-separated list. Default is "English".
- languagesChoices (optional): List of available languages for processing, with "English" as the default.
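The field names and defaults above can be collected into a small payload builder that catches obvious mistakes before a request is sent. This is a sketch rather than part of the platform's SDK, and the function name build_ocr_payload is hypothetical. It assumes the image is a publicly reachable http(s) URL. It leaves out the optional languagesChoices field because that field's exact format is not documented beyond its "English" default.

```python
from typing import List, Optional

# The two processing modes listed for the "action" field.
VALID_ACTIONS = {"Run Text Detection", "Run OCR"}


def build_ocr_payload(
    image_uri: str,
    action: str = "Run Text Detection",
    page_number: int = 1,
    languages: Optional[List[str]] = None,
) -> dict:
    """Assemble an input payload using the documented field names and defaults."""
    # Assumption: the URI is an http(s) link; local paths are rejected early.
    if not image_uri.startswith(("http://", "https://")):
        raise ValueError("image must be an http(s) URI")
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {sorted(VALID_ACTIONS)}")
    if page_number < 1:
        raise ValueError("pageNumber starts at 1")
    return {
        "image": image_uri,
        "action": action,
        "pageNumber": page_number,
        # languagesInput is a comma-separated list; English is the documented default.
        "languagesInput": ",".join(languages or ["English"]),
    }
```

For example, build_ocr_payload(url, action="Run OCR") yields a payload like the example below, minus languagesChoices.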
Example Input:
{
  "image": "https://replicate.delivery/pbxt/KU3ZDwmFqwo7tsfY5m8OsN0XDJLqk2lvgSKOT5s7HFZOqkNq/D5300-1.jpg",
  "action": "Run OCR",
  "pageNumber": 1,
  "languagesInput": "English",
  "languagesChoices": "English"
}
Output
Upon successful execution, this action returns the following fields:
- image: URI of the processed image.
- Status: A message indicating the result of the OCR process.
- text_file: URI of the text file generated from the OCR process.
Example Output:
{
  "image": "https://assets.cognitiveactions.com/invocations/ee724120-7068-496f-b4b6-6ab081c4d268/7b913509-216f-4727-8d08-1709f56a34b3.jpg",
  "Status": "OCR completed.",
  "text_file": "https://assets.cognitiveactions.com/invocations/ee724120-7068-496f-b4b6-6ab081c4d268/163637a4-59b6-48f0-8414-16ffb843f6e1.txt"
}
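The three output fields can be mapped onto a typed object, and the generated text file can then be downloaded. This sketch assumes you have already extracted the output object shown above from the platform's response. How that object is wrapped in the response envelope is not documented here. The names OcrResult, parse_ocr_output, and fetch_text are hypothetical. Treating text_file as possibly absent (for example, after "Run Text Detection") is an unverified precaution.

```python
from dataclasses import dataclass
from typing import Optional

import requests


@dataclass
class OcrResult:
    image_uri: str
    status: str
    text_file_uri: Optional[str]  # may be absent for some modes (unverified assumption)


def parse_ocr_output(output: dict) -> OcrResult:
    """Map the documented output fields (image, Status, text_file) onto a typed object."""
    # "Status" is capitalised in the example output, unlike the other keys.
    return OcrResult(
        image_uri=output["image"],
        status=output["Status"],
        text_file_uri=output.get("text_file"),
    )


def fetch_text(result: OcrResult, timeout: float = 30.0) -> str:
    """Download the plain-text file generated by the OCR run."""
    if result.text_file_uri is None:
        raise ValueError("no text_file in this result")
    resp = requests.get(result.text_file_uri, timeout=timeout)
    resp.raise_for_status()
    return resp.text
```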
Conceptual Usage Example (Python)
Here’s how you might call the Perform Document OCR with Surya action using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "f6c2b45f-3b5e-4c99-ad2a-5eb78db1d9c9"  # Action ID for Perform Document OCR with Surya

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/KU3ZDwmFqwo7tsfY5m8OsN0XDJLqk2lvgSKOT5s7HFZOqkNq/D5300-1.jpg",
    "action": "Run OCR",
    "pageNumber": 1,
    "languagesInput": "English",
    "languagesChoices": "English"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Seconds; OCR can be slow, but never wait indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    # e.response is None for connection errors and timeouts
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # JSON decode errors from requests subclass ValueError
            print(f"Response body: {e.response.text}")
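In this snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The endpoint URL and the request envelope ({"action_id": ..., "inputs": ...}) are marked as hypothetical in the comments, so check them against the platform's documentation. The action ID and input payload follow the requirements of the Perform Document OCR with Surya action.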
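Because pageNumber selects a single page, a multi-page PDF presumably needs one call per page. The sketch below loops over the requested pages under that assumption. The helper name ocr_pages is hypothetical. It takes the actual HTTP call as a parameter, so the loop stays independent of the hypothetical endpoint and envelope.

```python
from typing import Callable, Dict, Iterable


def ocr_pages(
    image_uri: str,
    pages: Iterable[int],
    execute: Callable[[dict], dict],
    languages: str = "English",
) -> Dict[int, dict]:
    """Run the OCR action once per page and collect the outputs by page number.

    `execute` sends one input payload and returns the parsed result.
    """
    results: Dict[int, dict] = {}
    for page in pages:
        # Same fields as the single-page example; only pageNumber changes.
        payload = {
            "image": image_uri,
            "action": "Run OCR",
            "pageNumber": page,
            "languagesInput": languages,
        }
        results[page] = execute(payload)
    return results
```

To call the real service, pass a function that wraps the requests.post call from the conceptual example and returns response.json(). Where the OCR fields sit inside that response depends on the platform's response envelope.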
Conclusion
Integrating the CudaNexus OCR Surya actions into your applications gives you a ready-made way to extract text from documents. Support for many languages and a choice between text detection and full OCR let you match the processing to each document. Explore use cases such as document management systems, automated data entry, or translation services to get the most out of OCR in your projects. Happy coding!