Effortlessly Extract Text from Images with qr2ai/img2txt Cognitive Actions

Extracting text from images and PDFs is a common requirement for developers who want to add Optical Character Recognition (OCR) to their applications. The qr2ai/img2txt Cognitive Actions provide an easy-to-use way to do this, with support for English and Arabic. This article walks through the available action and shows how to call it from your own code.
Prerequisites
To get started with the Cognitive Actions from qr2ai/img2txt, you'll need:
- An API key for the Cognitive Actions platform.
- A basic understanding of making HTTP requests.
- Access to an image or PDF file from which you want to extract text.
Authentication typically involves passing your API key in the request headers to authorize your actions.
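As a minimal sketch of that header-based authentication, the helper below builds the request headers from an API key. The function name build_auth_headers, the environment variable name, and the Bearer scheme are assumptions for illustration; check your platform's documentation for the scheme it actually expects.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers carrying the API key (hypothetical Bearer scheme)."""
    # Assumption: when no key is passed in, it is read from this environment variable.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided or found in the environment")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment rather than hard-coding it keeps the secret out of source control.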
Cognitive Actions Overview
Extract Text from Image or PDF
The Extract Text from Image or PDF action leverages OCR technology to extract text from image or PDF files. This action is particularly useful for applications that require text analysis, data extraction, or document processing.
- Category: document-ocr
- Purpose: Extract text from an image or PDF file using Optical Character Recognition (OCR) in either English or Arabic.
Input
The input schema for this action requires a JSON object with the following fields:
- file (required): The URI of an image or PDF file to process.
- ocrLanguage (optional): The language to use for OCR. Supported values are "eng" (English) and "ara" (Arabic). Defaults to "eng" if not specified.
Example Input:
{
  "file": "https://replicate.delivery/pbxt/LKt8pqaiZ0Lqi0FEhrekr0Y4vFGpdGcjy8EXHvheUBm9lFCq/Screenshot%202024-07-26%20at%207.39.43%E2%80%AFAM.jpg",
  "ocrLanguage": "ara"
}
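The input rules above can be checked on the client before any request is sent. The following is a sketch only: build_ocr_input and SUPPORTED_OCR_LANGUAGES are hypothetical names, and the URI check is deliberately loose (it only requires a scheme such as https).

```python
from urllib.parse import urlparse

# Values listed in the input schema described above.
SUPPORTED_OCR_LANGUAGES = {"eng", "ara"}


def build_ocr_input(file_uri, ocr_language="eng"):
    """Build the action's input object, rejecting values the schema does not list."""
    # A URI needs at least a scheme (for example "https").
    if not urlparse(file_uri).scheme:
        raise ValueError(f"'file' must be a URI, got: {file_uri!r}")
    if ocr_language not in SUPPORTED_OCR_LANGUAGES:
        raise ValueError(f"Unsupported ocrLanguage: {ocr_language!r}")
    return {"file": file_uri, "ocrLanguage": ocr_language}
```

Calling build_ocr_input with only a file URI falls back to "eng", mirroring the documented default.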
Output
The action returns a text string that contains the extracted content from the image or PDF. Here’s an example of what the output might look like:
Example Output:
(اللهم اغفرلي.وارحمني.وعافني. وارزقني)
من كثرت عليه الحاجات والدعوات فعليه بهذا الدعاء الجامع
سأل رجلٌ النبي#كيف أقول حين أسأل ربي. قال قل:
"اللهم اغفرلي وارحمني. وعافني, وارزقني
فإنها تجمع لك دنياك وآخرتك"
#صحيح_مسلم
أكثر منه خاصة في أوقات الإجابة.يُجمع لك سعادة الدنيا والآخرة
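Since the result is a single string, downstream code often wants it as a list of clean lines. A small sketch, assuming the extracted text arrives as a plain Python str with newline-separated lines (split_ocr_lines is a hypothetical helper, not part of the action):

```python
def split_ocr_lines(text):
    """Split OCR output into non-empty lines with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

This works the same way for English and Arabic output, because str.splitlines and str.strip operate on Unicode text.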
Conceptual Usage Example (Python)
Below is a conceptual Python code snippet demonstrating how to use the Extract Text from Image or PDF action. Replace the placeholder API key and endpoint with your own values.
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "527c9283-a323-494b-b563-1c484d31a080"  # Action ID for Extract Text from Image or PDF

# Construct the input payload based on the action's requirements
payload = {
    "file": "https://replicate.delivery/pbxt/LKt8pqaiZ0Lqi0FEhrekr0Y4vFGpdGcjy8EXHvheUBm9lFCq/Screenshot%202024-07-26%20at%207.39.43%E2%80%AFAM.jpg",
    "ocrLanguage": "ara",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid waiting indefinitely on a stalled connection
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    # ensure_ascii=False keeps Arabic text readable instead of \uXXXX escapes
    print(json.dumps(result, indent=2, ensure_ascii=False))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this snippet, the action ID and the input payload follow the specification described above. The endpoint URL and the shape of the request body are illustrative, so adjust them to match your actual setup.
Conclusion
The qr2ai/img2txt Cognitive Actions give developers a simple way to extract text from images and PDFs and to integrate OCR into their applications. With a single action, your app can pull English or Arabic text out of documents for analysis, data extraction, or further processing.
Explore what OCR can do in your projects, and consider where these Cognitive Actions fit into your next development challenge.