Effortlessly Caption Images in Bulk with the Batch Image Captioning Cognitive Actions

Meaningful image captions improve the user experience across applications, but writing them by hand does not scale. The Batch Image Captioning API, part of the fofr/batch-image-captioning spec, lets developers automate captioning with AI models from OpenAI, Anthropic, and Google. It processes images in bulk from a ZIP archive and offers options for customizing prompts, adding caption prefixes and suffixes, and resizing images before captioning.
Prerequisites
Before you can start using the Batch Image Captioning Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform.
- A ZIP archive containing images in supported formats (png, jpg, jpeg, webp).
- Basic knowledge of how to make API calls with JSON payloads.
For authentication, you typically pass your API key in the request headers. This ensures secure access to the services provided by the API.
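If your images are still loose files in a folder, the ZIP archive from the prerequisites can be prepared with Python's standard library. The following is a minimal sketch: the helper name `build_image_archive` is hypothetical, and the flat archive layout and case-insensitive extension matching are assumptions rather than documented requirements of the action.

```python
import zipfile
from pathlib import Path

# Formats listed in the prerequisites above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def build_image_archive(image_dir: str, archive_path: str) -> list[str]:
    """Zip every supported image in image_dir (non-recursive); return the names added."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(image_dir).iterdir()):
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                # Store only the file name so the archive has a flat layout (assumption).
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

The resulting archive still has to be uploaded somewhere the action can reach, because the imageZipArchive input expects a URI rather than a local path.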
Cognitive Actions Overview
Perform Batch Image Captioning
The Perform Batch Image Captioning action generates captions for every image in a ZIP archive in a single call. You choose the AI model that writes the captions, and the action provides options for resizing, caption customization, and error handling. The output is returned as a ZIP file containing the generated captions, along with a CSV summary.
Input
The action requires a structured JSON payload to function correctly. Below are the details of the input schema:
- imageZipArchive (string, required): A URI pointing to the ZIP archive with images to be processed.
- model (string, optional): Specifies the AI model for captioning (default: gpt-4o-2024-08-06).
- maxDimension (integer, optional): The maximum dimension for resizing images (default: 1024).
- openaiApiKey (string, optional): API key for OpenAI. Keep confidential.
- systemPrompt (string, optional): Defines the prompt for analyzing and captioning images.
- captionPrefix (string, optional): An optional prefix to add to captions.
- captionSuffix (string, optional): An optional suffix to add to captions.
- messagePrompt (string, optional): Message to initiate captioning (default: Caption this image please).
- anthropicApiKey (string, optional): API key for Anthropic. Keep confidential.
- googleGenerativeAiApiKey (string, optional): API key for Google Generative AI. Keep confidential.
- resizeImagesForCaptioning (boolean, optional): Determines if images should be resized before captioning (default: true).
Example Input:
{
"model": "gpt-4o-2024-08-06",
"maxDimension": 1024,
"openaiApiKey": "[REDACTED]",
"systemPrompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.",
"captionPrefix": "",
"captionSuffix": "",
"messagePrompt": "Caption this image please",
"imageZipArchive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
"resizeImagesForCaptioning": true
}
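Because most fields are optional and have documented defaults, it can help to assemble the payload in one place and catch typos before sending a request. The sketch below is illustrative only: `build_payload`, `DEFAULTS`, and `ALLOWED_FIELDS` are hypothetical names, and the validation rules (rejecting unknown fields, requiring a positive integer for maxDimension) are assumptions, not behavior confirmed for the action itself.

```python
# Defaults as documented in the input schema above.
DEFAULTS = {
    "model": "gpt-4o-2024-08-06",
    "maxDimension": 1024,
    "messagePrompt": "Caption this image please",
    "resizeImagesForCaptioning": True,
}

# Every field named in the input schema.
ALLOWED_FIELDS = {
    "imageZipArchive", "model", "maxDimension", "openaiApiKey", "systemPrompt",
    "captionPrefix", "captionSuffix", "messagePrompt", "anthropicApiKey",
    "googleGenerativeAiApiKey", "resizeImagesForCaptioning",
}


def build_payload(image_zip_archive: str, **overrides) -> dict:
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    if not isinstance(image_zip_archive, str) or not image_zip_archive.strip():
        raise ValueError("imageZipArchive is required")
    unknown = set(overrides) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "imageZipArchive": image_zip_archive}
    max_dim = payload["maxDimension"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_dim, bool) or not isinstance(max_dim, int) or max_dim <= 0:
        raise ValueError("maxDimension must be a positive integer")
    return payload
```

For example, `build_payload("https://example.com/images.zip", captionPrefix="photo of ")` returns the defaults plus the archive URI and the prefix, while a misspelled field such as `captionPrefx` raises a `ValueError` instead of being silently sent.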
Output
The action returns a URL to a ZIP file. The archive contains the generated captions, paired with the corresponding image filenames, and a CSV summary of the results.
Example Output:
https://assets.cognitiveactions.com/invocations/9ffd0f34-5837-4cfe-9299-84f8f23217cd/e531edc9-dd26-4b88-8bf8-186812d21340.zip
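Once the result archive has been downloaded, its contents can be read without unpacking to disk. This is a rough sketch under stated assumptions: the helper `read_caption_archive` is hypothetical, and it assumes captions are stored as UTF-8 .txt files and the summary as a .csv file with a header row. The actual file names and CSV columns are not documented here and may differ.

```python
import csv
import io
import zipfile


def read_caption_archive(zip_bytes: bytes) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Return ({caption_file_name: caption_text}, csv_rows) from a results ZIP.

    Assumes UTF-8 .txt caption files and a .csv summary with a header row.
    """
    captions = {}
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith(".txt"):
                captions[name] = archive.read(name).decode("utf-8").strip()
            elif lower.endswith(".csv"):
                text = archive.read(name).decode("utf-8")
                rows.extend(csv.DictReader(io.StringIO(text)))
    return captions, rows
```

The bytes can come from any HTTP client, for instance `requests.get(output_url).content`, where `output_url` is the URL returned by the action.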
Conceptual Usage Example (Python)
Here's a conceptual Python code snippet demonstrating how to call the Batch Image Captioning action:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "7f5fe50d-80f2-4f17-bc28-f44cc1527254"  # Action ID for Perform Batch Image Captioning

# Construct the input payload based on the action's requirements
payload = {
    "model": "gpt-4o-2024-08-06",
    "maxDimension": 1024,
    "openaiApiKey": "[REDACTED]",
    "systemPrompt": "Write a four sentence caption for this image...",
    "captionPrefix": "",
    "captionSuffix": "",
    "messagePrompt": "Caption this image please",
    "imageZipArchive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
    "resizeImagesForCaptioning": True,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:
            # Every JSON decode error that requests can raise subclasses ValueError
            print(f"Response body: {e.response.text}")
Before running this snippet, replace the placeholder API key and endpoint with your actual values. The action ID and input payload follow the input schema described above; the endpoint URL and the request body structure are hypothetical, as the comments note.
Conclusion
The Batch Image Captioning Cognitive Action offers a robust solution for developers looking to automate the generation of image captions at scale. With customizable options and the power of advanced AI models, you can enhance your applications' interactivity and accessibility. Consider exploring additional use cases or integrating this action into your existing projects to fully leverage its capabilities. Happy coding!