Enhance Your App with Image Analysis using BLIP3 Cognitive Actions

Integrating advanced image-analysis capabilities into your applications has never been easier than with the lucataco/blip3-phi3-mini-instruct-r-v1 API. This API exposes a set of powerful Cognitive Actions built on the BLIP3 series of Large Multimodal Models developed by Salesforce AI Research. Among these actions, the ability to analyze images and answer questions about them lets developers create applications that understand and interact with visual content intelligently.
Prerequisites
Before you dive into using the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic knowledge of sending HTTP requests and handling JSON data.
- Familiarity with Python (or your preferred programming language) for implementation.
Authentication typically involves sending your API key in the request headers, allowing you to access the actions securely.
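As a minimal sketch of what that looks like in practice (assuming the platform uses Bearer-token authentication, which is an assumption to verify against your platform's documentation), the request headers can be built like this:

```python
# Hypothetical example: sending an API key in the request headers.
# The Bearer scheme is an assumption; check your platform's auth docs.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

print(headers["Authorization"])
```

These same headers are reused in the full request example later in this article.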
Cognitive Actions Overview
Analyze Image with BLIP3
The Analyze Image with BLIP3 action allows you to analyze images and respond to questions using high-quality image captioning and in-context learning capabilities. This action falls under the category of image-analysis and is ideal for applications that require intelligent interaction with images.
Input
The input schema for this action requires the following fields:
- image (required): A URI pointing to the image that you want to analyze.
- question (optional): A question regarding the image. If not provided, it defaults to "how many dogs are in the picture?".
- maxNewTokens (optional): Specifies the maximum number of tokens to generate in the response, with a default value of 768 (must be between 512 and 2047).
- systemPrompt (optional): Sets the context for the interaction, guiding the AI on the desired style and tone. The default prompt is: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions".
Here's an example input payload:
```json
{
  "image": "https://replicate.delivery/pbxt/KtacIzXNav6KQBhoK4XorduzIZpxPWLjnMgayp07TPS0oS6T/blip-demo.jpg",
  "question": "how many dogs are in the picture?",
  "maxNewTokens": 768,
  "systemPrompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions"
}
```
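Since the schema above constrains maxNewTokens to the range 512-2047 and requires image, it can be worth validating a payload client-side before sending it. Here is an illustrative sketch; the helper function is not part of the API, and the field names and bounds are taken from the schema above:

```python
# Client-side validation sketch for the Analyze Image with BLIP3 input schema.
# The helper is illustrative only; the API performs its own validation.
def validate_blip3_input(payload: dict) -> list:
    """Return a list of validation errors (empty if the payload looks valid)."""
    errors = []
    if not payload.get("image"):
        errors.append("'image' is required and must be a URI string")
    tokens = payload.get("maxNewTokens", 768)  # default per the schema
    if not isinstance(tokens, int) or not 512 <= tokens <= 2047:
        errors.append("'maxNewTokens' must be an integer between 512 and 2047")
    return errors

print(validate_blip3_input({"image": "https://example.com/blip-demo.jpg"}))
```

An empty list means the payload satisfies the documented constraints.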
Output
The action will typically return a text response based on the analysis of the image. For example:

```
There is one dog in the picture.
```
This output provides a direct answer to the question posed, demonstrating the capability of the model to understand and interpret visual content.
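The exact JSON envelope wrapping that text depends on the platform; as a hedged sketch, assuming the response carries the generated text in an "output" field (a hypothetical key to adjust for your platform's real schema), extracting the answer might look like this:

```python
# Hypothetical response envelope: the "output" key is an assumption,
# not a documented field of the Cognitive Actions platform.
sample_response = {"output": "There is one dog in the picture."}

def extract_answer(result: dict) -> str:
    # Adjust the key to match the real response schema of your platform.
    return result.get("output", "").strip()

print(extract_answer(sample_response))
```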
Conceptual Usage Example (Python)
Here's how you might call the Analyze Image with BLIP3 action using Python:
```python
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "efb7885e-7ada-4fc9-b6d7-412068899457"  # Action ID for Analyze Image with BLIP3

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/KtacIzXNav6KQBhoK4XorduzIZpxPWLjnMgayp07TPS0oS6T/blip-demo.jpg",
    "question": "how many dogs are in the picture?",
    "maxNewTokens": 768,
    "systemPrompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
```
In this code snippet:
- Replace `YOUR_COGNITIVE_ACTIONS_API_KEY` with your actual API key.
- The `action_id` corresponds to the specific action you want to execute.
- The `payload` is structured based on the input schema, ensuring all required fields are included.
Conclusion
The Analyze Image with BLIP3 action provides a powerful way to integrate image analysis capabilities into your applications. By leveraging this action, developers can build intelligent systems that understand and respond to visual content, enhancing user interaction significantly. Consider exploring additional use cases or combining various Cognitive Actions to create even more robust applications. Happy coding!