Enhance Your App with Image Analysis using BLIP3 Cognitive Actions

21 Apr 2025

Integrating advanced image-understanding capabilities into your applications is straightforward with the lucataco/blip3-phi3-mini-instruct-r-v1 API. It exposes a set of Cognitive Actions built on the BLIP3 series of large multimodal models from Salesforce AI Research. Among these actions, the ability to analyze images and answer questions about them lets developers build applications that understand and interact with visual content intelligently.

Prerequisites

Before you dive into using the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic knowledge of sending HTTP requests and handling JSON data.
  • Familiarity with Python (or your preferred programming language) for implementation.

Authentication typically involves sending your API key in the request headers, allowing you to access the actions securely.
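As a minimal sketch, assuming a standard Bearer-token scheme (the exact header name and format may differ for your deployment), the headers might look like this:

```python
# Hypothetical auth setup: many HTTP APIs accept a Bearer token
# in the Authorization header alongside a JSON content type.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"  # replace with your real key

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```

These same headers are reused in the full Python example later in this post.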

Cognitive Actions Overview

Analyze Image with BLIP3

The Analyze Image with BLIP3 action allows you to analyze images and respond to questions using high-quality image captioning and in-context learning capabilities. This action falls under the category of image-analysis and is ideal for applications that require intelligent interaction with images.

Input

The input schema for this action requires the following fields:

  • image (required): A URI pointing to the image that you want to analyze.
  • question (optional): A question regarding the image. If not provided, it defaults to "how many dogs are in the picture?".
  • maxNewTokens (optional): Specifies the maximum number of tokens to generate in the response, with a default value of 768 (must be between 512 and 2047).
  • systemPrompt (optional): Sets the context for the interaction, guiding the AI on the desired style and tone. The default prompt is: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions".

Here's an example input payload:

{
  "image": "https://replicate.delivery/pbxt/KtacIzXNav6KQBhoK4XorduzIZpxPWLjnMgayp07TPS0oS6T/blip-demo.jpg",
  "question": "how many dogs are in the picture?",
  "maxNewTokens": 768,
  "systemPrompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions"
}

Output

The action will typically return a text response based on the analysis of the image. For example:

There is one dog in the picture.

This output provides a direct answer to the question posed, demonstrating the capability of the model to understand and interpret visual content.
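If you are handling the response in code, a small helper can pull the text answer out of the returned JSON. The envelope shape below (a top-level "output" key) is an assumption for illustration; adjust the key to match your platform's actual response schema.

```python
def extract_answer(result: dict) -> str:
    """Extract the text answer from a hypothetical response envelope.

    Assumes the action returns JSON like {"output": "There is one dog..."}.
    Raises ValueError if the expected key is missing or not a string.
    """
    output = result.get("output")
    if not isinstance(output, str):
        raise ValueError(f"Unexpected response shape: {result!r}")
    return output.strip()
```

For example, `extract_answer({"output": "There is one dog in the picture."})` returns the answer string, while a malformed response raises a `ValueError` you can catch and log.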

Conceptual Usage Example (Python)

Here's how you might call the Analyze Image with BLIP3 action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "efb7885e-7ada-4fc9-b6d7-412068899457" # Action ID for Analyze Image with BLIP3

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/KtacIzXNav6KQBhoK4XorduzIZpxPWLjnMgayp07TPS0oS6T/blip-demo.jpg",
    "question": "how many dogs are in the picture?",
    "maxNewTokens": 768,
    "systemPrompt": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # response body was not valid JSON
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action_id corresponds to the specific action you want to execute.
  • The payload is structured based on the input schema, ensuring all required fields are included.

Conclusion

The Analyze Image with BLIP3 action provides a powerful way to integrate image analysis capabilities into your applications. By leveraging this action, developers can build intelligent systems that understand and respond to visual content, enhancing user interaction significantly. Consider exploring additional use cases or combining various Cognitive Actions to create even more robust applications. Happy coding!