Unlocking Image Insights with Yi-VL-34B Cognitive Actions

In today's fast-paced digital landscape, understanding image content is more important than ever. The Yi-VL-34B spec offers a powerful Cognitive Action that leverages the Yi-VL-34B model, the first open-source 34B vision-language (VL) model. The action describes and analyzes images with strong accuracy, as demonstrated on benchmarks such as MMMU and CMMMU. Integrating these capabilities into your application can enhance user experiences and surface valuable insights from visual content.
Prerequisites
Before diving into the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic knowledge of handling HTTP requests.
- Familiarity with JSON for structuring input and output data.
For authentication, you will typically pass your API key in the request headers to access the Cognitive Actions.
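The header construction might look like the following sketch. The exact scheme (Bearer token vs. a custom API-key header) depends on the platform's documentation, so treat this as an assumption rather than the definitive format:

```python
import os

# Read the key from the environment rather than hard-coding it.
# The "Bearer" scheme here is an assumption about the platform's auth format.
api_key = os.environ.get("COGNITIVE_ACTIONS_API_KEY", "YOUR_COGNITIVE_ACTIONS_API_KEY")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```

Keeping the key in an environment variable avoids accidentally committing it to version control.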
Cognitive Actions Overview
Describe Image Using Yi-VL-34B
The Describe Image Using Yi-VL-34B action enables developers to analyze images and generate detailed descriptions, providing users with insights into the visual content. This action falls under the category of image-analysis.
Input
The action requires the following input fields based on its schema:
- imageUri (required): A string that should be a valid URL pointing to the image resource.
- queryDescription (optional): A string describing what you want to know about the image. Defaults to "Describe this image."
Example Input:
{
  "imageUri": "https://replicate.delivery/pbxt/KKavLA9ZqUtUBRxlS8FpprmzydSzYnfTfYGbAkcTCCwMxy14/extreme_ironing.jpg",
  "queryDescription": "Describe this image."
}
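Since imageUri must be a valid URL, you may want to validate it client-side before making a request. The helper below is an illustrative sketch (not part of any official SDK) that builds the payload and applies the documented default for queryDescription:

```python
from urllib.parse import urlparse

def build_describe_image_payload(image_uri, query_description="Describe this image."):
    """Build the action's input payload, checking that imageUri is an
    http(s) URL. Illustrative helper, not an official SDK function."""
    parsed = urlparse(image_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"imageUri must be an http(s) URL, got: {image_uri!r}")
    return {"imageUri": image_uri, "queryDescription": query_description}

payload = build_describe_image_payload(
    "https://replicate.delivery/pbxt/KKavLA9ZqUtUBRxlS8FpprmzydSzYnfTfYGbAkcTCCwMxy14/extreme_ironing.jpg"
)
```

Failing fast on a malformed URL gives a clearer error than waiting for the API to reject the request.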
Output
The action returns a descriptive text of the image, effectively summarizing its content. A typical output might look like this:
Example Output:
In the heart of a bustling city, a man in a vibrant yellow shirt and a blue baseball cap is engrossed in the act of folding a shirt on the back of a yellow taxi. The taxi, adorned with a red and white sign that reads "NYC Taxi", is parked on the side of a busy street. The man, standing on the back of the taxi, is positioned next to a black SUV. The scene is set against a backdrop of towering buildings, their windows reflecting the hustle and bustle of city life. The man's focus on his task contrasts with the dynamic energy of the cityscape around him.
Conceptual Usage Example (Python)
Here’s how you might call the Describe Image action using a hypothetical Cognitive Actions execution endpoint:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "96822da7-0683-404f-b784-0dc166719780"  # Action ID for Describe Image Using Yi-VL-34B

# Construct the input payload based on the action's schema
payload = {
    "imageUri": "https://replicate.delivery/pbxt/KKavLA9ZqUtUBRxlS8FpprmzydSzYnfTfYGbAkcTCCwMxy14/extreme_ironing.jpg",
    "queryDescription": "Describe this image.",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical request structure
        timeout=60,
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # body was not valid JSON
            print(f"Response body: {e.response.text}")
In this snippet, replace the API key and endpoint with your actual values. The action ID corresponds to the Describe Image action, and the input payload, containing the image URI and query description, is sent to the hypothetical execution endpoint.
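Network calls like this can fail transiently (timeouts, rate limits, brief outages), so in practice you might wrap the request in a small retry helper. The sketch below assumes the same hypothetical endpoint and request structure as the snippet above; the injectable `post` argument exists purely to make the helper easy to exercise without a live API:

```python
import time
import requests

def execute_action_with_retry(url, headers, body, max_retries=3,
                              timeout=30, post=requests.post):
    """POST to the (hypothetical) execution endpoint, retrying failures
    with exponential backoff. A production version would retry only
    transient errors (e.g. 5xx, timeouts), not 4xx client errors."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            response = post(url, headers=headers, json=body, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as exc:
            last_exc = exc
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
    raise last_exc
```

Exponential backoff keeps retries from hammering an already-struggling service while still recovering quickly from one-off failures.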
Conclusion
Integrating the Describe Image Using Yi-VL-34B Cognitive Action into your application can significantly enhance its functionality by providing detailed insights into images. This capability not only enriches the user experience but also opens up a multitude of use cases in various domains such as social media, e-commerce, and accessibility. Consider exploring further applications of image analysis to leverage the power of AI in your next project!