Unleashing Image and Text Analysis with Hayooucom Vision LLaMA3 Cognitive Actions

In the world of artificial intelligence, the ability to analyze images and generate descriptive text can greatly enhance user experience and application functionality. The Hayooucom Vision LLaMA3 spec provides powerful Cognitive Actions that leverage advanced models to perform image and text analysis seamlessly. By integrating these pre-built actions into your applications, you can automate content generation, improve accessibility, and enrich user interactions with contextual insights.
Prerequisites
Before diving into the integration of Cognitive Actions, ensure you have the following:
- An API key for the Hayooucom Cognitive Actions platform.
- Basic setup of your development environment capable of making HTTP requests.
- Familiarity with JSON format for structuring your input and interpreting outputs.
Authentication generally involves passing your API key in the headers of your requests, allowing secure access to the Cognitive Actions.
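As a minimal sketch, assuming a standard Bearer-token scheme (the exact header name and endpoint may differ for your account), the authentication headers can be built like this in Python:

```python
# Hypothetical auth setup: a Bearer token in the Authorization header.
# Replace the placeholder with your actual API key.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```

These headers are then passed on every request to the Cognitive Actions platform, as shown in the full usage example later in this article.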
Cognitive Actions Overview
Perform Image and Text Analysis
The Perform Image and Text Analysis action utilizes the Vision LLaMA3 model to analyze images and generate descriptive text based on input prompts. This action is categorized under image analysis and supports enhanced context and text decoding options.
Input
The input schema for this action requires several fields that can be customized based on your needs. Here’s a breakdown:
- seed (integer): The seed for the random number generator.
- topK (integer, default: 1): Samples from the top K most likely tokens during text decoding.
- topP (number, default: 1): Samples from the top P percentage of likely tokens during decoding.
- prompt (string, default: "hello, who are you?"): The text prompt sent to the model.
- imageUrl (array of strings, default: empty array): An array of public image URLs for analysis.
- maxTokens (integer, default: 45000): Maximum tokens allowed in text generation.
- imageBase64 (array of strings, default: empty array): Array of image data encoded in Base64 format.
- temperature (number, default: 0.1): Controls randomness in outputs.
- maxNewTokens (integer, default: 200): Maximum new tokens to generate.
- systemPrompt (string, default: "You are a helpful AI assistant."): Initial instructions for the AI model.
- repetitionPenalty (number, default: 1.1): Penalty for repeated words in generated text.
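If you want to send a local image via the imageBase64 field rather than a public URL, you first need to Base64-encode the file. A small helper sketch (the function name and file path are illustrative, not part of the action's API):

```python
import base64

def encode_image(path: str) -> str:
    """Read a local image file and return its Base64-encoded contents,
    suitable for one entry of the imageBase64 array."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```

The resulting string would then be placed in the imageBase64 array of the input payload, with imageUrl left empty.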
Example Input:
{
  "topK": 1,
  "topP": 1,
  "prompt": "describe this image",
  "imageUrl": [
    "https://support.content.office.net/en-us/media/3dd2b79b-9160-403d-9967-af893d17b580.png"
  ],
  "maxTokens": 45000,
  "imageBase64": [],
  "temperature": 0.1,
  "maxNewTokens": 200,
  "systemPrompt": "You are a helpful AI assistant.",
  "repetitionPenalty": 1.1
}
Output
Upon successful execution, the action returns an array of descriptive strings, one per analyzed image. Here’s an example of the output you might receive:
Example Output:
[
  "The image is a screenshot of an Excel spreadsheet with the title \"Sales Report\". It contains data related to sales, including product names, quantities sold (Qtr 1 and Qtr 2), and total sales amounts..."
]
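Since the result is a JSON array of strings, consuming it in Python is straightforward. A sketch, using a hard-coded stand-in for the parsed response body:

```python
# Stand-in for the parsed JSON response: an array of description
# strings, one per analyzed image.
result = [
    "The image is a screenshot of an Excel spreadsheet with the "
    "title \"Sales Report\"."
]

# Take the first description, guarding against an empty result.
description = result[0] if result else ""
print(description)
```

In a real integration, `result` would come from `response.json()` rather than a literal, but the access pattern is the same.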
Conceptual Usage Example (Python)
To call this action using a hypothetical Cognitive Actions execution endpoint, you can structure your Python code as follows:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "50e715f3-9ca1-4526-b429-53334c57555c"  # Action ID for Perform Image and Text Analysis

# Construct the input payload based on the action's requirements
payload = {
    "topK": 1,
    "topP": 1,
    "prompt": "describe this image",
    "imageUrl": [
        "https://support.content.office.net/en-us/media/3dd2b79b-9160-403d-9967-af893d17b580.png"
    ],
    "maxTokens": 45000,
    "imageBase64": [],
    "temperature": 0.1,
    "maxNewTokens": 200,
    "systemPrompt": "You are a helpful AI assistant.",
    "repetitionPenalty": 1.1
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Response body was not valid JSON
            print(f"Response body: {e.response.text}")
In this code snippet:
- Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
- The payload is structured based on the required input fields for the action.
- The response is printed in a readable format, allowing you to see the analysis results.
Conclusion
Integrating the Hayooucom Vision LLaMA3 Cognitive Actions into your applications can significantly enhance their capabilities, enabling advanced image and text analysis. The example provided illustrates how to set up and execute the action effectively. As you explore these tools further, consider various use cases such as automating content creation, enhancing accessibility, or developing smart applications that react to visual data. Happy coding!