Evaluate Text-To-Image Quality with Cognitive Actions from Flash Eval

The Cognitive Actions provided by the andreasjansson/flash-eval API let you add image analysis capabilities to your application. One of these actions assesses the quality of text-to-image models: it compares generated images against their input prompts using a suite of models, including CLIP, BLIP, Aesthetic, ImageReward, and PickScore. This pre-built action gives developers a consistent way to evaluate image generation output and track improvements.
Prerequisites
Before diving into the integration of Cognitive Actions, ensure you have the following:
- Access to the Cognitive Actions platform, including an API key for authentication.
- Basic knowledge of making API requests and handling JSON data in your development environment.
For authentication, you will typically pass your API key in the headers of your requests. This is essential for accessing the Cognitive Actions API securely.
Cognitive Actions Overview
Evaluate Text-To-Image Quality
This action assesses the quality of text-to-image models by comparing generated images with their respective input prompts. It utilizes various models to provide a comprehensive analysis of the image quality.
Category: Image Analysis
Input
The input for this action requires the following fields:
- promptsAndImages (required): A newline-separated list of prompt/image URL pairs, formatted as <prompt>:<image1>[,<image2>[,<image3>[,...]]].
- models (optional): A comma-separated list of models used for evaluation. The default is ImageReward,Aesthetic,CLIP,BLIP,PickScore.
- imageSeparator (optional): The character used to separate image URLs in the input. The default is a comma (,).
- promptImagesSeparator (optional): The character used to separate a prompt from its associated list of images. The default is a colon (:).
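The promptsAndImages string described above can be assembled programmatically. The following is a minimal sketch, not part of the flash-eval API: build_prompts_and_images is a hypothetical helper name and the example.com URLs are placeholders. Because the source does not say how a prompt that contains the separator character is handled, the sketch rejects such prompts instead of guessing.

```python
from typing import Mapping, Sequence


def build_prompts_and_images(
    pairs: Mapping[str, Sequence[str]],
    image_separator: str = ",",
    prompt_images_separator: str = ":",
) -> str:
    """Join a prompt -> image URLs mapping into a newline-separated string.

    Hypothetical helper: the name and validation rules are assumptions,
    not part of the documented API.
    """
    lines = []
    for prompt, urls in pairs.items():
        if not urls:
            raise ValueError(f"No image URLs given for prompt {prompt!r}")
        # A prompt containing the separator or a newline would make the line ambiguous.
        if prompt_images_separator in prompt or "\n" in prompt:
            raise ValueError(f"Prompt {prompt!r} contains a separator character")
        for url in urls:
            if image_separator in url:
                raise ValueError(f"URL {url!r} contains the image separator")
        lines.append(prompt + prompt_images_separator + image_separator.join(urls))
    return "\n".join(lines)


# Placeholder URLs for illustration only.
payload_value = build_prompts_and_images(
    {
        "a cat": ["https://example.com/cat-1.png", "https://example.com/cat-2.png"],
        "a dog": ["https://example.com/dog-1.png"],
    }
)
print(payload_value)
```

If your prompts contain colons, one option is to pass a different prompt_images_separator here and set promptImagesSeparator to the same character in the request payload.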
Example Input:
{
  "models": "ImageReward,Aesthetic,CLIP,BLIP,PickScore",
  "promptsAndImages": "a cat: https://replicate.delivery/czjl/49VkOf3fu7sfgon0W2OMf5dlIxiwUpsBAicD1lveJP0e2cL5E/output.webp,https://replicate.delivery/yhqm/M3MzBpeWPfg0qkNZcK1x4dMXr2boczHOqTHsxnEjtauJAvkTA/out-0.png\na dog: https://replicate.delivery/czjl/c2e693uAGuWIe0o6ZOqFF3R5uJt5hq7SPSsrkgfA6dz1JeSOB/output.webp,https://replicate.delivery/yhqm/yzKdMfHFMeq14kceu334KavdzLMcETTHiC12E2iMkfNgS8SOB/out-0.png"
}
Output
The action returns an array of results, each containing the prompt and the scores for the corresponding images evaluated by the selected models.
Example Output:
[
  {
    "prompt": "a cat",
    "scores": {
      "https://replicate.delivery/yhqm/M3MzBpeWPfg0qkNZcK1x4dMXr2boczHOqTHsxnEjtauJAvkTA/out-0.png": {
        "BLIP": 0.4326,
        "CLIP": 0.2218,
        "Aesthetic": 6.0998,
        "PickScore": 21.3488,
        "ImageReward": 0.3194
      },
      "https://replicate.delivery/czjl/49VkOf3fu7sfgon0W2OMf5dlIxiwUpsBAicD1lveJP0e2cL5E/output.webp": {
        "BLIP": 0.3541,
        "CLIP": 0.2458,
        "Aesthetic": 5.9691,
        "PickScore": 21.5813,
        "ImageReward": 0.6373
      }
    }
  },
  {
    "prompt": "a dog",
    "scores": {
      "https://replicate.delivery/yhqm/yzKdMfHFMeq14kceu334KavdzLMcETTHiC12E2iMkfNgS8SOB/out-0.png": {
        "BLIP": 0.3557,
        "CLIP": 0.1819,
        "Aesthetic": 5.1370,
        "PickScore": 19.5750,
        "ImageReward": -0.9157
      },
      "https://replicate.delivery/czjl/c2e693uAGuWIe0o6ZOqFF3R5uJt5hq7SPSsrkgfA6dz1JeSOB/output.webp": {
        "BLIP": 0.3997,
        "CLIP": 0.2404,
        "Aesthetic": 6.5466,
        "PickScore": 21.4142,
        "ImageReward": 1.6095
      }
    }
  }
]
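Once the response is parsed, a common next step is to pick the highest-scoring image for each prompt. The sketch below assumes the response has the shape shown in the example output and that a higher score means a better image for the chosen model; best_image_per_prompt is a hypothetical helper, and the shortened URLs are placeholders carrying scores from the "a cat" example. The models report on different scales (in the example, Aesthetic is around 6 while CLIP is around 0.2), so the sketch ranks by one model at a time rather than averaging across models.

```python
from typing import Dict, List, Tuple


def best_image_per_prompt(
    results: List[dict], model: str = "ImageReward"
) -> Dict[str, Tuple[str, float]]:
    """Return {prompt: (image_url, score)} for the top-scoring image per prompt.

    Hypothetical helper; assumes higher scores are better for `model`.
    """
    best = {}
    for entry in results:
        scores = entry["scores"]  # {image_url: {model_name: score}}
        top_url = max(scores, key=lambda url: scores[url][model])
        best[entry["prompt"]] = (top_url, scores[top_url][model])
    return best


# Abbreviated sample in the shape of the example output (placeholder URLs).
sample_results = [
    {
        "prompt": "a cat",
        "scores": {
            "https://example.com/cat-a.png": {"BLIP": 0.4326, "ImageReward": 0.3194},
            "https://example.com/cat-b.webp": {"BLIP": 0.3541, "ImageReward": 0.6373},
        },
    }
]

by_image_reward = best_image_per_prompt(sample_results)
by_blip = best_image_per_prompt(sample_results, model="BLIP")
print(by_image_reward)
print(by_blip)
```

In this sample the two models disagree on which image is best, which is one reason to inspect several scores before drawing conclusions about a text-to-image model.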
Conceptual Usage Example (Python)
Here's how you can integrate the Evaluate Text-To-Image Quality action into your application using a conceptual Python code snippet:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "ae507177-e62a-4488-a5b7-623dfc629c2e"  # Action ID for Evaluate Text-To-Image Quality

# Construct the input payload based on the action's requirements
payload = {
    "models": "ImageReward,Aesthetic,CLIP,BLIP,PickScore",
    "promptsAndImages": "a cat: https://replicate.delivery/czjl/49VkOf3fu7sfgon0W2OMf5dlIxiwUpsBAicD1lveJP0e2cL5E/output.webp,https://replicate.delivery/yhqm/M3MzBpeWPfg0qkNZcK1x4dMXr2boczHOqTHsxnEjtauJAvkTA/out-0.png\na dog: https://replicate.delivery/czjl/c2e693uAGuWIe0o6ZOqFF3R5uJt5hq7SPSsrkgfA6dz1JeSOB/output.webp,https://replicate.delivery/yhqm/yzKdMfHFMeq14kceu334KavdzLMcETTHiC12E2iMkfNgS8SOB/out-0.png"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid waiting indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet, make sure to replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id should correspond to the specific action you want to execute. The input payload is structured according to the requirements, ensuring that prompts and images are formatted correctly.
Conclusion
The Evaluate Text-To-Image Quality action from the andreasjansson/flash-eval API gives developers a way to assess the performance of text-to-image models by scoring generated images against their prompts. Integrating it into your application lets you compare outputs across prompts and models and use the scores to guide improvements in image generation. Consider combining this action with other Cognitive Actions to cover additional use cases.