Create Stunning Images with Google Imagen 3 Cognitive Actions

In the realm of image generation, Google’s Imagen 3 stands out as a powerful tool that allows developers to create visually striking and detailed images from textual descriptions. This blog post will explore the capabilities of the Cognitive Actions provided by the Imagen 3 spec, specifically focusing on the action for generating high-quality images. With these pre-built actions, developers can easily integrate advanced image generation into their applications, enhancing user experience or automating creative tasks.
Prerequisites
To begin using the Cognitive Actions associated with Google Imagen 3, there are a few prerequisites to keep in mind:
- API Key: You will need an API key for authentication. This key should be included in the headers of your requests to ensure secure access to the Cognitive Actions platform.
- Setup: Familiarize yourself with the API structure and make sure you have a library for making HTTP requests, such as requests in Python.
Typically, authentication is handled by including your API key in the request headers, allowing you to securely call the image generation actions.
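As a minimal sketch of that header setup, the helper below reads the key from an environment variable rather than hard-coding it. The variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions made for illustration; the Bearer scheme mirrors the usage example later in this post and should be checked against your platform's documentation.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers that carry the API key as a Bearer token.

    Falls back to the COGNITIVE_ACTIONS_API_KEY environment variable
    (an assumed name) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key found: pass api_key or set COGNITIVE_ACTIONS_API_KEY"
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control.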
Cognitive Actions Overview
Generate High-Quality Image
The Generate High-Quality Image action uses Google’s Imagen 3 model to create detailed, visually appealing images from text prompts. It is particularly well suited to applications that need high-resolution images or images that reproduce a specific artistic style.
Input
The input for this action is structured as follows:
- Required Fields:
  - prompt: A string that describes the image you want to generate.
- Optional Fields:
  - aspectRatio: Specifies the aspect ratio of the generated image (default is 1:1). Possible values are 1:1, 9:16, 16:9, 3:4, and 4:3.
  - negativePrompt: A string that specifies elements or themes to avoid in the generated image.
  - safetyFilterLevel: Determines the level of content filtering. Options are block_low_and_above (strictest), block_medium_and_above (default), and block_only_high (most permissive).
Here’s an example of the JSON payload you would use to invoke this action:
{
  "prompt": "A close-up, macro photography stock photo of a strawberry intricately sculpted into the shape of a hummingbird in mid-flight, its wings a blur as it sips nectar from a vibrant, tubular flower. The backdrop features a lush, colorful garden with a soft, bokeh effect, creating a dreamlike atmosphere. The image is exceptionally detailed and captured with a shallow depth of field, ensuring a razor-sharp focus on the strawberry-hummingbird and gentle fading of the background. The high resolution, professional photographers style, and soft lighting illuminate the scene in a very detailed manner, professional color grading amplifies the vibrant colors and creates an image with exceptional clarity. The depth of field makes the hummingbird and flower stand out starkly against the bokeh background.",
  "aspectRatio": "1:1",
  "safetyFilterLevel": "block_medium_and_above"
}
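Because the optional fields only accept a fixed set of values, it can help to validate the input on the client before sending it. The sketch below builds the payload from the fields described in this section; the function name build_inputs and the decision to reject bad values locally are illustrative assumptions, not part of the platform.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in the Input section of this post.
ASPECT_RATIOS = {"1:1", "9:16", "16:9", "3:4", "4:3"}
SAFETY_FILTER_LEVELS = {
    "block_low_and_above",     # strictest
    "block_medium_and_above",  # default
    "block_only_high",         # most permissive
}


def build_inputs(
    prompt: str,
    aspect_ratio: str = "1:1",
    negative_prompt: Optional[str] = None,
    safety_filter_level: str = "block_medium_and_above",
) -> Dict[str, Any]:
    """Build the action input, raising ValueError on unsupported values."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspectRatio: {aspect_ratio!r}")
    if safety_filter_level not in SAFETY_FILTER_LEVELS:
        raise ValueError(f"Unsupported safetyFilterLevel: {safety_filter_level!r}")

    inputs: Dict[str, Any] = {
        "prompt": prompt,
        "aspectRatio": aspect_ratio,
        "safetyFilterLevel": safety_filter_level,
    }
    # Only include negativePrompt when the caller actually supplied one.
    if negative_prompt:
        inputs["negativePrompt"] = negative_prompt
    return inputs
```

For example, build_inputs("a red bicycle", aspect_ratio="16:9", negative_prompt="text, watermark") returns a dictionary that can be passed as the action input.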
Output
The action typically returns a URL pointing to the generated image. For example:
https://assets.cognitiveactions.com/invocations/c5d50dd9-8e2c-448a-ba52-faf9b59aa35b/28da348e-ae1a-41a8-a384-38e004064628.png
This URL can be used to access the high-quality image generated based on the provided prompt.
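If you want to keep a local copy, the returned URL can be fetched like any other file. The following is a sketch using only the Python standard library; it assumes the URL is publicly readable without extra headers, which this post does not confirm, and the helper names filename_from_url and download_image are made up for illustration.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "image.png") -> str:
    """Use the last path segment of the URL as the local file name."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default


def download_image(url: str, dest_dir: str = ".", timeout: float = 60.0) -> Path:
    """Download the image at `url` into `dest_dir` and return the saved path."""
    dest = Path(dest_dir) / filename_from_url(url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Calling download_image with the example URL above would save the file under its original name, 28da348e-ae1a-41a8-a384-38e004064628.png, in the current directory.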
Conceptual Usage Example (Python)
Here’s how you might call the Generate High-Quality Image action using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "62853504-d264-47c9-8fa4-393403c433ac"  # Action ID for Generate High-Quality Image

# Construct the input payload based on the action's requirements
payload = {
    "prompt": "A close-up, macro photography stock photo of a strawberry intricately sculpted into the shape of a hummingbird in mid-flight, its wings a blur as it sips nectar from a vibrant, tubular flower. The backdrop features a lush, colorful garden with a soft, bokeh effect, creating a dreamlike atmosphere. The image is exceptionally detailed and captured with a shallow depth of field, ensuring a razor-sharp focus on the strawberry-hummingbird and gentle fading of the background. The high resolution, professional photographers style, and soft lighting illuminate the scene in a very detailed manner, professional color grading amplifies the vibrant colors and creates an image with exceptional clarity. The depth of field makes the hummingbird and flower stand out starkly against the bokeh background.",
    "aspectRatio": "1:1",
    "safetyFilterLevel": "block_medium_and_above"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60  # Avoid hanging forever if the server does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet:
- The API key and endpoint URL are specified at the beginning.
- The input payload is constructed based on the action's requirements.
- A POST request is made to the hypothetical Cognitive Actions execution endpoint, passing the action ID and input payload.
- The response is checked for success, and the JSON result, which is expected to include the generated image URL, is printed.
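The exact shape of the JSON result is not documented in this post, so pulling the image URL out of it takes some guesswork. One cautious approach, sketched below, is to walk the decoded result and collect every string that looks like an http(s) URL. The helper name find_urls and the sample result shape in the usage lines are hypothetical.

```python
from typing import Any, List


def find_urls(result: Any) -> List[str]:
    """Recursively collect every http(s) string in a decoded JSON value."""
    if isinstance(result, str):
        return [result] if result.startswith(("http://", "https://")) else []
    if isinstance(result, dict):
        values = list(result.values())
    elif isinstance(result, list):
        values = result
    else:
        # Numbers, booleans, and None cannot contain URLs.
        return []
    urls: List[str] = []
    for value in values:
        urls.extend(find_urls(value))
    return urls


# Hypothetical result shape, used only to show the call.
sample = {"status": "succeeded", "output": ["https://example.com/image.png"]}
print(find_urls(sample))
```

Once you know the real response schema, replace this search with a direct key lookup, which is both faster and less likely to pick up unrelated links.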
Conclusion
The Google Imagen 3 Cognitive Actions provide developers with powerful tools for generating high-quality images from text prompts. By leveraging these actions, you can enhance your applications with visually appealing content, automate creative tasks, or develop engaging user experiences. As you explore the capabilities of image generation, consider the various ways you can integrate this technology into your projects. Happy coding!