Generate Stunning Images with Kandinsky 3.0 Cognitive Actions

Kandinsky 3.0 is an image generation model that turns text prompts into images (Text2Img) and can also modify existing images (Img2Img). Developers can use it to automate creative workflows and add generated visuals to their applications. This blog post walks through the Cognitive Actions associated with Kandinsky 3.0, covering their capabilities, input requirements, and output.
Prerequisites
Before you start using the Kandinsky 3.0 Cognitive Actions, make sure you have the following:
- An API key for the Cognitive Actions platform, which is essential for authentication.
- Familiarity with making HTTP requests, as you'll be interacting with a RESTful endpoint to execute the actions.
To authenticate your requests, you'll typically pass the API key in the request headers.
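As a minimal sketch of that step, the helper below builds the request headers from an API key. The Bearer scheme matches the usage example later in this post; the environment variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions made for illustration, not part of the platform.

```python
import os


def build_headers(api_key=None):
    """Return HTTP headers that carry the API key as a Bearer token."""
    # COGNITIVE_ACTIONS_API_KEY is an assumed environment variable name.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; passing it explicitly remains possible for tests or scripts.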
Cognitive Actions Overview
Generate Image with Kandinsky 3.0
The Generate Image with Kandinsky 3.0 action enables you to create images from descriptive text prompts or modify existing images. This action is flexible, offering features such as seed control for reproducibility, adjustable image dimensions, and prompt strength settings.
Input
The action accepts a JSON object. A payload that sets every supported field looks like this:
{
  "seed": 12345,
  "image": "https://example.com/input-image.jpg",
  "width": 1024,
  "height": 1024,
  "prompt": "A beautiful sunset over a mountain range",
  "strength": 0.75,
  "negativePrompt": "lowres, text, error",
  "numberOfInferenceSteps": 50
}
- seed (optional, integer): Random seed for reproducibility. Omit the field to use a randomized seed.
- image (optional, string): Input image for Img2Img mode, used when modifying an existing image. Omit it for Text2Img.
- width (integer): Width of the output image in pixels (default: 1024, max: 2048).
- height (integer): Height of the output image in pixels (default: 1024, max: 2048).
- prompt (string): Descriptive input prompt for image generation.
- strength (number): Influence of the prompt on the generated image (range: 0 to 1, default: 0.75).
- negativePrompt (string): Terms to exclude from the output image to filter out unwanted attributes.
- numberOfInferenceSteps (integer): Steps used for image denoising (range: 1 to 500, default: 50).
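The documented ranges can be checked on the client before a request is sent. The sketch below is one way to do that; validate_inputs is a hypothetical helper, and it assumes that prompt is required and that width and height must be at least 1, neither of which is stated above.

```python
def validate_inputs(inputs):
    """Return a list of problems found in an input payload (empty if none)."""
    problems = []

    # Assumption: prompt is required and must not be blank.
    prompt = inputs.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")

    # Documented default 1024 and maximum 2048; the minimum of 1 is assumed.
    for name in ("width", "height"):
        value = inputs.get(name, 1024)
        if not isinstance(value, int) or not 1 <= value <= 2048:
            problems.append(f"{name} must be an integer between 1 and 2048")

    # Documented range 0 to 1, default 0.75.
    strength = inputs.get("strength", 0.75)
    if not isinstance(strength, (int, float)) or not 0 <= strength <= 1:
        problems.append("strength must be a number between 0 and 1")

    # Documented range 1 to 500, default 50.
    steps = inputs.get("numberOfInferenceSteps", 50)
    if not isinstance(steps, int) or not 1 <= steps <= 500:
        problems.append("numberOfInferenceSteps must be an integer between 1 and 500")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.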
Example Input
Here’s a practical example of the JSON payload needed to invoke the action:
{
  "width": 1024,
  "height": 1024,
  "prompt": "Car, mustang, movie, person, poster, car cover, person, in the style of alessandro gottardo, gold and cyan, gerald harvey jones, reflections, highly detailed illustrations, industrial urban scenes",
  "strength": 0.75,
  "negativePrompt": "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature",
  "numberOfInferenceSteps": 50
}
Output
Upon successful execution, the action typically returns a URL to the generated image, for example:
https://assets.cognitiveactions.com/invocations/6f4dbd27-0818-4545-9465-38dcc6f42c87/7777cbe8-1162-4836-95c5-d0214ba17218.png
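If you want to keep the generated image, a small helper can download it. This is a sketch that assumes the result is a plain image URL like the one above; filename_from_url and save_image are hypothetical names, and the code uses only the Python standard library.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. '<id>.png'."""
    return posixpath.basename(urlparse(url).path)


def save_image(url, directory="."):
    """Download the image at `url` into `directory` and return the local path."""
    path = os.path.join(directory, filename_from_url(url))
    # Assumption: the URL is publicly readable without extra headers.
    with urlopen(url, timeout=60) as response, open(path, "wb") as f:
        f.write(response.read())
    return path
```

Calling save_image with the returned URL writes the file under its original name; whether the URL expires or needs authentication depends on the platform and is not covered here.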
Conceptual Usage Example (Python)
Here’s how you might invoke the Generate Image with Kandinsky 3.0 action in Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "4631a9c4-4f34-48d3-b4f9-409c59e1a84a"  # Action ID for Generate Image with Kandinsky 3.0

# Construct the input payload based on the action's requirements
payload = {
    "width": 1024,
    "height": 1024,
    "prompt": "A beautiful sunset over a mountain range",
    "strength": 0.75,
    "negativePrompt": "lowres, text, error",
    "numberOfInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120  # Avoid waiting forever; adjust to your workload
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Error body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID identifies the Generate Image with Kandinsky 3.0 action, and the payload follows the input fields described above. The endpoint URL and the request body structure are hypothetical, so check them against your platform's documentation.
Conclusion
The Kandinsky 3.0 Cognitive Actions give developers a way to generate and transform images from text prompts. Once you know how to structure a request, you can add generated visuals to your applications. Possible use cases include personalized art generation, marketing materials, and creative storytelling.