Create Stunning Images with Kandinsky 2.1 Cognitive Actions

The Kandinsky 2.1 Cognitive Actions give developers the power to generate high-quality images using advanced diffusion-model techniques. By combining ideas from DALL-E 2 with the latent diffusion framework, these actions support creative image generation and manipulation driven by textual descriptions, and their use of image and text embeddings from the CLIP model improves the relevance and quality of the visual output. Whether you want to create entirely new images or modify existing ones, these pre-built actions simplify the process, letting developers focus on creativity rather than the underlying complexities.
Prerequisites
Before diving into the integration of Kandinsky 2.1 Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic setup for making HTTP requests in your application.
For authentication, you'll typically pass your API key in the headers of your requests, ensuring secure access to the Cognitive Actions API.
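As a minimal sketch, the headers might be built like this. The Bearer-token scheme shown here is an assumption based on common API conventions (and matches the usage example later in this guide):

```python
# Minimal sketch of the request headers used throughout this guide.
# The Bearer-token scheme is an assumption based on common API conventions.

COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"  # placeholder

def build_headers(api_key: str) -> dict:
    """Return the HTTP headers for a Cognitive Actions request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_headers(COGNITIVE_ACTIONS_API_KEY)
```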
Cognitive Actions Overview
Generate Image with Kandinsky 2.1
The Generate Image with Kandinsky 2.1 action is designed to create stunning visuals based on textual prompts or manipulate existing images according to user input. This action falls under the category of image-generation.
Input
The input for this action is defined by a schema that includes several parameters:
- seed (integer): The random seed for generating outputs. If omitted, a random seed will be generated.
- task (string): The task type, either text2img (generate an image from text) or text_guided_img2img (modify an existing image). Default: text2img.
- image (string): A URI of the input image (required for the text_guided_img2img task).
- width (integer): Width of the output image (default: 512, range: 128 to 1024).
- height (integer): Height of the output image (default: 512, range: 128 to 1024).
- prompt (string): The text prompt describing the desired output (default: "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting").
- strength (number): Transformation strength for the input image (range: 0 to 1, applicable only for text_guided_img2img).
- guidanceScale (number): Scale for classifier-free guidance (default: 4, range: 1 to 20).
- negativePrompt (string): Elements to avoid in the output (default: "low quality, bad quality").
- numberOfOutputs (integer): Number of images to generate (default: 1, range: 1 to 4).
- numberOfStepsPrior (integer): Number of denoising steps in the prior process (default: 25, range: 1 to 500).
- numberOfInferenceSteps (integer): Number of denoising steps during inference (default: 100, range: 1 to 500).
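To make these defaults and ranges concrete, here is a hypothetical helper that fills in the documented defaults and validates the documented ranges before a payload is sent. The parameter names, defaults, and ranges come from the schema above; the helper itself is illustrative, not part of the API:

```python
# Hypothetical payload builder: applies the documented defaults and
# range-checks numeric parameters before the payload is sent.

DEFAULTS = {
    "task": "text2img",
    "width": 512,
    "height": 512,
    "guidanceScale": 4,
    "negativePrompt": "low quality, bad quality",
    "numberOfOutputs": 1,
    "numberOfStepsPrior": 25,
    "numberOfInferenceSteps": 100,
}

RANGES = {
    "width": (128, 1024),
    "height": (128, 1024),
    "guidanceScale": (1, 20),
    "numberOfOutputs": (1, 4),
    "numberOfStepsPrior": (1, 500),
    "numberOfInferenceSteps": (1, 500),
}

def build_payload(**overrides):
    """Merge overrides onto the documented defaults and validate ranges."""
    payload = {**DEFAULTS, **overrides}
    if payload["task"] == "text_guided_img2img" and "image" not in payload:
        raise ValueError("text_guided_img2img requires an input 'image' URI")
    for key, (lo, hi) in RANGES.items():
        if key in payload and not lo <= payload[key] <= hi:
            raise ValueError(f"{key} must be between {lo} and {hi}")
    return payload

payload = build_payload(prompt="a watercolor fox", width=768, height=768)
```

Centralizing the defaults this way keeps application code short and surfaces out-of-range values before a round trip to the API.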
Example Input
{
  "task": "text2img",
  "width": 768,
  "height": 768,
  "prompt": "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting",
  "strength": 0.3,
  "guidanceScale": 1,
  "negativePrompt": "low quality, bad quality",
  "numberOfOutputs": 1,
  "numberOfStepsPrior": 25,
  "numberOfInferenceSteps": 100
}
Output
The action typically returns a list of image URLs generated based on the specified input parameters.
Example Output
[
  "https://assets.cognitiveactions.com/invocations/d9addad6-1e35-478a-8fb4-3338f97379c7/d57681ef-2902-42ab-a7a2-fccbc144a3d3.png"
]
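Once you have the list of URLs, a common next step is to save each image locally. The sketch below derives a filename from the last path segment of each asset URL; the URL mirrors the example output above, and the actual download is shown as a commented step so the snippet stays self-contained:

```python
# Sketch: derive a local filename from each returned asset URL.
from pathlib import Path
from urllib.parse import urlparse

output_urls = [
    "https://assets.cognitiveactions.com/invocations/d9addad6-1e35-478a-8fb4-3338f97379c7/d57681ef-2902-42ab-a7a2-fccbc144a3d3.png"
]

def local_name(url: str) -> str:
    """Use the last path segment of the asset URL as a filename."""
    return Path(urlparse(url).path).name

for url in output_urls:
    name = local_name(url)
    # To actually download the image:
    #   import requests
    #   Path(name).write_bytes(requests.get(url, timeout=30).content)
    print(name)
```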
Conceptual Usage Example (Python)
Below is a conceptual Python code snippet demonstrating how to invoke the Generate Image with Kandinsky 2.1 action:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "e19ee14b-8350-41e1-8dee-5333c4c7dea0"  # Action ID for Generate Image with Kandinsky 2.1

# Construct the input payload based on the action's requirements
payload = {
    "task": "text2img",
    "width": 768,
    "height": 768,
    "prompt": "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting",
    "strength": 0.3,
    "guidanceScale": 1,
    "negativePrompt": "low quality, bad quality",
    "numberOfOutputs": 1,
    "numberOfStepsPrior": 25,
    "numberOfInferenceSteps": 100
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
In this snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID corresponds to the Generate Image with Kandinsky 2.1 function. The input payload is structured according to the specified example, ensuring that all required fields are included.
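To modify an existing image instead of generating one from scratch, switch the task to text_guided_img2img and supply an input image URI. A sketch of such a payload follows; the image URL is a placeholder, and the request is sent exactly as in the snippet above:

```python
# Sketch of an input payload for the text_guided_img2img task.
# The image URL is a placeholder; strength controls how strongly
# the input image is transformed toward the prompt.
img2img_payload = {
    "task": "text_guided_img2img",
    "image": "https://example.com/my-input-image.png",  # placeholder input image URI
    "prompt": "the same scene at sunset, cinematic, moody lighting",
    "strength": 0.3,        # range 0 to 1; lower values stay closer to the input image
    "guidanceScale": 4,
    "numberOfOutputs": 1,
}

# Send it the same way as the text2img request above:
#   requests.post(COGNITIVE_ACTIONS_EXECUTE_URL, headers=headers,
#                 json={"action_id": action_id, "inputs": img2img_payload})
```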
Conclusion
The Kandinsky 2.1 Cognitive Actions present an exciting opportunity for developers to create and manipulate images with minimal effort. By understanding the input requirements and the potential outputs, you can easily integrate these actions into your applications, enhancing user engagement with stunning visuals. As you experiment with different prompts and settings, consider exploring further use cases, such as incorporating these functionalities into creative tools, games, or marketing applications. Happy coding!