Generate Stunning Images from Text with ai-forever/kandinsky-2.2 Cognitive Actions

21 Apr 2025

In the evolving landscape of AI-driven creativity, ai-forever/kandinsky-2.2 gives developers a way to generate images from textual descriptions. This set of Cognitive Actions wraps the Kandinsky 2.2 model, which turns multilingual prompts into images. By integrating these pre-built actions into your applications, you can support use cases ranging from art creation to richer user experiences.

Prerequisites

Before diving into the Cognitive Actions, make sure you have the following set up:

  • API Key: You will need an API key for the Cognitive Actions platform to authenticate your requests.
  • Endpoint Access: Ensure you have access to the necessary endpoints to invoke these actions.

Authentication typically involves passing your API key in the request headers, allowing you to securely access the services.
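
As a minimal sketch, the headers could be assembled as shown here. The Bearer scheme mirrors the conceptual Python example later in this article; build_auth_headers is a hypothetical helper name, and the platform's own documentation remains the authority on the exact header format.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return request headers carrying the API key (assumed Bearer scheme)."""
    key = api_key.strip()
    if not key:
        # Fail early instead of sending an unauthenticated request
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Loading the key from an environment variable or secret store, rather than hard-coding it, keeps it out of version control.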

Cognitive Actions Overview

Generate Multilingual Text-to-Image

The Generate Multilingual Text-to-Image action allows you to create images based on textual prompts in various languages. It utilizes the Kandinsky 2.2 model, which is enhanced with CLIP-ViT-G image encoding and ControlNet support. This action is categorized under image generation.

Input

The input for this action must be structured as follows:

  • seed (optional): An integer for the random seed. If not specified, a random seed will be generated.
  • width (optional): Specifies the width of the output image in pixels. Options include 384, 512, 576, 640, 704, 768, 960, 1024, 1152, 1280, 1536, 1792, or 2048. Default is 512.
  • height (optional): Specifies the height of the output image in pixels, with the same options as width. Default is 512.
  • prompt (required): A textual description of the desired image.
  • numberOfOutputs (optional): The number of images to generate (1-4). Default is 1.
  • outputImageFormat (optional): The format for the output images, selectable from 'webp', 'jpeg', or 'png'. Default is 'webp'.
  • negativeInputPrompt (optional): A description of elements to exclude from the image.
  • numberOfInferenceSteps (optional): The number of denoising steps during generation (1-500). Default is 75.
  • numberOfPriorInferenceSteps (optional): The number of denoising steps for the prior stage, which maps the prompt to an image embedding before the main generation pass (1-500). Default is 25.
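
The constraints listed above can be checked on the client before a request is sent. The following is a sketch only: validate_inputs and its helper are hypothetical names, the limits are copied from the list above, and the platform may enforce additional rules that are not documented here.

```python
ALLOWED_SIZES = {384, 512, 576, 640, 704, 768, 960, 1024, 1152, 1280, 1536, 1792, 2048}
ALLOWED_FORMATS = {"webp", "jpeg", "png"}
INT_RANGES = {
    "numberOfOutputs": (1, 4),
    "numberOfInferenceSteps": (1, 500),
    "numberOfPriorInferenceSteps": (1, 500),
}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def validate_inputs(inputs: dict) -> list:
    """Return a list of problems found; an empty list means the inputs look valid."""
    errors = []

    prompt = inputs.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")

    if "seed" in inputs and not _is_int(inputs["seed"]):
        errors.append("seed must be an integer")

    for key in ("width", "height"):
        if key in inputs:
            value = inputs[key]
            if not (_is_int(value) and value in ALLOWED_SIZES):
                errors.append(f"{key} must be one of {sorted(ALLOWED_SIZES)}")

    for key, (low, high) in INT_RANGES.items():
        if key in inputs:
            value = inputs[key]
            if not (_is_int(value) and low <= value <= high):
                errors.append(f"{key} must be an integer from {low} to {high}")

    if "outputImageFormat" in inputs:
        fmt = inputs["outputImageFormat"]
        if not (isinstance(fmt, str) and fmt in ALLOWED_FORMATS):
            errors.append(f"outputImageFormat must be one of {sorted(ALLOWED_FORMATS)}")

    if "negativeInputPrompt" in inputs and not isinstance(inputs["negativeInputPrompt"], str):
        errors.append("negativeInputPrompt must be a string")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once.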

Here’s an example of the JSON payload needed to invoke this action:

{
  "width": 1024,
  "height": 1024,
  "prompt": "A moss covered astronaut with a black background",
  "numberOfOutputs": 1,
  "numberOfInferenceSteps": 75
}

Output

Upon execution, this action typically returns a list of URLs pointing to the generated images. For example:

[
  "https://assets.cognitiveactions.com/invocations/0f4ef27b-1231-4840-a8dc-f9b56daac302/d7cecb27-acfa-4c9c-b54d-9368a17d89f5.webp"
]

The list typically contains one URL for each image requested through numberOfOutputs.
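
One way to consume this output is to download each URL to a local file. The sketch below assumes the result is a flat list of URLs, as in the example above, and that those URLs can be fetched without further authentication; filename_from_url and download_images are hypothetical helper names, and only the Python standard library is used.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url: str) -> str:
    """Return the last path segment of an image URL, for example 'abc.webp'."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    return name


def download_images(urls: list, out_dir: str = "outputs") -> list:
    """Download each URL into out_dir and return the saved paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        path = target / filename_from_url(url)
        # A timeout avoids hanging forever on a stalled connection
        with urlopen(url, timeout=60) as response:
            path.write_bytes(response.read())
        saved.append(path)
    return saved
```

Calling download_images(result_urls) with the list returned by the action would save the files under an outputs directory.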

Conceptual Usage Example (Python)

Here’s how you might structure a call to the Generate Multilingual Text-to-Image action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "c2b1ec61-fdf7-45c7-a122-13393d29d435"  # Action ID for Generate Multilingual Text-to-Image

# Construct the input payload based on the action's requirements
payload = {
    "width": 1024,
    "height": 1024,
    "prompt": "A moss covered astronaut with a black background",
    "numberOfOutputs": 1,
    "numberOfInferenceSteps": 75
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120  # Avoid hanging indefinitely; image generation can be slow
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON (JSONDecodeError subclasses ValueError)
            print(f"Response body: {e.response.text}")

In this example, the action ID and input payload are defined, highlighting how to make a request to the Cognitive Actions endpoint. The endpoint URL and request structure are illustrative and should be tailored to your specific implementation.

Conclusion

The ai-forever/kandinsky-2.2 Cognitive Actions give developers a way to generate images from text inside their own applications. With control over image size, output format, negative prompts, and step counts, the use cases range from art generation to marketing visuals. Start exploring these Cognitive Actions today and unlock the creative potential of AI in your applications!