Generate Stunning Images with jyoung105/stable-cascade Cognitive Actions

22 Apr 2025

In the realm of AI-driven creativity, generating images from text prompts has become a fascinating field. The jyoung105/stable-cascade spec offers a robust Cognitive Action that lets developers harness the Würstchen architecture for text-to-image generation. The action simplifies image creation while exposing extensive customization options. In this blog post, we will explore how to use the Generate Image with Würstchen action, covering its capabilities, input requirements, and output expectations.

Prerequisites

Before diving into the integration of the Cognitive Actions, ensure you have:

  • An API key for the Cognitive Actions platform.
  • Familiarity with making HTTP requests and handling JSON data.
  • A development environment set up for executing Python code.

Authentication typically involves passing your API key in the request headers to ensure secure access to the Cognitive Actions.
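As a minimal sketch of that header scheme (the Bearer prefix is the typical convention; confirm the exact header format against your platform's documentation):

```python
def auth_headers(api_key):
    """Build the request headers for an authenticated Cognitive Actions call.

    Assumes the common Bearer-token convention; adjust if your platform
    documents a different header name or prefix.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = auth_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```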

Cognitive Actions Overview

Generate Image with Würstchen

Description: This action generates images from text prompts using Würstchen, an efficient architecture designed for large-scale text-to-image diffusion models. Customization options include image dimensions and the ability to specify the number of images, along with prior and decoder guidance scales for enhanced image quality.

Category: Image Generation

Input

The input for this action is structured as follows:

  • seed (optional): A random seed for reproducibility. If left blank, a random seed is generated.
  • width (optional): Width of the output image in pixels (default: 1024, max: 2048, min: 1).
  • height (optional): Height of the output image in pixels (default: 1024, max: 2048, min: 1).
  • prompt (required): A text prompt describing what to generate in the image.
  • numberOfImages (optional): Number of images to generate (default: 1, max: 4, min: 1).
  • priorGuidanceScale (optional): Guidance scale for classifier-free guidance in the prior (default: 4, max: 20, min: 0).
  • negativeInputPrompt (optional): A text prompt specifying what should not be included in the generated image.
  • priorDenoisingSteps (optional): Number of denoising steps in the prior (default: 20, max: 50, min: 1).
  • decoderGuidanceScale (optional): Guidance scale for classifier-free guidance during decoding (default: 0, max: 20, min: 0).
  • decoderDenoisingSteps (optional): Number of denoising steps during decoding (default: 10, max: 50, min: 1).

Example Input:

{
  "width": 1024,
  "height": 1024,
  "prompt": "A man with hoodie on, illustration",
  "numberOfImages": 1,
  "priorGuidanceScale": 4,
  "priorDenoisingSteps": 20,
  "decoderGuidanceScale": 0,
  "decoderDenoisingSteps": 10
}
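Since out-of-range values will likely be rejected server-side, it can be worth checking a payload locally first. Here is a small sketch (not part of the official spec) that validates a payload against the documented parameter ranges:

```python
# Documented (default-independent) bounds for each numeric parameter.
BOUNDS = {
    "width": (1, 2048),
    "height": (1, 2048),
    "numberOfImages": (1, 4),
    "priorGuidanceScale": (0, 20),
    "priorDenoisingSteps": (1, 50),
    "decoderGuidanceScale": (0, 20),
    "decoderDenoisingSteps": (1, 50),
}

def validate_payload(payload):
    """Raise ValueError if 'prompt' is missing or a value is out of range."""
    if not payload.get("prompt"):
        raise ValueError("'prompt' is required")
    for key, (lo, hi) in BOUNDS.items():
        if key in payload and not lo <= payload[key] <= hi:
            raise ValueError(f"{key}={payload[key]} outside [{lo}, {hi}]")
    return payload

validate_payload({"prompt": "A man with hoodie on, illustration", "width": 1024})
```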

Output

Upon successful execution, the action returns a list of URLs pointing to the generated images. The output for our example would look something like this:

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/e54b5c0f-0623-4dd4-957b-1690a5b4d6ab/b6459f53-063f-4e4d-9fc0-1efd2b11214a.png"
]
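Since the URLs point at hosted assets, you will usually want to save the images locally. A small sketch under the assumption that the asset URLs are directly fetchable without extra authentication:

```python
import os
import requests
from urllib.parse import urlparse

def filename_from_url(url, index):
    """Derive a local filename from an image URL, falling back to an index."""
    name = os.path.basename(urlparse(url).path)
    return name or f"image_{index}.png"

def download_images(urls, out_dir="outputs"):
    """Fetch each generated image URL and save it locally; returns the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        path = os.path.join(out_dir, filename_from_url(url, i))
        with open(path, "wb") as f:
            f.write(resp.content)
        paths.append(path)
    return paths
```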

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet demonstrating how to call the Cognitive Actions execution endpoint for generating an image:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "b998f3c2-bb28-4ec6-88e7-8a1c59d6e5b1" # Action ID for Generate Image with Würstchen

# Construct the input payload based on the action's requirements
payload = {
    "width": 1024,
    "height": 1024,
    "prompt": "A man with hoodie on, illustration",
    "numberOfImages": 1,
    "priorGuidanceScale": 4,
    "priorDenoisingSteps": 20,
    "decoderGuidanceScale": 0,
    "decoderDenoisingSteps": 10
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers json.JSONDecodeError; body was not valid JSON
            print(f"Response body: {e.response.text}")

In this example, you replace the placeholder API key and endpoint with your actual credentials. The action ID for the Generate Image with Würstchen action is included in the request, along with the structured input payload.
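Per the output example above, the parsed result should be a list of image URLs. A defensive extraction sketch (the "output" wrapper key is hypothetical, in case the platform nests the list inside an envelope):

```python
def extract_urls(result):
    """Return the list of image URLs from a parsed response body.

    Assumes the body is either a bare list of URLs (as in the example output)
    or a dict wrapping them under an "output" key (hypothetical).
    """
    if isinstance(result, list):
        return result
    if isinstance(result, dict):
        return result.get("output", [])
    return []
```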

Conclusion

The jyoung105/stable-cascade Cognitive Actions provide a powerful means to generate images from textual prompts, offering flexibility and customization for developers. By understanding the input structure and output expectations, you can seamlessly integrate image generation capabilities into your applications. As next steps, consider experimenting with different prompts and configurations to see the diverse range of images you can create, enhancing your projects with AI-driven visuals.