Generate Stunning Images with the Stable Diffusion Toby Cognitive Actions

24 Apr 2025

The mattfrances/stable-diffusion-toby specification exposes Cognitive Actions that let developers use the Stable Diffusion model for image generation. Its pre-built action supports inpainting and img2img, so you can create or modify images to fit a particular style or objective without building the generation pipeline yourself.

Prerequisites

To begin using the Cognitive Actions from the Stable Diffusion Toby specification, you'll need to ensure you have the following:

  • An API key to authenticate your requests with the Cognitive Actions platform.
  • A basic understanding of how to make API calls, including how to structure JSON payloads and send HTTP requests.

Authentication typically involves including your API key in the request headers, ensuring secure access to the actions.
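
As a minimal sketch of the header-based authentication described above, the helper below builds the request headers from an API key. The Bearer scheme matches the usage example later in this article; the function name and the COGNITIVE_ACTIONS_API_KEY environment variable are assumptions made for illustration, so check the platform's documentation for the exact header it expects.

```python
import os


def build_auth_headers(api_key=None):
    """Build request headers for an authenticated call (hypothetical helper).

    Falls back to the COGNITIVE_ACTIONS_API_KEY environment variable,
    a name assumed here for illustration.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError("Pass api_key or set COGNITIVE_ACTIONS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",  # Bearer scheme assumed
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which is usually preferable to hard-coding it in a script.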

Cognitive Actions Overview

Generate Image with Inpainting

The Generate Image with Inpainting action allows you to create customized images by applying inpainting and img2img techniques, making it possible to refine images based on specific prompts.

  • Category: image-generation
  • Purpose: This action enables the generation of images that can be customized by specifying areas to preserve or change, thus allowing for personalized content creation.

Input

The action requires a structured input payload, which can include the following fields:

  • mask (string, optional): URI pointing to an input mask for inpaint mode.
  • seed (integer, optional): Random seed for image generation.
  • image (string, optional): URI for the input image.
  • width (integer, default: 1024): Width of the output image in pixels.
  • height (integer, default: 1024): Height of the output image in pixels.
  • prompt (string, default: "An astronaut riding a rainbow unicorn"): The guiding prompt for image generation.
  • loraWeights (string, optional): LoRA weights for image generation.
  • refineMethod (string, default: "no_refiner"): Method for refinement.
  • loraIntensity (number, default: 0.6): Intensity of the LoRA scale.
  • applyWatermark (boolean, default: true): Whether to apply a watermark to the output image.
  • negativePrompt (string, optional): Elements to avoid in the image.
  • promptStrength (number, default: 0.8): Strength of the prompt in img2img or inpaint modes.
  • numberOfOutputs (integer, default: 1): Number of images to generate (max 4).
  • schedulingMethod (string, default: "K_EULER"): Method for scheduling steps during generation.
  • guidanceIntensity (number, default: 7.5): Scale of guidance intensity.
  • highNoiseFraction (number, default: 0.8): Fraction of noise for refinement.
  • numberOfInferenceSteps (integer, default: 50): Number of denoising steps during generation.
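
One way to work with this field list is to keep the documented defaults in one place and override only what changes per request. The sketch below does that; build_payload is a hypothetical helper, not part of the platform, and two of its checks are assumptions that are labeled in the comments.

```python
# Defaults copied from the field list above; optional fields are omitted unless set.
DEFAULTS = {
    "width": 1024,
    "height": 1024,
    "prompt": "An astronaut riding a rainbow unicorn",
    "refineMethod": "no_refiner",
    "loraIntensity": 0.6,
    "applyWatermark": True,
    "promptStrength": 0.8,
    "numberOfOutputs": 1,
    "schedulingMethod": "K_EULER",
    "guidanceIntensity": 7.5,
    "highNoiseFraction": 0.8,
    "numberOfInferenceSteps": 50,
}
OPTIONAL_FIELDS = {"mask", "seed", "image", "loraWeights", "negativePrompt"}


def build_payload(**overrides):
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    # The field list caps numberOfOutputs at 4; the lower bound of 1 is an assumption.
    if not 1 <= payload["numberOfOutputs"] <= 4:
        raise ValueError("numberOfOutputs must be between 1 and 4")
    # Assumption: a mask is only meaningful together with an input image.
    if "mask" in payload and "image" not in payload:
        raise ValueError("mask requires image")
    return payload


# Inpainting request: the image and mask URIs are placeholders.
inpaint_payload = build_payload(
    prompt="A photo of TOK sitting",
    image="https://example.com/input.png",
    mask="https://example.com/mask.png",
)
```

Rejecting unknown field names locally catches typos such as numberOfOutput before a request is sent.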

Example Input:

{
  "width": 1024,
  "height": 1024,
  "prompt": "A photo of TOK sitting",
  "refineMethod": "no_refiner",
  "loraIntensity": 0.6,
  "applyWatermark": true,
  "negativePrompt": "",
  "promptStrength": 0.8,
  "numberOfOutputs": 1,
  "schedulingMethod": "K_EULER",
  "guidanceIntensity": 7.5,
  "highNoiseFraction": 0.8,
  "numberOfInferenceSteps": 50
}

Output

The action typically returns a list of URLs, one for each generated image.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/cff0d92b-514d-454b-8ba3-766a281de2ac/39096310-0d9a-45dd-b64b-ca58b1e7e501.png"
]
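
Since the output is a plain list of URLs, saving the images locally takes only a few lines. The sketch below uses the Python standard library and assumes the URLs can be fetched without additional authentication, which the source does not confirm; both function names are hypothetical.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url):
    """Return the last path segment of an image URL to use as a file name."""
    return posixpath.basename(urlparse(url).path)


def download_images(urls, dest_dir="."):
    """Download each URL into dest_dir and return the local file paths.

    Assumes the URLs are publicly readable (not confirmed by the source).
    """
    paths = []
    for url in urls:
        path = os.path.join(dest_dir, filename_from_url(url))
        with urlopen(url, timeout=60) as resp, open(path, "wb") as fh:
            fh.write(resp.read())
        paths.append(path)
    return paths
```

Calling download_images(result) with the list returned by the action would write one file per generated image.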

Conceptual Usage Example (Python)

Here’s how you might structure a Python script to call this action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "03051014-6f78-452c-8620-5c3778725aaf" # Action ID for Generate Image with Inpainting

# Construct the input payload based on the action's requirements
payload = {
    "width": 1024,
    "height": 1024,
    "prompt": "A photo of TOK sitting",
    "refineMethod": "no_refiner",
    "loraIntensity": 0.6,
    "applyWatermark": True,  # Python boolean; JSON's lowercase true is not valid Python
    "negativePrompt": "",
    "promptStrength": 0.8,
    "numberOfOutputs": 1,
    "schedulingMethod": "K_EULER",
    "guidanceIntensity": 7.5,
    "highNoiseFraction": 0.8,
    "numberOfInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # JSON decode errors subclass ValueError in requests
            print(f"Response body: {e.response.text}")

In this code snippet, replace the placeholder API key with your own and confirm the endpoint URL and request body structure against the platform documentation, since both are hypothetical here. The payload mirrors the example input above, and a successful response contains the generated image URLs.

Conclusion

The Cognitive Actions provided by the mattfrances/stable-diffusion-toby specification let developers add image generation to their applications through a single API call. With the Generate Image with Inpainting action, you can create customized images and control the result through the prompt, negative prompt, LoRA intensity, guidance intensity, and the other parameters listed above. Experimenting with these settings is a good way to find use cases in creative applications, marketing, and beyond. Happy coding!