Enhance Your Image Generation with the fofr/sdxl-sonic-2 Cognitive Actions

21 Apr 2025

Generating high-quality images programmatically is valuable in digital content creation. The fofr/sdxl-sonic-2 spec exposes a Cognitive Action for image generation that supports inpainting and Img2Img. This pre-built action lets developers control the output through parameters such as dimensions, scheduler, and refinement method.

Prerequisites

Before you begin integrating the Cognitive Actions from the fofr/sdxl-sonic-2 spec, ensure you have the following:

  • An API key for the Cognitive Actions platform to authenticate your requests.
  • Basic knowledge of JSON and how to make HTTP requests in your preferred programming language.

Authentication typically involves including your API key in the request headers, allowing secure access to the Cognitive Actions services.
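As a minimal sketch of header-based authentication, the helper below reads the key from an environment variable and builds the request headers. The Bearer scheme and the COGNITIVE_ACTIONS_API_KEY variable name are assumptions for illustration; check your platform's documentation for the exact header it expects.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token (assumed scheme)."""
    # Fall back to an environment variable so the key never has to live in source code.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key supplied and COGNITIVE_ACTIONS_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key in the environment rather than in the script makes it harder to commit a secret by accident.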

Cognitive Actions Overview

Generate Image with Inpainting and Img2Img

The Generate Image with Inpainting and Img2Img action is designed to create images based on input prompts while offering advanced customization options. This action utilizes inpainting and Img2Img techniques to refine outputs, enabling developers to specify dimensions, styles, and other parameters for enhanced image generation.

Input

The input schema for this action includes several fields, allowing for extensive configuration:

  • mask: (string, optional) A URI of the input mask for inpaint mode. Black areas are preserved; white areas are inpainted.
  • seed: (integer, optional) A random seed for generation. Leaving this blank will use a randomized seed.
  • image: (string, optional) A URI of the input image used for Img2Img or inpaint mode.
  • width: (integer, optional) Specifies the output image width in pixels. Defaults to 1024.
  • height: (integer, optional) Specifies the output image height in pixels. Defaults to 1024.
  • prompt: (string, required) A prompt describing the desired content of the image.
  • refine: (string, optional) Determines the refinement method (options include no_refiner, expert_ensemble_refiner, base_image_refiner). Default is no_refiner.
  • loraRatio: (number, optional) Adjusts the LoRA scale for blending (range: 0 to 1). Default is 0.6.
  • scheduler: (string, optional) Selects the sampling scheduler method (e.g., DDIM, K_EULER). Default is DDIM.
  • guidanceScale: (number, optional) Scale for classifier-free guidance (range: 1 to 50). Default is 7.5.
  • applyWatermark: (boolean, optional) Applies a watermark to the generated image (default is true).
  • negativePrompt: (string, optional) A prompt specifying elements to avoid in the generated image.
  • promptStrength: (number, optional) Influence of the prompt in Img2Img/inpaint modes (range: 0 to 1). Default is 0.8.
  • numberOfOutputs: (integer, optional) Number of images to output (default is 1).
  • refinementSteps: (integer, optional) Number of refinement steps for the base image refiner.
  • highNoiseFraction: (number, optional) Fraction of noise for the expert ensemble refiner (range: 0 to 1). Default is 0.8.
  • numberOfInferenceSteps: (integer, optional) Total number of denoising inference steps (default is 50).
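The ranges and options listed above can be checked on the client before a request is sent. The following is a rough sketch, not part of the platform: validate_inputs is a hypothetical helper, and the rule that a mask needs an accompanying image is an assumption drawn from how inpainting works.

```python
# Allowed values and inclusive ranges copied from the field list above.
ALLOWED_REFINERS = {"no_refiner", "expert_ensemble_refiner", "base_image_refiner"}
NUMERIC_RANGES = {
    "loraRatio": (0, 1),
    "guidanceScale": (1, 50),
    "promptStrength": (0, 1),
    "highNoiseFraction": (0, 1),
}


def validate_inputs(payload):
    """Return a list of problems found in the payload; an empty list means none were found."""
    problems = []
    if not payload.get("prompt"):
        problems.append("prompt is required")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in payload and not (low <= payload[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    if "refine" in payload and payload["refine"] not in ALLOWED_REFINERS:
        problems.append(f"refine must be one of {sorted(ALLOWED_REFINERS)}")
    # Assumption: inpaint mode needs a source image alongside the mask.
    if "mask" in payload and "image" not in payload:
        problems.append("mask needs an accompanying image for inpaint mode")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem at once.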

Example Input:

{
  "width": 1536,
  "height": 768,
  "prompt": "A screenshot in the style of TOK, pixel art, 2d platform game, sharp, snowy mountain scene",
  "refine": "expert_ensemble_refiner",
  "loraRatio": 0.6,
  "scheduler": "K_EULER",
  "guidanceScale": 7.5,
  "applyWatermark": false,
  "negativePrompt": "soft, blurry",
  "promptStrength": 0.8,
  "numberOfOutputs": 4,
  "highNoiseFraction": 0.95,
  "numberOfInferenceSteps": 50
}
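The example above is a plain text-to-image request. For inpaint mode, the inputs would also carry the image and mask fields. Here is a small sketch of how such a payload might be assembled; make_inpaint_inputs is a hypothetical helper, and the example.com URIs and the prompt are placeholders.

```python
def make_inpaint_inputs(prompt, image_uri, mask_uri, prompt_strength=0.8):
    """Build an inputs dict for inpaint mode from a prompt, a source image, and a mask."""
    if not 0 <= prompt_strength <= 1:
        raise ValueError("prompt_strength must be between 0 and 1")
    return {
        "prompt": prompt,
        "image": image_uri,   # source image to modify
        "mask": mask_uri,     # white areas are repainted, black areas are kept
        "promptStrength": prompt_strength,
    }


inpaint_inputs = make_inpaint_inputs(
    prompt="A screenshot in the style of TOK, pixel art, snowy mountain scene",
    image_uri="https://example.com/source.png",  # placeholder URI
    mask_uri="https://example.com/mask.png",     # placeholder URI
)
```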

Output

The action typically returns an array of URLs pointing to the generated images, one per requested output.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/572d5e16-0b5b-4b3b-a563-142c60f087b7/b8069015-d7cb-4a1a-bcc8-d50bd5e2bafe.png",
  "https://assets.cognitiveactions.com/invocations/572d5e16-0b5b-4b3b-a563-142c60f087b7/f296b6a7-1413-4111-8049-9812826cc7a1.png",
  "https://assets.cognitiveactions.com/invocations/572d5e16-0b5b-4b3b-a563-142c60f087b7/7dab2eb7-a887-4236-8d23-a40f1887bbb6.png",
  "https://assets.cognitiveactions.com/invocations/572d5e16-0b5b-4b3b-a563-142c60f087b7/ca518160-a5c0-466c-ad2c-6cbabfd66f9f.png"
]
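Since the output is a list of URLs, a typical next step is saving the images locally. The sketch below uses only the Python standard library; filename_from_url and download_images are hypothetical helper names, and it assumes the URLs can be fetched without extra authentication.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urlparse(url).path)


def download_images(urls, target_dir="outputs"):
    """Download each URL into target_dir and return the list of saved paths."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        destination = out_dir / filename_from_url(url)
        # Stream the response body straight to disk.
        with urllib.request.urlopen(url, timeout=30) as response, open(destination, "wb") as fh:
            shutil.copyfileobj(response, fh)
        saved.append(destination)
    return saved
```

Calling download_images(result) on the array returned by the action would place each file under outputs/, named after the final segment of its URL.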

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet showing how a developer might invoke this action using a hypothetical endpoint.

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "6d9672b4-52ab-4936-986c-fe317cb3159f" # Action ID for Generate Image with Inpainting and Img2Img

# Construct the input payload based on the action's requirements
payload = {
    "width": 1536,
    "height": 768,
    "prompt": "A screenshot in the style of TOK, pixel art, 2d platform game, sharp, snowy mountain scene",
    "refine": "expert_ensemble_refiner",
    "loraRatio": 0.6,
    "scheduler": "K_EULER",
    "guidanceScale": 7.5,
    "applyWatermark": False,
    "negativePrompt": "soft, blurry",
    "promptStrength": 0.8,
    "numberOfOutputs": 4,
    "highNoiseFraction": 0.95,
    "numberOfInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Avoid hanging indefinitely; image generation can take a while
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Raised when the body is not valid JSON
            print(f"Response body: {e.response.text}")

In this snippet, replace the placeholder values of COGNITIVE_ACTIONS_API_KEY and COGNITIVE_ACTIONS_EXECUTE_URL with your actual API key and endpoint. The request body carries the action ID for image generation together with the input payload.
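When results need to be reproducible, the seed input can be pinned instead of left blank. Below is a minimal sketch of a request-body builder that does this. build_request_body is a hypothetical helper, and the action_id/inputs envelope mirrors the hypothetical structure used in the snippet above.

```python
import copy


def build_request_body(action_id, inputs, seed=None):
    """Wrap inputs in the (hypothetical) execute envelope, optionally pinning the seed."""
    # Copy first so the caller's dict is never mutated.
    body_inputs = copy.deepcopy(inputs)
    if seed is not None:
        body_inputs["seed"] = int(seed)
    return {"action_id": action_id, "inputs": body_inputs}


body = build_request_body(
    "6d9672b4-52ab-4936-986c-fe317cb3159f",
    {"prompt": "pixel art, snowy mountain scene", "numberOfOutputs": 1},
    seed=1234,
)
```

The resulting dict can be passed as the json argument of requests.post. Reusing the same seed with otherwise identical inputs should generally yield the same image, though that depends on the backend.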

Conclusion

The Cognitive Actions offered through the fofr/sdxl-sonic-2 spec provide developers with a robust toolkit for generating images tailored to their needs. By leveraging advanced techniques like inpainting and Img2Img, you can enhance your applications with high-quality visual content. As you explore these capabilities, consider how you can integrate them into your projects for innovative and engaging user experiences. Happy coding!