Create Stunning Images with the fofr/sdxl-multi-controlnet-lora Cognitive Actions

22 Apr 2025

The fofr/sdxl-multi-controlnet-lora spec offers Cognitive Actions that enhance image generation with advanced techniques. By combining multiple ControlNet conditioning inputs with LoRA (Low-Rank Adaptation) weights, developers can create intricate image compositions with fine-grained control over the generation process. These pre-built actions simplify the integration of complex image-processing capabilities into applications, saving time and effort while delivering stunning visual results.

Prerequisites

To use the Cognitive Actions for the fofr/sdxl-multi-controlnet-lora spec, you will need:

  • An API key for the Cognitive Actions platform, which will authenticate your requests.
  • A basic understanding of how to send HTTP requests and handle JSON payloads.

Authentication generally involves passing your API key in the request headers, ensuring that your calls to the service are secure and authorized.
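For instance, a bearer-token header in Python might look like this (the exact header scheme is an assumption here; check the platform's documentation for the required format):

```python
# Placeholder key, not a real credential
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# Bearer-token authorization, a common convention for HTTP APIs
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```

These headers are then passed with every request, as shown in the full Python example later in this post.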

Cognitive Actions Overview

Generate and Refine Complex Image Compositions

This action utilizes multi-controlnet mechanisms, LoRA loading, and advanced techniques like img2img and inpainting to create intricate image compositions. Enhanced capabilities include simultaneous management of up to three controlnets, conditional strength settings, and detailed image refinements. It offers refined control over image generation, including adjustable conditioning start and end points, safety checker disabling, and custom size adjustments based on various image inputs.

Category: image-processing

Input: The action accepts various parameters that define the image generation process. Here’s a breakdown of the input schema:

  • seed: (optional) Random seed for image generation. Leave blank for automatic randomization.
  • width: (required) Width of the generated output image in pixels. Default is 768.
  • height: (required) Height of the generated output image in pixels. Default is 768.
  • prompt: (required) Descriptive text to guide image generation.
  • inputMask: (optional) URI for the input mask used in inpaint mode. Black areas are preserved; white areas are inpainted.
  • loraScale: (optional) Scale for LoRA models. Must be between 0 (no influence) and 1 (full influence).
  • inputImage: (optional) URI of the input image used for img2img or inpaint modes.
  • numOutputs: (optional) The number of images to generate. Must be between 1 and 4. Default is 1.
  • refineSteps: (optional) Number of refinement steps in base_image_refiner mode.
  • guidanceScale: (optional) Scale factor for classifier-free guidance. Must be between 1 and 50.
  • applyWatermark: (optional) Toggle to apply a watermark on generated images.
  • negativePrompt: (optional) Elements that should be avoided in the generated image.
  • promptStrength: (optional) Strength of the prompt when using img2img or inpaint modes.
  • resizeStrategy: (optional) Strategy for resizing images.
  • schedulingMethod: (optional) Method for scheduling during inference.
  • numInferenceSteps: (optional) Number of steps for denoising during inference.
  • primaryControlnet: (optional) Defines the primary ControlNet module to apply during inference.
  • secondaryControlnet: (optional) Defines the secondary ControlNet module to apply during inference.
  • tertiaryControlnet: (optional) Defines the tertiary ControlNet module to apply during inference.
  • ... (additional fields related to ControlNet parameters)

Here’s an example input JSON payload for this action:

{
  "width": 768,
  "height": 768,
  "prompt": "A TOK photo, extreme macro photo of a golden astronaut riding a unicorn statue, in a museum, bokeh, 50mm",
  "loraScale": 0.8,
  "numOutputs": 1,
  "refineStyle": "no_refiner",
  "guidanceScale": 7.5,
  "applyWatermark": false,
  "negativePrompt": "rainbow",
  "promptStrength": 0.8,
  "resizeStrategy": "width_height",
  "schedulingMethod": "K_EULER",
  "numInferenceSteps": 30,
  "primaryControlnet": "soft_edge_hed",
  "secondaryControlnet": "none",
  "tertiaryControlnet": "none"
}
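Because several of these fields have documented numeric ranges, it can be useful to check a payload client-side before sending it. A minimal sketch, using the field names and limits from the schema above (the server's actual validation rules may differ):

```python
def validate_inputs(payload: dict) -> list:
    """Return a list of problems found against the documented ranges."""
    problems = []
    if not payload.get("prompt"):
        problems.append("prompt is required")
    lora = payload.get("loraScale")
    if lora is not None and not 0 <= lora <= 1:
        problems.append("loraScale must be between 0 and 1")
    n = payload.get("numOutputs")
    if n is not None and not 1 <= n <= 4:
        problems.append("numOutputs must be between 1 and 4")
    g = payload.get("guidanceScale")
    if g is not None and not 1 <= g <= 50:
        problems.append("guidanceScale must be between 1 and 50")
    return problems

example = {"prompt": "A TOK photo", "loraScale": 0.8,
           "numOutputs": 1, "guidanceScale": 7.5}
print(validate_inputs(example))  # an empty list means the ranges check out
```

Running this against the example payload above returns no problems; an out-of-range value like a loraScale of 1.5 would be flagged before the request ever leaves your application.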

Output: Upon successful execution, the action will return a list of generated image URLs. For example:

[
  "https://assets.cognitiveactions.com/invocations/c0e844f3-6d8f-4064-837b-54aa09b2bbad/3a4c6a0a-03aa-4b4a-b891-4308f4b42a87.png",
  "https://assets.cognitiveactions.com/invocations/c0e844f3-6d8f-4064-837b-54aa09b2bbad/f9f43bdd-408a-40d2-b5de-9dc1b6286e51.png"
]
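Since the output is a plain list of image URLs, the generated files can be fetched with standard HTTP requests. A small sketch using only the Python standard library (the destination directory and filename convention are choices for illustration):

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

def filename_from_url(url: str) -> str:
    # Derive a local filename from the URL path, e.g. ".../abc.png" -> "abc.png"
    return os.path.basename(urlparse(url).path)

def download_images(urls, dest_dir="outputs"):
    # Fetch each generated image and save it under dest_dir; return local paths.
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        with urlopen(url, timeout=60) as resp:
            data = resp.read()
        path = os.path.join(dest_dir, filename_from_url(url))
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths
```

For example, `download_images(result_urls)` would save each PNG from the list above into a local `outputs/` directory, keeping the server-assigned filenames.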

Conceptual Usage Example (Python):

Here’s how a developer might call the Cognitive Actions execution endpoint for this action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "be672d5a-2471-4f0d-af86-466ea2c8e31b"  # Action ID for Generate and Refine Complex Image Compositions

# Construct the input payload based on the action's requirements
payload = {
    "width": 768,
    "height": 768,
    "prompt": "A TOK photo, extreme macro photo of a golden astronaut riding a unicorn statue, in a museum, bokeh, 50mm",
    "loraScale": 0.8,
    "numOutputs": 1,
    "refineStyle": "no_refiner",
    "guidanceScale": 7.5,
    "applyWatermark": False,
    "negativePrompt": "rainbow",
    "promptStrength": 0.8,
    "resizeStrategy": "width_height",
    "schedulingMethod": "K_EULER",
    "numInferenceSteps": 30,
    "primaryControlnet": "soft_edge_hed",
    "secondaryControlnet": "none",
    "tertiaryControlnet": "none"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # image generation can take a while; avoid hanging forever
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # body is not valid JSON; requests raises a ValueError subclass
            print(f"Response body: {e.response.text}")

This snippet shows where the action ID and the structured input payload fit into the request. The endpoint URL and request structure are illustrative and should be adapted to your specific use case.

Conclusion

The Generate and Refine Complex Image Compositions action within the fofr/sdxl-multi-controlnet-lora spec provides developers with robust tools to create sophisticated visual content effortlessly. By leveraging the capabilities of multi-controlnet mechanisms and LoRA, you can refine your image generation processes and produce high-quality outputs tailored to your needs.

Explore the possibilities this action opens up for your applications, and consider integrating it into your next project for enhanced image processing!