Unlocking Image Generation with DPO-SDXL ControlNet LoRA Actions

23 Apr 2025

The batouresearch/dpo-sdxl-controlnet-lora spec provides Cognitive Actions for image generation using DPO-SDXL with a Canny ControlNet and Low-Rank Adaptation (LoRA) support. These pre-built actions handle generating, refining, and customizing images from text prompts, which makes image generation accessible to developers who want to add it to their applications.

Prerequisites

Before integrating the Cognitive Actions into your application, you will need:

  • An API key for the Cognitive Actions platform to authenticate your requests.
  • Basic knowledge of JSON structures, as input and output will be in JSON format.

Authentication typically involves passing your API key in the headers of your HTTP requests, allowing you to securely access the service.
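
As a minimal sketch of the header-based authentication described above, the helper below reads the key from an environment variable and builds the request headers. The function name build_auth_headers, the environment variable name, and the Bearer scheme are assumptions for illustration; check your platform's documentation for the exact header it expects.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build HTTP headers carrying the API key.

    The environment variable name and the Bearer scheme are assumptions,
    not confirmed details of the Cognitive Actions platform.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which is usually preferable to hard-coding it in a script.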

Cognitive Actions Overview

Generate Image with DPO-SDXL Canny ControlNet

The Generate Image with DPO-SDXL Canny ControlNet action allows you to produce images based on descriptive prompts, with options for image refinement, guidance scaling, and watermarking. This action supports both image-to-image (img2img) and inpainting modes, providing flexibility in how you create visual content.

  • Category: Image Generation

Input

The input for this action follows the CompositeRequest schema, which includes the following fields:

  • seed (integer, optional): Random seed for generation. Leave blank to randomize.
  • image (string, required): URL of the input image for img2img or inpainting.
  • prompt (string, required): Text prompt describing the desired image content.
  • refinementMode (string, optional): Choose between no_refiner and base_image_refiner.
  • numberOfOutputs (integer, optional): Specify the number of images to generate (1 to 4).
  • refinementSteps (integer, optional): Number of refinement steps (default 10).
  • guidanceIntensity (number, optional): Scale for classifier-free guidance (1 to 50).
  • loraAdditiveScale (number, optional): LoRA additive scale (0 to 1).
  • isWatermarkApplied (boolean, optional): Whether to apply a watermark (default true).
  • schedulingStrategy (string, optional): Strategy for scheduling steps (default K_EULER).
  • negativeInputPrompt (string, optional): Aspects to exclude from the generated image.
  • numberOfDenoisingSteps (integer, optional): Number of denoising steps (1 to 500).
  • loraWeightConfigurations (string, optional): URL for LoRA weights file.
  • controlNetInterferenceScale (number, optional): ControlNet interference level (0 to 1).
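
The ranges listed above can be checked on the client before a request is sent. The following is a sketch only: validate_payload is a hypothetical helper, it treats the documented bounds as inclusive (an assumption), and the service may apply different or additional validation.

```python
# Numeric ranges copied from the field list above: (minimum, maximum).
# Treating both bounds as inclusive is an assumption.
NUMERIC_RANGES = {
    "numberOfOutputs": (1, 4),
    "guidanceIntensity": (1, 50),
    "loraAdditiveScale": (0, 1),
    "numberOfDenoisingSteps": (1, 500),
    "controlNetInterferenceScale": (0, 1),
}
REQUIRED_FIELDS = ("image", "prompt")
REFINEMENT_MODES = {"no_refiner", "base_image_refiner"}


def validate_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            problems.append(f"{field} is required")
    for field, (low, high) in NUMERIC_RANGES.items():
        value = payload.get(field)
        if value is None:
            continue  # optional field left unset
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field} must be a number")
        elif not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}")
    mode = payload.get("refinementMode")
    if mode is not None and mode not in REFINEMENT_MODES:
        problems.append(
            f"refinementMode must be one of {sorted(REFINEMENT_MODES)}"
        )
    return problems
```

Calling validate_payload on a request dictionary and refusing to send it when the returned list is non-empty catches out-of-range values locally instead of after a round trip to the API.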

Example Input:

{
  "image": "https://replicate.delivery/pbxt/K589OVTpTQjio99XGHopletMbpSrpgXDvT8VJahdVeHOAOgk/4904b1be-61dc-4ef0-916b-2f33b2ca953a.webp",
  "prompt": "shot in the style of sksfer, a woman wearing an organic shaped hat in alaska",
  "refinementMode": "no_refiner",
  "numberOfOutputs": 1,
  "refinementSteps": 10,
  "guidanceIntensity": 7.5,
  "loraAdditiveScale": 0.95,
  "isWatermarkApplied": true,
  "schedulingStrategy": "K_EULER",
  "negativeInputPrompt": "",
  "numberOfDenoisingSteps": 50,
  "loraWeightConfigurations": "https://pbxt.replicate.delivery/mwN3AFyYZyouOB03Uhw8ubKW9rpqMgdtL9zYV9GF2WGDiwbE/trained_model.tar",
  "controlNetInterferenceScale": 0.5
}

Output

The action typically returns a list of generated image URLs.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/7b235f43-7e79-4f82-9ddd-cc99aaf08bfd/c1b8e5b8-3708-4a4f-a53f-d7070cd3ec15.png"
]
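
Since the output is a list of URLs, a typical next step is to save the images locally. The sketch below uses only the Python standard library; filename_from_url and save_images are hypothetical helper names, and it assumes the returned URLs can be downloaded without additional authentication, which the source does not confirm.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url: str, index: int) -> str:
    """Use the last path segment of the URL, or fall back to an index."""
    name = PurePosixPath(urlparse(url).path).name
    return name or f"output_{index}.png"


def save_images(urls: list, directory: str = ".") -> list:
    """Download each URL into the directory and return the paths written.

    Assumes the URLs are fetchable with a plain GET and no auth headers.
    """
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        path = target_dir / filename_from_url(url, index)
        with urlopen(url, timeout=30) as response:
            path.write_bytes(response.read())
        saved.append(str(path))
    return saved
```

For example, passing the list from the example output to save_images(urls, "generated") would write each image into a generated directory under its original file name.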

Conceptual Usage Example (Python)

Here’s how you might call the Generate Image with DPO-SDXL Canny ControlNet action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "019116f0-98b4-46a1-9c17-641bca48112f"  # Action ID for Generate Image with DPO-SDXL Canny ControlNet

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/K589OVTpTQjio99XGHopletMbpSrpgXDvT8VJahdVeHOAOgk/4904b1be-61dc-4ef0-916b-2f33b2ca953a.webp",
    "prompt": "shot in the style of sksfer, a woman wearing an organic shaped hat in alaska",
    "refinementMode": "no_refiner",
    "numberOfOutputs": 1,
    "refinementSteps": 10,
    "guidanceIntensity": 7.5,
    "loraAdditiveScale": 0.95,
    "isWatermarkApplied": True,
    "schedulingStrategy": "K_EULER",
    "negativeInputPrompt": "",
    "numberOfDenoisingSteps": 50,
    "loraWeightConfigurations": "https://pbxt.replicate.delivery/mwN3AFyYZyouOB03Uhw8ubKW9rpqMgdtL9zYV9GF2WGDiwbE/trained_model.tar",
    "controlNetInterferenceScale": 0.5
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; covers every JSON decode error type
            print(f"Response body: {e.response.text}")

In this example, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The endpoint URL and the request body structure (action_id plus inputs) are hypothetical, so adjust them to match your platform's documentation. The payload mirrors the example input above, and the snippet shows how to send the request, check the status code, and print either the JSON result or the error details.

Conclusion

The Cognitive Actions provided by the batouresearch/dpo-sdxl-controlnet-lora spec empower developers to easily generate and refine images based on textual descriptions. By leveraging these actions, you can enhance your applications with rich visual content tailored to user prompts. As you explore further, consider how integrating these capabilities can elevate user experiences and foster creative applications. Happy coding!