Generate Stunning Images with Multi-ControlNet Cognitive Actions

21 Apr 2025

In today’s digital landscape, the ability to generate high-quality images programmatically can vastly enhance applications in various domains, from gaming to marketing. The Multi-ControlNet with IP Adapter Vision v2 provides developers with a powerful API to create images based on text prompts, masks, and control images. This blog post will delve into the capabilities of the Generate Image with Multi-ControlNet action, guiding you through its use and integration into your applications.

Prerequisites

Before getting started, ensure you have:

  • An API key for the Cognitive Actions platform.
  • Basic knowledge of JSON and RESTful APIs.
  • Familiarity with making HTTP requests in Python.

To authenticate your requests, you'll typically pass your API key in the request headers. Conceptually, this would look like adding an Authorization header with your API key.
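In Python, that header setup might look like the following. Note that the Bearer scheme shown here is an assumption; consult the platform's documentation for the exact header format it expects:

```python
def build_headers(api_key: str) -> dict:
    """Build request headers using an assumed Bearer token scheme."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```

These headers can then be passed to every request you make against the API.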

Cognitive Actions Overview

Generate Image with Multi-ControlNet

The Generate Image with Multi-ControlNet action leverages the Multi-ControlNet framework to generate images based on detailed text prompts, masks, and various control images. The action exposes customization options for noise levels, image dimensions, and per-controlnet conditioning scales, and produces images at resolutions up to 512x512 pixels.

Input

The input for this action is defined by a JSON schema. Here are the essential fields:

  • prompt (required): A string to guide image generation.
  • eta: Controls the amount of noise injected during sampling (default: 0).
  • seed: Optional integer for deterministic results.
  • maxWidth & maxHeight: Maximum dimensions for the generated image (default: 512).
  • guidanceScale: Adjusts the influence of the prompt (default: 7).
  • numOutputs: Specifies the number of images to generate (default: 1, max: 10).
  • negativePrompt: Specifies unwanted features in the generated image.

Here’s an example input JSON for the action:

{
  "eta": 0,
  "prompt": "jungle",
  "maxWidth": 512,
  "guessMode": false,
  "maxHeight": 512,
  "scheduler": "DDIM",
  "numOutputs": 1,
  "guidanceScale": 7,
  "ipAdapterCkpt": "ip-adapter_sd15.bin",
  "negativePrompt": "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
  "img2imgStrength": 0.5,
  "ipAdapterWeight": 1,
  "numInferenceSteps": 20,
  "sortedControlnets": "tile, inpainting, lineart",
  "disableSafetyCheck": false,
  "filmGrainLoraWeight": 0,
  "tileConditioningScale": 1,
  "addMoreDetailLoraScale": 0.5,
  "detailTweakerLoraWeight": 0,
  "epiNoiseOffsetLoraWeight": 0,
  "lineartConditioningScale": 1,
  "scribbleConditioningScale": 1,
  "brightnessConditioningScale": 1,
  "inpaintingConditioningScale": 1,
  "colorTemperatureSliderLoraWeight": 0
}
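The documented constraints (prompt is required; numOutputs must be between 1 and 10) can be checked client-side before sending a request. A minimal sketch, assuming only the constraints listed above:

```python
def validate_inputs(payload: dict) -> None:
    """Raise ValueError if the payload violates the documented constraints."""
    if not payload.get("prompt"):
        raise ValueError("prompt is required and must be a non-empty string")
    num_outputs = payload.get("numOutputs", 1)
    if not 1 <= num_outputs <= 10:
        raise ValueError("numOutputs must be between 1 and 10")

validate_inputs({"prompt": "jungle", "numOutputs": 1})  # passes silently
```

Validating early surfaces mistakes before they cost you an API call.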

Output

When the action is executed successfully, it returns an array of URLs pointing to the generated images. Here’s a sample output:

[
  "https://assets.cognitiveactions.com/invocations/cbd4ca3a-ddbe-4635-a887-e4f4899dd459/d1a9a59b-9a17-4e45-816d-71ba227a579f.png"
]
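Once you have the URLs, you can fetch and save the images locally. Here is a minimal sketch; deriving the local filename from the last path segment of the URL is just one convenient convention, not something the API mandates:

```python
import os
from urllib.parse import urlparse

import requests

def filename_from_url(url: str) -> str:
    """Derive a local filename from the last path segment of the URL."""
    return os.path.basename(urlparse(url).path)

def download_images(urls: list, out_dir: str = ".") -> list:
    """Download each generated image and return the local file paths."""
    paths = []
    for url in urls:
        path = os.path.join(out_dir, filename_from_url(url))
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)
        paths.append(path)
    return paths
```

Keep in mind that generated asset URLs may expire, so download the images promptly after the action completes.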

Conceptual Usage Example (Python)

Here’s how you might call the Generate Image with Multi-ControlNet action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "88005396-92a7-4ff6-b654-5b51d09ba069" # Action ID for Generate Image with Multi-ControlNet

# Construct the input payload based on the action's requirements
payload = {
    "eta": 0,
    "prompt": "jungle",
    "maxWidth": 512,
    "guessMode": False,
    "maxHeight": 512,
    "scheduler": "DDIM",
    "numOutputs": 1,
    "guidanceScale": 7,
    "ipAdapterCkpt": "ip-adapter_sd15.bin",
    "negativePrompt": "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
    "img2imgStrength": 0.5,
    "ipAdapterWeight": 1,
    "numInferenceSteps": 20,
    "sortedControlnets": "tile, inpainting, lineart",
    "disableSafetyCheck": False,
    "filmGrainLoraWeight": 0,
    "tileConditioningScale": 1,
    "addMoreDetailLoraScale": 0.5,
    "detailTweakerLoraWeight": 0,
    "epiNoiseOffsetLoraWeight": 0,
    "lineartConditioningScale": 1,
    "scribbleConditioningScale": 1,
    "brightnessConditioningScale": 1,
    "inpaintingConditioningScale": 1,
    "colorTemperatureSliderLoraWeight": 0
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Image generation can be slow; avoid hanging indefinitely
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers json.JSONDecodeError across requests versions
            print(f"Response body: {e.response.text}")

In this code snippet, we construct a JSON payload using the required parameters and make a POST request to the Cognitive Actions API. Ensure you replace the API key and endpoint with actual values.

Conclusion

The Generate Image with Multi-ControlNet action opens up a world of possibilities for developers looking to create dynamic and visually appealing content. By utilizing various customization options, you can guide the image generation process to align with your application's specific needs. As a next step, consider experimenting with different prompts and settings to see how they affect the generated images, or integrate this action into your application to enhance user engagement. Happy coding!
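One simple way to run such experiments is to sweep a single setting, such as guidanceScale, across a base payload and submit each variant with the same POST request shown earlier. A minimal sketch (the scale values chosen here are arbitrary examples):

```python
def payload_variants(base: dict, scales: list) -> list:
    """Return one copy of the base payload per guidance scale to compare outputs."""
    return [{**base, "guidanceScale": s} for s in scales]

base_payload = {"prompt": "jungle", "numOutputs": 1, "guidanceScale": 7}
variants = payload_variants(base_payload, [3, 7, 12])
# Submit each variant via the execute endpoint and compare the resulting images.
```

Lower guidance scales generally give the model more creative freedom, while higher values adhere more closely to the prompt, so comparing the outputs side by side is an instructive first experiment.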