Generate High-Detail Images with ControlNet Depth2Img Actions

22 Apr 2025

In the world of image processing, the ability to create visually stunning and detailed images from simple prompts is invaluable. The ControlNet Depth2Img API offers a unique set of cognitive actions that allow developers to leverage depth maps for enhanced image generation. By using these pre-built actions, you can significantly simplify the process of generating high-quality visuals that maintain intricate depth details.

Prerequisites

Before diving into the integration of these cognitive actions, ensure you have the following:

  • API Key: You will need an API key for the Cognitive Actions platform to authenticate your requests.
  • Development Environment: A setup that allows for making HTTP requests, preferably in a language like Python.

Authentication typically involves passing your API key in the headers of your HTTP requests, enabling secure access to the actions provided.
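As a minimal sketch of that pattern (the Bearer scheme matches the example later in this guide, but confirm the exact scheme with your platform's documentation), a small helper can build the headers once and reuse them across requests:

```python
def build_headers(api_key: str) -> dict:
    """Construct request headers carrying the API key as a Bearer token."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Replace the placeholder with your actual key before making requests.
headers = build_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```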

Cognitive Actions Overview

Generate High-Detail Image with Depth Map

Description: This action utilizes ControlNet to adaptively modify images using 512x512 depth maps, resulting in detailed image generation. It retains more depth information compared to standard implementations, offering enhanced detail preservation and customization.

Category: Image Processing

Input

The input for this action requires several fields, some mandatory and others optional:

{
  "image": "https://replicate.delivery/pbxt/IKFvJn5EpLuDDsFysOP4B1J9HvKDbMBCwZUK9n6p9mIPoQwG/sd.png",
  "prompt": "a stormtrooper giving a lecture at a university",
  "scale": 9,
  "steps": 20,
  "addedPrompt": "best quality, extremely detailed",
  "negativePrompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
  "imageResolution": "512",
  "numberOfSamples": "1",
  "detectionResolution": 512
}
  • Required Fields:
    • image: URI of the input image.
    • prompt: Main prompt describing the desired characteristics of the generated image.
  • Optional Fields:
    • seed: Random seed for deterministic results.
    • scale: Guidance scale for prompt influence, default is 9.
    • steps: Number of diffusion steps, default is 20.
    • addedPrompt: Additional qualitative prompt details, default is "best quality, extremely detailed".
    • negativePrompt: Parameters to exclude undesirable features, default includes various negative traits.
    • imageResolution: Target resolution for the output image, options include '256', '512' (default), or '768'.
    • numberOfSamples: Number of image samples to generate, options are '1' (default) or '4'.
    • detectionResolution: Resolution for image detection, default is 512.
    • estimatedTimeArrival: The DDIM sampler's eta (η) parameter, which controls sampling stochasticity; default is 0 (fully deterministic sampling).

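Since only image and prompt are required, a small helper can fill in the documented defaults for everything else and let callers override individual fields. This is a hypothetical convenience function (not part of the API itself), using the field names and defaults listed above:

```python
def build_depth2img_payload(image: str, prompt: str, **options) -> dict:
    """Combine the two required fields with the action's documented defaults.

    Any keyword argument (e.g. seed=42, imageResolution="768") overrides
    the corresponding default.
    """
    payload = {
        "image": image,
        "prompt": prompt,
        "scale": 9,
        "steps": 20,
        "addedPrompt": "best quality, extremely detailed",
        "negativePrompt": ("longbody, lowres, bad anatomy, bad hands, "
                           "missing fingers, extra digit, fewer digits, "
                           "cropped, worst quality, low quality"),
        "imageResolution": "512",
        "numberOfSamples": "1",
        "detectionResolution": 512,
    }
    payload.update(options)  # caller-supplied overrides win
    return payload
```

For example, `build_depth2img_payload(img_url, "a stormtrooper giving a lecture", steps=30, seed=42)` keeps the defaults except for the two overridden fields.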
Output

The action typically returns an array of generated image URLs. Here’s an example of what you might receive:

[
  "https://assets.cognitiveactions.com/invocations/716fe455-293c-4c20-b85c-f0c739d8f12e/32bc7b18-1a2b-4206-895a-67f52114c4d6.png",
  "https://assets.cognitiveactions.com/invocations/716fe455-293c-4c20-b85c-f0c739d8f12e/40f4a676-e4f8-416b-8dbc-8f494beea8c8.png"
]

This output provides direct links to the generated images, allowing you to easily incorporate them into your application.
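If you want to persist the results locally, one approach is to derive a file name from each URL's last path segment and fetch the images. This is an illustrative sketch, assuming the action's output is a plain list of URLs as shown above:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def output_filenames(image_urls):
    """Derive local file names from the returned URLs (last path segment)."""
    return [PurePosixPath(urlparse(url).path).name for url in image_urls]

def download_images(image_urls, out_dir="."):
    """Fetch each generated image and save it under its original file name."""
    import requests  # third-party; pip install requests
    for url, name in zip(image_urls, output_filenames(image_urls)):
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        with open(f"{out_dir}/{name}", "wb") as f:
            f.write(resp.content)
```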

Conceptual Usage Example (Python)

Here’s how you might structure a call to this action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "db13bd2c-9e9b-4acb-9129-17e6289fc570" # Action ID for Generate High-Detail Image with Depth Map

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/IKFvJn5EpLuDDsFysOP4B1J9HvKDbMBCwZUK9n6p9mIPoQwG/sd.png",
    "scale": 9,
    "steps": 20,
    "prompt": "a stormtrooper giving a lecture at a university",
    "addedPrompt": "best quality, extremely detailed",
    "negativePrompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
    "imageResolution": "512",
    "numberOfSamples": "1",
    "detectionResolution": 512
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Image generation can be slow; avoid hanging indefinitely
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Covers json.JSONDecodeError and variants
            print(f"Response body: {e.response.text}")

In this snippet, replace COGNITIVE_ACTIONS_API_KEY and COGNITIVE_ACTIONS_EXECUTE_URL with your actual API key and endpoint. The action ID identifies the Generate High-Detail Image with Depth Map action, and the payload mirrors the input structure described above.

Conclusion

The ControlNet Depth2Img actions provide a powerful way to generate high-detail images while maintaining depth integrity. By using the provided input structure and sample code, developers can easily integrate these cognitive actions into their applications. Consider experimenting with various prompts and parameters to unlock the full potential of image processing capabilities offered by this API.