Enhance Image Editing with Depth Maps Using adirik/t2i-adapter-sdxl-depth-midas Actions

24 Apr 2025

In digital content creation, the ability to modify images precisely is invaluable. The adirik/t2i-adapter-sdxl-depth-midas Cognitive Actions give developers pre-built, depth-guided image editing. A depth map estimated from the input image constrains Stable Diffusion XL, so the generated image follows a new text prompt while keeping the spatial layout of the original. The "midas" in the name refers to the MiDaS depth-estimation model.

Prerequisites

Before diving into the integration of the Cognitive Actions, ensure you have the following:

  • API Key: You will need an API key for the Cognitive Actions platform to authenticate your requests.
  • Setup: Familiarity with making HTTP requests in your programming environment, especially using JSON payloads.

Requests are authenticated by passing your API key in a request header. The Python example later in this article sends it as a Bearer token in the Authorization header.
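A minimal sketch of building those headers follows. It assumes the platform accepts "Authorization: Bearer <key>", as the later example does. It also assumes the key is stored in an environment variable named COGNITIVE_ACTIONS_API_KEY, which is a hypothetical name and not one the platform prescribes:

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Return JSON request headers carrying the API key as a Bearer token.

    Assumptions: the Bearer scheme matches what the platform expects, and the
    environment variable name is hypothetical.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source code and version control.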

Cognitive Actions Overview

Modify Images with Depth Maps

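The Modify Images with Depth Maps action uses a T2I-Adapter with Stable Diffusion XL to edit images under the guidance of a depth map. T2I-Adapters are lightweight networks that feed an extra control signal into a pretrained diffusion model. Variants exist for human body poses, line art, and sketches; this one uses depth. Because the depth map encodes the scene's geometry, the output keeps the composition of the original image while the prompt determines its content and style.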

Input

The action accepts the following fields; only image is required:

  • image (required): The URI of the input image to be processed (must be in a valid URI format).
  • prompt (optional): A textual description guiding the image generation (default: "A photo of a room, 4k photo, highly detailed").
  • scheduler (optional): The diffusion sampler (noise scheduler) used during generation (default: "K_EULER_ANCESTRAL").
  • randomSeed (optional): An integer seed for reproducibility.
  • guidanceScale (optional): Classifier-free guidance scale. Higher values make the output follow the prompt more closely (default: 7.5).
  • negativePrompt (optional): Keywords to avoid in the generated output (default: "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured").
  • numberOfSamples (optional): Specifies the number of images to generate (default: 1, must be between 1 and 4).
  • numberOfInferenceSteps (optional): Number of denoising steps in the diffusion process (default: 30).
  • adapterConditioningScale (optional): Multiplier applied to the adapter's output, that is, how strongly the depth map constrains the result (default: 1).
  • adapterConditioningFactor (optional): Controls how much of the denoising process receives the adapter's conditioning. In the diffusers T2I-Adapter pipeline, this is the fraction of steps during which the adapter is applied (default: 1, i.e. every step).
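The toy function below shows how the two adapter fields interact. It reproduces the scheduling rule of the diffusers StableDiffusionXLAdapterPipeline. This assumes the action forwards these fields to that pipeline's adapter_conditioning_scale and adapter_conditioning_factor arguments, which is not confirmed here. The function only computes the per-step weight of the depth guidance and does not run a model. The name adapter_weights is hypothetical.

```python
def adapter_weights(num_inference_steps: int,
                    conditioning_scale: float = 1.0,
                    conditioning_factor: float = 1.0) -> list[float]:
    """Per-step weight of the depth conditioning (0.0 means adapter not applied).

    Mirrors the rule in diffusers' SDXL adapter pipeline: adapter features are
    multiplied by conditioning_scale and only added while the step index is
    below int(num_inference_steps * conditioning_factor).
    """
    cutoff = int(num_inference_steps * conditioning_factor)
    return [conditioning_scale if step < cutoff else 0.0
            for step in range(num_inference_steps)]
```

With the defaults (30 steps, scale 1, factor 1), every step receives full depth guidance. Lowering the factor to 0.5 applies it only during the first 15 steps, where the overall composition is typically settled. The remaining 15 steps are then guided by the prompt alone.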

Example Input:

{
  "image": "https://replicate.delivery/pbxt/JbnAzlvH84NR20HgqUdfnLlMMwwiU8Fv5N3FSjcRXPH6kmmu/org_mid.jpg",
  "prompt": "A photo of a room, 4k photo, highly detailed",
  "scheduler": "K_EULER_ANCESTRAL",
  "guidanceScale": 7.5,
  "negativePrompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
  "numberOfSamples": 1,
  "numberOfInferenceSteps": 30,
  "adapterConditioningScale": 1,
  "adapterConditioningFactor": 1
}
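Every field except image is optional, so a client only needs to send the values it wants to change. The helper below is a hypothetical sketch, not part of any SDK. It fills in the documented defaults, adds randomSeed only when one is given, and checks the documented constraints before a request is sent. Its URI check accepts only absolute http(s) URLs. That is an assumption and is stricter than "a valid URI".

```python
from urllib.parse import urlparse

# Documented defaults for the Modify Images with Depth Maps action.
DEFAULTS = {
    "prompt": "A photo of a room, 4k photo, highly detailed",
    "scheduler": "K_EULER_ANCESTRAL",
    "guidanceScale": 7.5,
    "negativePrompt": ("anime, cartoon, graphic, text, painting, crayon, graphite, "
                       "abstract, glitch, deformed, mutated, ugly, disfigured"),
    "numberOfSamples": 1,
    "numberOfInferenceSteps": 30,
    "adapterConditioningScale": 1,
    "adapterConditioningFactor": 1,
}


def build_depth_payload(image: str, **overrides) -> dict:
    """Merge overrides into the defaults and validate documented constraints."""
    unknown = set(overrides) - set(DEFAULTS) - {"randomSeed"}
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    parsed = urlparse(image)
    # Conservative check (an assumption): require an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("image must be an absolute http(s) URI")
    payload = {"image": image, **DEFAULTS, **overrides}
    if not 1 <= payload["numberOfSamples"] <= 4:
        raise ValueError("numberOfSamples must be between 1 and 4")
    return payload
```

For instance, build_depth_payload(url, numberOfSamples=2, randomSeed=42) produces the example input above, except that it requests two samples and fixes the seed.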

Output

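The action returns an array of image URLs. The array can hold more entries than numberOfSamples. The example below lists two URLs although the example input requests a single sample, and the extra entry may be the estimated depth map.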

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/9372a217-9771-4662-bed9-c91249e2f452/be2e7742-287e-42a4-979b-de0e2070f91c.png",
  "https://assets.cognitiveactions.com/invocations/9372a217-9771-4662-bed9-c91249e2f452/975a795d-0769-438e-a785-8cde25ec46d2.png"
]
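The sketch below assumes the result is a plain array of URLs like the one above. The exact response envelope of the execute endpoint is not documented here. Under that assumption, a client can derive a local file name for each image and download it with only the standard library. It also assumes the asset URLs can be fetched without authentication. Both function names are hypothetical.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def output_filenames(urls: list[str]) -> dict[str, str]:
    """Map each output URL to the file name at the end of its path."""
    # URL paths always use "/", so posixpath is correct on every OS.
    return {url: posixpath.basename(urlparse(url).path) for url in urls}


def download_outputs(urls: list[str], dest_dir: str = ".") -> list[str]:
    """Download every output image into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url, name in output_filenames(urls).items():
        target = os.path.join(dest_dir, name)
        urlretrieve(url, target)  # Plain GET; assumes no auth is needed
        paths.append(target)
    return paths
```

Both example URLs share the invocation ID as their directory. Each file therefore keeps the unique UUID-based name from the end of its URL.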

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet illustrating how to call the Modify Images with Depth Maps action. This example focuses on structuring the input JSON payload correctly.

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "423c91e6-f82b-49e9-a6cf-ca66bd9178a8"  # Action ID for Modify Images with Depth Maps

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/JbnAzlvH84NR20HgqUdfnLlMMwwiU8Fv5N3FSjcRXPH6kmmu/org_mid.jpg",
    "prompt": "A photo of a room, 4k photo, highly detailed",
    "scheduler": "K_EULER_ANCESTRAL",
    "guidanceScale": 7.5,
    "negativePrompt": "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured",
    "numberOfSamples": 1,
    "numberOfInferenceSteps": 30,
    "adapterConditioningScale": 1,
    "adapterConditioningFactor": 1
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Image generation can be slow; never wait indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not JSON (covers json and requests decode errors)
            print(f"Response body: {e.response.text}")

This snippet shows how to construct the input payload and how to handle the API request and response, including error responses. The endpoint URL, request body structure, and action ID are for illustrative purposes only.

Conclusion

The adirik/t2i-adapter-sdxl-depth-midas Cognitive Actions give developers depth-guided image modification: the structure of the input image is preserved while the prompt controls what the scene looks like. Natural next steps include tuning adapterConditioningScale and adapterConditioningFactor to balance structural fidelity against prompt freedom. You could also build interactive editors that restyle a photo while keeping its layout.