Generate High-Quality Depth Maps with the zsxkib/patch-fusion Cognitive Actions

24 Apr 2025

Depth estimation lets an application infer how far each pixel in an image is from the camera. The zsxkib/patch-fusion spec provides a Cognitive Action that generates high-resolution depth maps from a single image using a tile-based approach. The action builds on the ZoeDepth model to produce accurate, detailed depth maps, which makes it a good fit for gaming, virtual reality, and augmented reality applications.

Prerequisites

Before you start using the Cognitive Actions provided by the zsxkib/patch-fusion spec, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Knowledge of how to authenticate your requests by passing the API key in the request headers.

Authentication typically involves sending the API key as a bearer token in the Authorization header of each request.
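As a minimal sketch of the bearer-token scheme described above, the helper below builds the request headers and reads the key from an environment variable so it stays out of source code. The function name `build_headers` and the environment variable name `COGNITIVE_ACTIONS_API_KEY` are assumptions made for illustration, not part of the platform.

```python
import os


def build_headers(api_key=None):
    """Build request headers carrying the API key as a bearer token.

    The environment variable name is an assumption made for this sketch.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key given and COGNITIVE_ACTIONS_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Calling `build_headers("my-key")` returns a dictionary that can be passed as the `headers` argument of an HTTP client.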

Cognitive Actions Overview

Generate High-Quality Depth Maps

The Generate High-Quality Depth Maps action produces high-resolution monocular metric depth estimates, that is, depth values predicted from a single image. It is aimed at applications that need fast, accurate depth measurements from images.

Input

The input for this action is structured as follows:

{
  "seed": -1,
  "scale": 9,
  "prompt": "Pastel painting, with vibrant colours and good vibes",
  "colormap": "magma",
  "strength": 1,
  "ddimSteps": 20,
  "tilingMode": "P49",
  "patchNumber": 256,
  "sourceImage": "https://replicate.delivery/pbxt/K7bks8mWnMxlQC19IPeFGtfr9YtGEUw5vgQhiu3olsD6vcoU/example_2.jpeg",
  "useGuessMode": false,
  "negativePrompt": "worst quality, low quality, lose details",
  "patchSizeWidth": 960,
  "imageResolution": 896,
  "patchSizeHeight": 540,
  "resolutionWidth": 3840,
  "additionalPrompt": "best quality, extremely detailed",
  "resolutionHeight": 2160,
  "estimatedTimeArrival": 0
}
  • seed: (integer) Fixed random seed for reproducibility. Leave blank for a random seed (the example above passes -1).
  • scale: (number) Guidance scale affecting image generation; ranges from 0.1 to 50 (default: 9).
  • prompt: (string) Text prompt describing the desired image output.
  • colormap: (string) Colormap for rendering the depth map. Options: 'magma' and 'spectral'.
  • strength: (number) Strength of control over the final image appearance; values between 0 and 2 (default: 1).
  • ddimSteps: (integer) Number of diffusion steps (1 to 50, default: 20).
  • tilingMode: (string) Mode for image tiling; options: 'P49' and 'R'.
  • patchNumber: (integer) Quantity of random patches (1 to 256, default: 256).
  • sourceImage: (string) URL of the input image (must be a valid URI).
  • useGuessMode: (boolean) Activates guess mode for uncertain areas (default: false).
  • negativePrompt: (string) Specifies elements to be minimized or avoided in the image.
  • patchSizeWidth: (integer) Width of the patch in pixels (256 to 1200, default: 960).
  • imageResolution: (integer) ControlNet image resolution (256 to 896, default: 896).
  • patchSizeHeight: (integer) Height of the patch in pixels (256 to 675, default: 540).
  • resolutionWidth: (integer) Processing output width in pixels (256 to 4800, default: 3840).
  • additionalPrompt: (string) Enhances specificity or desired qualities in the image.
  • resolutionHeight: (integer) Processing output height in pixels (256 to 2700, default: 2160).
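  • estimatedTimeArrival: (number) ETA for completion of DDIM steps in seconds (default: 0). Given its pairing with ddimSteps, this field may instead correspond to the DDIM eta sampling parameter, where 0 means deterministic sampling; verify against the action's schema.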
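Because most of these parameters have documented ranges or a fixed set of options, it can help to check a payload locally before sending it. The following is a minimal sketch, not part of the platform: the ranges and options are copied from the list above, while the function name `validate_inputs` and the rule that sourceImage must be present are assumptions.

```python
# Allowed ranges and options copied from the parameter list above.
NUMERIC_RANGES = {
    "scale": (0.1, 50),
    "strength": (0, 2),
    "ddimSteps": (1, 50),
    "patchNumber": (1, 256),
    "patchSizeWidth": (256, 1200),
    "imageResolution": (256, 896),
    "patchSizeHeight": (256, 675),
    "resolutionWidth": (256, 4800),
    "resolutionHeight": (256, 2700),
}
CHOICES = {
    "colormap": {"magma", "spectral"},
    "tilingMode": {"P49", "R"},
}


def validate_inputs(payload):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    # Assumption: the action cannot run without a source image.
    if not payload.get("sourceImage"):
        problems.append("sourceImage is missing")
    # Only check parameters that are present; omitted ones use their defaults.
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in payload and not (low <= payload[name] <= high):
            problems.append(f"{name}={payload[name]} is outside [{low}, {high}]")
    for name, allowed in CHOICES.items():
        if name in payload and payload[name] not in allowed:
            problems.append(f"{name}={payload[name]!r} is not one of {sorted(allowed)}")
    return problems
```

Running `validate_inputs` on the example input above returns an empty list, whereas a payload with `"ddimSteps": 80` would be flagged before any request is made.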

Output

Upon successful execution, this action typically returns an array of URLs pointing to the generated depth map images. For example:

[
  "https://assets.cognitiveactions.com/invocations/eb947a4c-2465-499a-a387-52006f9e21bf/a06472a8-5f4d-432f-87a1-8af918ab14f5.png",
  "https://assets.cognitiveactions.com/invocations/eb947a4c-2465-499a-a387-52006f9e21bf/0c0ba11d-f2b8-4101-8583-7af6d5642bdc.png"
]
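To work with the results, you will usually want to save the returned images locally. The sketch below uses only the Python standard library and assumes the returned URLs can be fetched without additional authentication, which the platform may or may not allow. The helper names `filename_from_url` and `download_depth_maps` and the default directory name are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_depth_maps(urls, dest_dir="depth_maps"):
    """Download each URL into dest_dir and return the list of saved paths.

    Assumption: the URLs are directly fetchable without extra headers.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, filename_from_url(url))
        with urllib.request.urlopen(url, timeout=60) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

Passing the array returned by the action to `download_depth_maps` would write one PNG file per URL, named after the final path segment of each URL.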

Conceptual Usage Example (Python)

Here’s how you might call this action programmatically in Python using a hypothetical Cognitive Actions execution endpoint:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "570bd65c-0a6d-4707-afd8-b6b1c611a721" # Action ID for Generate High-Quality Depth Maps

# Construct the input payload based on the action's requirements
payload = {
    "seed": -1,
    "scale": 9,
    "prompt": "Pastel painting, with vibrant colours and good vibes",
    "colormap": "magma",
    "strength": 1,
    "ddimSteps": 20,
    "tilingMode": "P49",
    "patchNumber": 256,
    "sourceImage": "https://replicate.delivery/pbxt/K7bks8mWnMxlQC19IPeFGtfr9YtGEUw5vgQhiu3olsD6vcoU/example_2.jpeg",
    "useGuessMode": False,
    "negativePrompt": "worst quality, low quality, lose details",
    "patchSizeWidth": 960,
    "imageResolution": 896,
    "patchSizeHeight": 540,
    "resolutionWidth": 3840,
    "additionalPrompt": "best quality, extremely detailed",
    "resolutionHeight": 2160,
    "estimatedTimeArrival": 0
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=300 # Assumed limit in seconds; adjust for your workload
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")

In this example, replace the COGNITIVE_ACTIONS_API_KEY placeholder with your actual API key. The endpoint URL and the request body structure are hypothetical, so check them against your own configuration. The payload follows the input schema described above, and the action ID identifies the Generate High-Quality Depth Maps action.

Conclusion

The zsxkib/patch-fusion Cognitive Action gives developers a way to generate detailed, high-resolution depth maps from a single image. Integrating it into an existing project can add depth-aware features to applications in gaming, VR, and AR.