Unlocking Image Depth Estimation with Adirik/Marigold Cognitive Actions

24 Apr 2025

Estimating depth from a single image is a powerful capability that enhances applications ranging from computer vision to augmented reality. The Adirik/Marigold Cognitive Actions give developers pre-built actions that simplify generating depth maps from single RGB or grayscale images. By leveraging the Marigold diffusion model, these actions improve accuracy through techniques such as ensembling and denoising.

Prerequisites

Before you start integrating the Adirik/Marigold Cognitive Actions into your applications, ensure you have the following:

  • An API key for accessing the Cognitive Actions platform.
  • Basic familiarity with RESTful API calls and JSON payload structures.
  • A Python environment set up for making HTTP requests.

Authentication typically involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions functionality.
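As a minimal sketch of that authentication step, the helper below builds the request headers. The bearer-token scheme shown here is an assumption; confirm the exact header format against the platform's API documentation.

```python
def build_headers(api_key: str) -> dict:
    """Construct request headers for a Cognitive Actions call.

    Assumes a bearer-token scheme, which is common but not confirmed
    by the platform docs quoted in this article.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```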

Cognitive Actions Overview

Estimate Monocular Depth

The Estimate Monocular Depth action is designed to predict depth maps from single images, utilizing the Marigold diffusion model. It generates both grayscale and spectral depth maps, enhancing accuracy through multiple inferences, denoising, and optimization techniques.
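To make the ensembling idea concrete, here is a toy sketch of merging several aligned depth predictions pixel by pixel. The real model operates on full-resolution arrays inside the service; this illustration just treats each depth map as a flat list of values and applies the `median` or `mean` reduction named by the `reductionMethod` parameter.

```python
from statistics import median

def reduce_depth_maps(depth_maps, method="median"):
    """Merge several aligned depth maps into one, pixel by pixel.

    Toy illustration of ensembling: each map is a flat list of depth
    values, and corresponding pixels across maps are reduced to one value.
    """
    if method == "median":
        reduce_fn = median
    elif method == "mean":
        reduce_fn = lambda vals: sum(vals) / len(vals)
    else:
        raise ValueError(f"unknown reduction method: {method}")
    return [reduce_fn(pixel_values) for pixel_values in zip(*depth_maps)]

# Three inferences over a 4-pixel image; the median suppresses the outlier.
maps = [[1.0, 2.0, 3.0, 4.0],
        [1.1, 2.1, 2.9, 4.2],
        [9.0, 2.0, 3.1, 3.9]]  # first pixel of this run is an outlier
print(reduce_depth_maps(maps))  # → [1.1, 2.0, 3.0, 4.0]
```

This is why `"median"` is a sensible default for `reductionMethod`: a single noisy inference cannot drag the merged value the way it would with a mean.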

Input

The input for this action requires a JSON object with specific properties. Below is the schema along with an example:

{
  "image": "https://replicate.delivery/pbxt/K3HlYnhvVI5IX35KJ6RkTTzoHwEDKL7KtAiPc4F4fDvcgJX3/pete-walls-92JRuvQZfKs-unsplash_crop43.jpg",
  "resizeInput": true,
  "denoiseSteps": 10,
  "maxIterations": 5,
  "numInferences": 10,
  "reductionMethod": "median",
  "regularizerStrength": 0.02
}
  • image (required): URL of the input image (RGB or grayscale).
  • resizeInput (optional): Boolean to resize the image to maximum resolution (default: true).
  • denoiseSteps (optional): Number of steps for denoising (default: 10, range: 1-50).
  • maxIterations (optional): Maximum number of optimization iterations (default: 5, range: 1-20).
  • numInferences (optional): Number of inferences to ensemble (default: 10, range: 1-20).
  • reductionMethod (optional): Method for merging aligned depth maps (default: "median").
  • regularizerStrength (optional): Weight of the optimization regularizer (default: 0.02, range: 0-1).
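Since the numeric parameters all have documented ranges, it can be worth validating a payload client-side before sending it. The helper below is a hypothetical convenience, not part of the API; it simply encodes the ranges listed above.

```python
# Documented ranges for the Estimate Monocular Depth numeric inputs.
PARAM_RANGES = {
    "denoiseSteps": (1, 50),
    "maxIterations": (1, 20),
    "numInferences": (1, 20),
    "regularizerStrength": (0.0, 1.0),
}

def validate_inputs(inputs: dict) -> dict:
    """Check a payload against the documented input ranges before sending."""
    if "image" not in inputs:
        raise ValueError("'image' is required")
    for name, (low, high) in PARAM_RANGES.items():
        if name in inputs and not (low <= inputs[name] <= high):
            raise ValueError(
                f"{name}={inputs[name]} is outside the range [{low}, {high}]"
            )
    return inputs
```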

Output

The output of this action typically includes URLs pointing to the generated depth maps. An example output might look like this:

[
  "https://assets.cognitiveactions.com/invocations/47a294c0-1603-4f03-bf6a-cd498898e1f0/95ca48b8-82a5-47f7-8f43-16f06eb7887c.png",
  "https://assets.cognitiveactions.com/invocations/47a294c0-1603-4f03-bf6a-cd498898e1f0/91acf50d-36cc-43ae-84b3-fee02a4eeadd.png"
]
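Since the action returns URLs rather than image data, a typical follow-up step is downloading the generated depth maps. The sketch below assumes the returned URLs are directly fetchable PNGs (consistent with the example above) and requires the `requests` package for the network call.

```python
import os
from urllib.parse import urlparse

def local_name(url: str) -> str:
    """Derive a local filename from a depth-map URL."""
    return os.path.basename(urlparse(url).path)

def download_depth_maps(urls, out_dir="."):
    """Fetch each generated depth map and save it locally.

    Assumes the URLs returned by the action are publicly fetchable.
    """
    import requests
    paths = []
    for url in urls:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        path = os.path.join(out_dir, local_name(url))
        with open(path, "wb") as f:
            f.write(resp.content)
        paths.append(path)
    return paths
```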

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet demonstrating how to call the Estimate Monocular Depth action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "9f5fb26f-83ea-49c0-889e-03d86d6d6125" # Action ID for Estimate Monocular Depth

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/K3HlYnhvVI5IX35KJ6RkTTzoHwEDKL7KtAiPc4F4fDvcgJX3/pete-walls-92JRuvQZfKs-unsplash_crop43.jpg",
    "resizeInput": true,
    "denoiseSteps": 10,
    "maxIterations": 5,
    "numInferences": 10,
    "reductionMethod": "median",
    "regularizerStrength": 0.02
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers json.JSONDecodeError across requests versions
            print(f"Response body: {e.response.text}")

In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID and input payload are structured according to the requirements for the Estimate Monocular Depth action. The endpoint URL and request structure are illustrative; ensure they align with the actual API documentation when implemented.
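Once a grayscale depth map has been retrieved, its pixel values encode relative depth. A simple post-processing sketch is to map 8-bit pixel values into a normalized [0, 1] range. Whether lighter pixels mean nearer or farther depends on the model's output convention, so the `invert` flag below is an illustrative assumption rather than documented behavior.

```python
def normalize_depth(pixels, invert=False):
    """Map 8-bit grayscale pixel values (0-255) to relative depth in [0, 1].

    `invert` flips the scale in case the model encodes near as bright;
    check the actual output convention before relying on either direction.
    """
    depths = [p / 255.0 for p in pixels]
    return [1.0 - d for d in depths] if invert else depths

print(normalize_depth([0, 128, 255]))
```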

Conclusion

The Adirik/Marigold Cognitive Actions, particularly the Estimate Monocular Depth action, provide a robust solution for developers looking to integrate advanced image analysis capabilities into their applications. With features like ensembling and optimization, these actions not only simplify the process but also enhance the accuracy of depth estimation. Consider exploring additional use cases such as 3D modeling, object detection, and scene reconstruction to fully leverage the power of these Cognitive Actions in your projects.