Generate Stunning Image Variations with the jagilley/stable-diffusion-depth2img Cognitive Actions

22 Apr 2025

In today's digital landscape, the demand for creative and unique visuals is ever-increasing. The jagilley/stable-diffusion-depth2img Cognitive Actions provide a powerful solution for developers who want to generate image variations while preserving the depth and shape of an input image. They are built on a diffusion-based text-to-image model that has been fine-tuned with depth prediction, so the generated variations follow a user-defined text prompt while respecting the spatial structure of the original. In this article, we'll explore how to integrate this functionality into your applications.

Prerequisites

Before diving into the integration, ensure you have the following:

  • An API key for accessing the Cognitive Actions platform.
  • Familiarity with making HTTP requests and handling JSON data.
  • A Python development environment, since the conceptual code snippets in this article are written in Python.

To authenticate your requests, you will typically pass your API key in the headers of your HTTP calls.
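For instance, assuming Bearer-token authentication (the scheme used in the full example later in this article), the request headers might look like this; the key value is a placeholder:

```python
# Hypothetical headers for authenticating with the Cognitive Actions API.
# Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual key.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```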

Cognitive Actions Overview

Create Image Variations with Depth Preservation

Description: This action generates variations of a specified image while maintaining its shape and depth. It utilizes a diffusion-based model to create unique images based on textual prompts, enhancing the creative process.

Category: Image Generation

Input

The input for this action requires the following fields:

  • inputImage (required): The URI of the input image to be transformed.
  • prompt: A text prompt guiding the image generation (default: "Wanderer above the sea of fog, digital art").
  • depthImage: The URI of an optional depth map specifying the depth of each pixel in the input image, for depth-aware transformations.
  • seed: A random seed for image generation (-1 or left blank for randomization).
  • scheduler: The type of scheduler for processing (default: "DPMSolverMultistep").
  • guidanceScale: Controls how closely the generated image matches the prompt (range: 1-20, default: 7.5).
  • negativePrompt: Keywords to exclude from the generated image.
  • promptStrength: Strength of the prompt in relation to the input image (default: 0.8).
  • numberOfOutputs: How many images to generate (range: 1-8, default: 1).
  • numberOfInferenceSteps: The number of inference steps for denoising (range: 1-500, default: 50).

Example Input:

{
  "seed": -1,
  "prompt": "wanderer above a cyberpunk city, 4k digital art rendering by caspar david friedrich",
  "scheduler": "DPMSolverMultistep",
  "inputImage": "https://replicate.delivery/pbxt/ICo443xcFQGIK4lawWN3ytMNDZsmZS6fZfjGYwFP6Dc5Vfnq/wanderer.jpeg",
  "guidanceScale": 7.5,
  "promptStrength": 0.8,
  "numberOfOutputs": 1,
  "numberOfInferenceSteps": 50
}
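Because several fields have documented numeric ranges, it can be worth checking a payload client-side before sending it. The sketch below validates a payload against the ranges listed above; the validation helper itself is illustrative and not part of the API:

```python
# Minimal sketch: check a payload against the documented input ranges
# (guidanceScale 1-20, numberOfOutputs 1-8, numberOfInferenceSteps 1-500)
# before sending it. The field names come from the schema above.

def validate_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means the payload looks OK."""
    problems = []
    if "inputImage" not in payload:
        problems.append("inputImage is required")
    ranges = {
        "guidanceScale": (1, 20),
        "numberOfOutputs": (1, 8),
        "numberOfInferenceSteps": (1, 500),
    }
    for field, (lo, hi) in ranges.items():
        value = payload.get(field)
        if value is not None and not (lo <= value <= hi):
            problems.append(f"{field} must be between {lo} and {hi}")
    return problems

payload = {"inputImage": "https://example.com/wanderer.jpeg", "guidanceScale": 7.5}
print(validate_payload(payload))  # []
```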

Output

The output from this action typically returns an array of URIs pointing to the generated image(s).

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/39a21c0e-b2fb-4449-9834-8367ecda9c43/c224ad2e-40e4-4fb3-8600-85c8e7aefa87.png"
]
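Since the output is a list of image URLs, a common follow-up step is to download the generated files locally. The sketch below assumes the action returns a plain list of URIs, as in the example output above:

```python
# Sketch: download each generated image from the returned URIs and save
# it to disk. Assumes the action's output is a plain list of image URLs.
import requests

def download_images(uris, prefix="variation"):
    paths = []
    for i, uri in enumerate(uris):
        resp = requests.get(uri, timeout=60)
        resp.raise_for_status()  # fail loudly on 4xx/5xx
        path = f"{prefix}_{i}.png"
        with open(path, "wb") as f:
            f.write(resp.content)
        paths.append(path)
    return paths
```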

Conceptual Usage Example (Python)

Below is a conceptual Python snippet demonstrating how to call this Cognitive Action. Be sure to replace the placeholders with your actual values.

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "4c36a6ba-0ae4-4ea7-87b1-5a4c83c9c357"  # Action ID for Create Image Variations with Depth Preservation

# Construct the input payload based on the action's requirements
payload = {
    "seed": -1,
    "prompt": "wanderer above a cyberpunk city, 4k digital art rendering by caspar david friedrich",
    "scheduler": "DPMSolverMultistep",
    "inputImage": "https://replicate.delivery/pbxt/ICo443xcFQGIK4lawWN3ytMNDZsmZS6fZfjGYwFP6Dc5Vfnq/wanderer.jpeg",
    "guidanceScale": 7.5,
    "promptStrength": 0.8,
    "numberOfOutputs": 1,
    "numberOfInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical request structure
        timeout=120  # Image generation can take a while
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Response body was not valid JSON
            print(f"Response body: {e.response.text}")

In this code, replace COGNITIVE_ACTIONS_API_KEY with your actual API key and make sure the endpoint URL is correct for your Cognitive Actions deployment. The action ID identifies the action to execute, and the payload is constructed according to the input schema described above.
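Because image generation requests are long-running and can fail transiently (timeouts, 5xx responses), you may also want to retry the call with backoff. This is generic client-side logic, not part of the Cognitive Actions API; the endpoint and request structure are the same hypothetical ones used above:

```python
# Sketch: retry a POST with exponential backoff on transient failures.
# Generic client-side pattern, not specific to the Cognitive Actions API.
import time
import requests

def post_with_retries(url, headers, body, attempts=3, backoff=2.0):
    for attempt in range(attempts):
        try:
            resp = requests.post(url, headers=headers, json=body, timeout=120)
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            time.sleep(backoff * (2 ** attempt))  # 2s, 4s, 8s, ...
```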

Conclusion

The jagilley/stable-diffusion-depth2img Cognitive Actions empower developers to create unique and stunning image variations effortlessly. By leveraging the capabilities of depth preservation and text prompts, you can enhance your applications with advanced image generation features. Explore the potential of these actions in your projects, and consider experimenting with different prompts and settings to create captivating visuals!