Enhance Your Images with Canny Edge Detection Using ControlNet Actions

In image processing, being able to restyle an image while preserving its structure is useful for both artistic and analytical work. The jagilley/controlnet-canny API provides a Cognitive Action that extracts a Canny edge map from an input image and uses it to guide image generation, so the output keeps the outlines of the original while taking on the details described in a text prompt. In this post, we'll explore how to integrate this action into your applications and what it offers.
Prerequisites
Before diving into the integration of the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic knowledge of making HTTP requests and handling JSON data.
- A Python environment set up for running the provided conceptual code snippets.
Authentication typically involves passing your API key in the request headers, allowing secure access to the service.
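As a minimal sketch of that header-based authentication, the helper below builds the request headers from an API key. It assumes the Bearer scheme used in the usage example later in this post; the environment variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are illustrative choices, not part of the platform.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token."""
    # Assumption: the key may be stored in an environment variable with this name.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set COGNITIVE_ACTIONS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which is usually preferable to hard-coding it in a script.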
Cognitive Actions Overview
Generate Image with Canny Edge Detection
Description: This action generates an image from an input image and a text prompt. It first extracts an edge map from the input with Canny edge detection, then uses the ControlNet model in conjunction with Stable Diffusion 1.5 to create images that retain the structure of the input while incorporating user-defined details from the prompt. Users can set the Canny edge thresholds and adjust several other parameters for finer control.
Category: image-processing
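To give a feel for what the two Canny thresholds mentioned in the description do, here is a toy sketch of the double-threshold (hysteresis) step in one dimension. It is not the service's implementation: a real Canny detector also smooths the image, computes gradients, and applies non-maximum suppression, and exact boundary handling differs between libraries. The function names are hypothetical.

```python
def classify_gradient(magnitude, low_threshold=100, high_threshold=200):
    """Label one gradient magnitude the way a double threshold would."""
    if magnitude >= high_threshold:
        return "strong"  # always kept as an edge
    if magnitude >= low_threshold:
        return "weak"    # kept only if connected to a kept edge
    return "none"        # discarded


def hysteresis_1d(magnitudes, low_threshold=100, high_threshold=200):
    """Toy 1-D hysteresis: keep weak pixels that touch a kept neighbour."""
    labels = [classify_gradient(m, low_threshold, high_threshold) for m in magnitudes]
    keep = [label == "strong" for label in labels]
    changed = True
    while changed:  # repeat until no further weak pixel gets attached
        changed = False
        for i, label in enumerate(labels):
            if label == "weak" and not keep[i]:
                left = i > 0 and keep[i - 1]
                right = i + 1 < len(labels) and keep[i + 1]
                if left or right:
                    keep[i] = True
                    changed = True
    return keep
```

In this picture, lowering the lower threshold lets more faint detail attach to existing edges, while raising the upper threshold reduces the number of edges that get started at all.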
Input
The input for this action requires the following fields:
- image (required): A URI pointing to the input image to be processed. It must be a web-accessible URL in string format.
- prompt (required): A descriptive text that guides the model in generating the output image.
- seed (optional): An integer that sets the random seed for reproducibility.
- scale (optional): A number that acts as a scale factor for classifier-free guidance, influencing the prompt's impact.
- ddimSteps (optional): An integer indicating the number of diffusion steps for denoising.
- lowThreshold (optional): An integer setting the lower threshold for Canny edge detection.
- highThreshold (optional): An integer defining the upper threshold for Canny edge detection.
- negativePrompt (optional): Descriptive terms to avoid in the output.
- imageResolution (optional): The resolution for the generated image (256, 512, or 768).
- numberOfSamples (optional): Indicates how many image samples to generate.
- additionalPrompt (optional): Extra text to enhance the output quality.
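- estimatedTimeOfArrival (optional): A number controlling the noise level for the diffusion process. Despite its name, this appears to correspond to the DDIM eta parameter rather than a time estimate.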
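The field list above can be turned into a small client-side check that runs before any request is sent. This is a sketch only: validate_inputs is a hypothetical helper, and the rule that lowThreshold should be below highThreshold is a sanity check assumed here, not a documented API constraint.

```python
ALLOWED_RESOLUTIONS = {"256", "512", "768"}


def validate_inputs(inputs):
    """Return a list of problems found in an input payload (empty if none)."""
    problems = []
    # image and prompt are the two required fields.
    for field in ("image", "prompt"):
        if not inputs.get(field):
            problems.append(f"missing required field: {field}")
    image = inputs.get("image", "")
    if image and not str(image).startswith(("http://", "https://")):
        problems.append("image must be a web-accessible URL")
    resolution = inputs.get("imageResolution")
    if resolution is not None and str(resolution) not in ALLOWED_RESOLUTIONS:
        problems.append("imageResolution must be 256, 512, or 768")
    # Assumed sanity check: the lower Canny threshold sits below the upper one.
    low = inputs.get("lowThreshold")
    high = inputs.get("highThreshold")
    if low is not None and high is not None and low >= high:
        problems.append("lowThreshold should be below highThreshold")
    return problems
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a payload at once.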
Here’s an example of the JSON payload needed to invoke this action:
{
  "image": "https://replicate.delivery/pbxt/IMPLYODUwdmHTsnLKi5YiFccIAK6g9l5KK1FNyCtpGS1g0UN/1200.jpeg",
  "scale": 9,
  "prompt": "a metallic cyborg bird",
  "ddimSteps": 20,
  "lowThreshold": 100,
  "highThreshold": 200,
  "negativePrompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
  "imageResolution": "512",
  "numberOfSamples": "1",
  "additionalPrompt": "best quality, extremely detailed"
}
Output
The action typically returns an array of generated image URIs based on the provided input. Here’s an example of what the output may look like:
[
  "https://assets.cognitiveactions.com/invocations/4a2e57c6-0c41-4cab-9844-339f650409e0/b7c14558-bf18-461e-b874-a6cc61a5846d.png",
  "https://assets.cognitiveactions.com/invocations/4a2e57c6-0c41-4cab-9844-339f650409e0/77099680-e74f-466a-9e93-21868bea9d7f.png"
]
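Once you have an array like the one above, a common next step is to save the images locally. The sketch below uses only the Python standard library; the helper names and the outputs directory are illustrative, and it assumes the returned URIs can be fetched without extra authentication, which the platform may or may not allow.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_uri(uri):
    """Use the last path segment of the URI as the local file name."""
    return os.path.basename(urlparse(uri).path)


def download_images(uris, target_dir="outputs"):
    """Download each URI into target_dir and return the local paths."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for uri in uris:
        path = os.path.join(target_dir, filename_from_uri(uri))
        urlretrieve(uri, path)  # performs a network request
        paths.append(path)
    return paths
```

Calling download_images(result) on the parsed response would then return the list of saved file paths, provided the response body is the plain array shown in the example.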
Conceptual Usage Example (Python)
Here’s a conceptual Python code snippet illustrating how a developer might call the Cognitive Actions execution endpoint for this action:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "c7d2a13f-4ce8-4a90-a1b7-26b08923c3c0"  # Action ID for Generate Image with Canny Edge Detection

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/IMPLYODUwdmHTsnLKi5YiFccIAK6g9l5KK1FNyCtpGS1g0UN/1200.jpeg",
    "scale": 9,
    "prompt": "a metallic cyborg bird",
    "ddimSteps": 20,
    "lowThreshold": 100,
    "highThreshold": 200,
    "negativePrompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
    "imageResolution": "512",
    "numberOfSamples": "1",
    "additionalPrompt": "best quality, extremely detailed",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The payload follows the input fields described above, and action_id identifies the Canny edge detection action. The endpoint URL and the request body structure are illustrative: they show how the inputs are passed, so confirm the actual values against the platform's documentation before use.
Conclusion
The Canny edge detection action within the jagilley/controlnet-canny API lets developers generate new images that follow the structure of an input image. By integrating this action into applications, users can explore a wide range of artistic variations while keeping control over the output through the prompt, the edge thresholds, and the sampling parameters. Consider experimenting with different input parameters, such as the thresholds, scale, and seed, to see how they change the results. Happy coding!