Create Stunning Images with Edge Guidance Using Flux Canny Actions

The black-forest-labs/flux-canny-dev API provides developers with Cognitive Actions that simplify image generation. One of these actions creates detailed images guided by a user-provided sketch or edge map. Because Canny edge detection is applied to the control image, developers can control the structure and composition of the result, while the text prompt determines its content and style.
Prerequisites
Before diving into the integration of these Cognitive Actions, ensure you have the following:
- An API key for accessing the Cognitive Actions platform.
- Basic knowledge of making API calls and handling JSON data.
Authentication typically involves passing your API key in the request headers to authorize access to the Cognitive Actions.
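As a minimal sketch of the header-based authentication described above, the helper below builds request headers from an API key. The function name, the environment variable name COGNITIVE_ACTIONS_API_KEY, and the Bearer scheme are assumptions for illustration; check the platform's documentation for the exact header format.

```python
import os


def build_auth_headers(api_key=None):
    """Build request headers carrying the API key.

    Assumption: the platform accepts a Bearer token in the Authorization header.
    """
    # Fall back to an environment variable (hypothetical name) so the key
    # does not have to be hard-coded in source files.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key provided and COGNITIVE_ACTIONS_API_KEY is not set"
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of version control; passing it explicitly is still possible for tests.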
Cognitive Actions Overview
Generate Image with Edge Guidance
Purpose: This action creates stylistically rich images guided by sketches or edge maps. Because Canny edge detection is applied to the control image, the generated images align closely with the structure you supply.
Category: Image Generation
Input
The required and optional fields for this action are defined in the input schema. Below is a breakdown of the input structure:
- controlImageUri (string, required): URI of the image used to influence the generation process. Canny edge detection is applied automatically.
- prompt (string, required): A descriptive text prompt to guide the contents and style of the generated image (e.g., "A red vintage convertible driving through an old town").
- seed (integer, optional): A random seed for reproducibility.
- guidance (number, optional): Degree of adherence to the prompt, ranging from 0 to 100 (default is 30).
- numberOfOutputs (integer, optional): Specifies how many output images to generate (valid range: 1-4, default is 1).
- outputImageFormat (string, optional): Specifies the file format for the output images (options: "webp", "jpg", "png", default is "webp").
- outputImageQuality (integer, optional): Quality setting for saved output images, from 0 (lowest) to 100 (best); not applicable to PNG output (default is 80).
- approximateMegapixels (string, optional): Approximate resolution of the generated image in megapixels (default is "1"; use "match_input" to match the input image size).
- numberOfInferenceSteps (integer, optional): Number of iterative denoising steps (default is 28; recommended range is 28 to 50).
- disableImageSafetyChecker (boolean, optional): Flag to disable the safety checker for filtering out unsafe content (default is false).
Example Input:
{
  "prompt": "A red vintage convertible driving through an old town",
  "guidance": 30,
  "controlImageUri": "https://replicate.delivery/pbxt/M0mUcvKOwNSS4axvv7LAngBAR5Iuv2GsIcnSKdpQRJA62f8G/Screenshot%202024-11-21%20at%2016.08.20.png",
  "numberOfOutputs": 1,
  "outputImageFormat": "webp",
  "outputImageQuality": 80,
  "approximateMegapixels": "1",
  "numberOfInferenceSteps": 28
}
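The field constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; the function name validate_inputs is hypothetical, and it covers only the ranges and options stated in this article, not the platform's full schema.

```python
ALLOWED_FORMATS = {"webp", "jpg", "png"}

# Optional numeric fields and their documented ranges: (name, minimum, maximum)
NUMERIC_RANGES = [
    ("guidance", 0, 100),
    ("numberOfOutputs", 1, 4),
    ("outputImageQuality", 0, 100),
]


def validate_inputs(inputs):
    """Return a list of problems found in an input payload (empty if none)."""
    errors = []

    # Required fields must be present as non-empty strings.
    for field in ("controlImageUri", "prompt"):
        value = inputs.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} is required and must be a non-empty string")

    # Optional numeric fields must be numbers inside the documented range.
    for field, low, high in NUMERIC_RANGES:
        if field in inputs:
            value = inputs[field]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not low <= value <= high:
                errors.append(f"{field} must be a number between {low} and {high}")

    # The output format must be one of the documented options.
    if "outputImageFormat" in inputs and inputs["outputImageFormat"] not in ALLOWED_FORMATS:
        errors.append(f"outputImageFormat must be one of {sorted(ALLOWED_FORMATS)}")

    return errors
```

Calling validate_inputs on the example input above returns an empty list, while a payload with numberOfOutputs set to 9 would produce one error message.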
Output
This action typically returns an array of URLs pointing to the generated images. Below is an example of the output structure:
Example Output:
[
  "https://assets.cognitiveactions.com/invocations/0d43a4e5-c4bf-4442-a971-36ac9b1063f2/f17b15e5-2c99-43de-bd69-c92f77d071ac.webp"
]
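Since the output is a list of URLs, a typical next step is to save the images locally. The sketch below uses only the Python standard library; the helper names filename_from_url and download_images are hypothetical, and it assumes the returned URLs can be fetched without additional authentication, which this article does not confirm.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def filename_from_url(url):
    """Return the last path component of a URL, with percent-encoding decoded."""
    return unquote(posixpath.basename(urlparse(url).path))


def download_images(urls, target_dir="."):
    """Download each URL into target_dir and return the saved file paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(target_dir, filename_from_url(url))
        # Stream the response body straight to disk.
        with urlopen(url, timeout=60) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(path)
    return saved
```

For the example output above, filename_from_url yields "f17b15e5-2c99-43de-bd69-c92f77d071ac.webp", so the file extension follows the outputImageFormat chosen in the request.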
Conceptual Usage Example (Python)
Here’s a conceptual Python code snippet demonstrating how to invoke the "Generate Image with Edge Guidance" action:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "3904a97e-1115-4d70-a3af-e41745dc394f"  # Action ID for Generate Image with Edge Guidance

# Construct the input payload based on the action's requirements
payload = {
    "prompt": "A red vintage convertible driving through an old town",
    "guidance": 30,
    "controlImageUri": "https://replicate.delivery/pbxt/M0mUcvKOwNSS4axvv7LAngBAR5Iuv2GsIcnSKdpQRJA62f8G/Screenshot%202024-11-21%20at%2016.08.20.png",
    "numberOfOutputs": 1,
    "outputImageFormat": "webp",
    "outputImageQuality": 80,
    "approximateMegapixels": "1",
    "numberOfInferenceSteps": 28
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Avoid waiting indefinitely if the server does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")

In this example, the action ID and the input payload are sent together in the JSON body of a POST request. The endpoint URL and the request structure are hypothetical, so adjust them to match the actual Cognitive Actions API.
Conclusion
The black-forest-labs/flux-canny-dev API offers powerful capabilities for generating images with edge guidance, unlocking creative potential for developers. By utilizing the "Generate Image with Edge Guidance" action, you can produce stunning visuals that adhere to user-defined prompts and structures. Explore these Cognitive Actions further to enhance your applications and create unique image content that stands out. Happy coding!