Enhance Your Images with Cognitive Actions from the fofr/sdxl-cats-movie Spec

The fofr/sdxl-cats-movie spec offers Cognitive Actions that let developers generate and refine images. With built-in support for image-to-image (img2img) transformation and inpainting, these actions help you produce visual content tailored to specific needs. This blog post walks through the capabilities of the Generate Enhanced Image action, its inputs and outputs, and how to integrate it into your applications.
Prerequisites
Before diving into the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform, which will be required for authentication.
- Basic familiarity with JSON and RESTful APIs for seamless integration.
Authentication typically involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions services.
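As a minimal sketch of that header construction, the helper below reads the key from an environment variable and wraps it in a Bearer token. The variable name COGNITIVE_ACTIONS_API_KEY and the Bearer scheme are assumptions for illustration; confirm both against the platform's documentation.

```python
import os


def build_auth_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token.

    The Bearer scheme and the environment variable name are assumptions,
    not confirmed details of the Cognitive Actions platform.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which is usually preferable to hard-coding it.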
Cognitive Actions Overview
Generate Enhanced Image
The Generate Enhanced Image action creates refined images from existing images. It supports both img2img and inpainting modes, with control over output dimensions, style prompts, and refinement parameters.
Input
The input schema for this action is a JSON object, and here’s a breakdown of the required and optional fields:
- mask (optional): URI of the input mask for inpaint mode. Black areas are preserved, while white areas are inpainted.
- seed (optional): Random seed for image generation. If left blank, a randomized seed will be used.
- image (required for img2img and inpaint modes): URI of the input image.
- width (optional): Width of the output image in pixels (default is 1024).
- height (optional): Height of the output image in pixels (default is 1024).
- prompt (optional): Descriptive prompt to guide image generation (default: "An astronaut riding a rainbow unicorn").
- refine (optional): Refinement style to use for image processing (default: "no_refiner").
- loraScale (optional): LoRA scale factor, applicable only to trained models (default: 0.6, range: 0 to 1).
- scheduler (optional): Scheduler algorithm for the denoising process (default: "K_EULER").
- guidanceScale (optional): Intensity scale for classifier-free guidance (default: 7.5).
- negativePrompt (optional): Negative prompt to exclude certain features from the image.
- promptStrength (optional): Strength of the prompt for img2img or inpaint (default: 0.8).
- numberOfOutputs (optional): Number of output images to generate (default: 1, maximum of 4).
- refinementSteps (optional): Number of steps for refinement when using "base_image_refiner".
- highNoiseFraction (optional): Fraction of noise for "expert_ensemble_refiner" (default: 0.8).
- isWatermarkApplied (optional): Boolean to determine if a watermark is applied (default: true).
- numberOfInferenceSteps (optional): Total number of denoising steps (default: 50, maximum of 500).
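The documented limits above can be checked on the client before sending a request. The sketch below is a hypothetical convenience helper, not the platform's own validation. The upper bounds and the loraScale range come from the field list; the lower bound of 1 for numberOfOutputs and numberOfInferenceSteps, and the rule that a mask needs an accompanying image, are assumptions.

```python
def validate_inputs(payload):
    """Return a list of problems found in an input payload (empty if none)."""
    problems = []

    # Assumption: inpaint mode needs an image to go with the mask.
    if "mask" in payload and "image" not in payload:
        problems.append("mask requires image")

    # Documented range: 0 to 1 (default 0.6).
    if not 0 <= payload.get("loraScale", 0.6) <= 1:
        problems.append("loraScale must be between 0 and 1")

    # Documented maximum: 4. The minimum of 1 is an assumption.
    if not 1 <= payload.get("numberOfOutputs", 1) <= 4:
        problems.append("numberOfOutputs must be between 1 and 4")

    # Documented maximum: 500. The minimum of 1 is an assumption.
    if not 1 <= payload.get("numberOfInferenceSteps", 50) <= 500:
        problems.append("numberOfInferenceSteps must be between 1 and 500")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.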
Here’s an example of the JSON payload needed to invoke the action:
{
  "width": 1360,
  "height": 768,
  "prompt": "A photo of a TOK, dynamic action pose, film still",
  "refine": "no_refiner",
  "loraScale": 0.6,
  "scheduler": "K_EULER",
  "guidanceScale": 7.5,
  "negativePrompt": "people, two people, extra arms",
  "promptStrength": 0.8,
  "numberOfOutputs": 1,
  "highNoiseFraction": 0.8,
  "isWatermarkApplied": true,
  "numberOfInferenceSteps": 50
}
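The example payload above omits the image and mask fields. As a rough sketch, an inpainting request could be built by adding both to a copy of an existing payload. The helper name with_inpainting and the example.com URLs are placeholders for illustration only.

```python
def with_inpainting(base_payload, image_url, mask_url):
    """Return a copy of base_payload with image and mask set for inpaint mode.

    Per the field list, black areas of the mask are preserved and white
    areas are inpainted. The original payload is left unmodified.
    """
    payload = dict(base_payload)
    payload["image"] = image_url
    payload["mask"] = mask_url
    return payload


base = {"prompt": "A photo of a TOK, dynamic action pose, film still"}
inpaint_payload = with_inpainting(
    base,
    "https://example.com/input.png",  # placeholder input image URI
    "https://example.com/mask.png",   # placeholder mask URI
)
```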
Output
Upon successful execution, the action returns a JSON array of URLs, one for each generated image. Here’s an example of what that output might look like:
[
  "https://assets.cognitiveactions.com/invocations/031c5be9-8984-421f-95c5-7c8568c5be4b/5ecdd89a-bcd4-4a7a-a5a0-3bc1ba8f4b9d.png"
]
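To illustrate working with that output, the sketch below maps each returned URL to the file name at the end of its path, which is handy when saving images locally. It assumes the action's output is the bare array shown above; a real API response may wrap the array in an envelope, in which case you would extract it first. The helper name output_filenames is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filenames(result):
    """Map each returned image URL to the file name at the end of its path."""
    if not isinstance(result, list):
        raise TypeError("expected a list of image URLs")
    return {url: PurePosixPath(urlparse(url).path).name for url in result}


sample_output = [
    "https://assets.cognitiveactions.com/invocations/031c5be9-8984-421f-95c5-7c8568c5be4b/5ecdd89a-bcd4-4a7a-a5a0-3bc1ba8f4b9d.png"
]
names = output_filenames(sample_output)
```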
Conceptual Usage Example (Python)
Here’s how you might call the Generate Enhanced Image action using Python:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "96fa01ee-372f-470e-802f-94a2a3c53e48"  # Action ID for Generate Enhanced Image

# Construct the input payload based on the action's requirements
payload = {
    "width": 1360,
    "height": 768,
    "prompt": "A photo of a TOK, dynamic action pose, film still",
    "refine": "no_refiner",
    "loraScale": 0.6,
    "scheduler": "K_EULER",
    "guidanceScale": 7.5,
    "negativePrompt": "people, two people, extra arms",
    "promptStrength": 0.8,
    "numberOfOutputs": 1,
    "highNoiseFraction": 0.8,
    "isWatermarkApplied": True,  # Python boolean; serialized as JSON true
    "numberOfInferenceSteps": 50,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:
            # The error body was not valid JSON, so fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet, replace the API key placeholder and the hypothetical endpoint with your actual values. The payload follows the input schema outlined above, and the request body structure (action_id plus inputs) is illustrative rather than confirmed.
Conclusion
The Cognitive Actions provided by the fofr/sdxl-cats-movie spec, particularly the Generate Enhanced Image action, open up exciting possibilities for developers seeking to integrate image generation and refinement capabilities into their applications. By leveraging these pre-built actions, you can efficiently enhance your visual content and innovate in areas like digital art, media, and more. Explore the potential of these actions and consider how they can fit into your next project!