Generate Stunning Images with the brettimus/sdxl-lua Cognitive Actions

In today's digital landscape, the ability to generate and refine images using advanced techniques is invaluable for developers. The brettimus/sdxl-lua spec provides a powerful set of Cognitive Actions designed to facilitate image generation through inpainting and img2img methods. These pre-built actions streamline the process, allowing developers to focus on creating engaging and visually appealing applications without needing to develop complex algorithms from scratch.
Prerequisites
Before diving into the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform to authenticate your requests.
- Basic understanding of JSON format, as the input and output structures are defined in this format.
- A working environment with access to Python and the requests library to execute API calls.
Authentication typically involves passing your API key in the headers of your requests.
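As a minimal sketch of that header setup (assuming a standard bearer-token scheme; check your platform's documentation for the exact header it expects):

```python
# Placeholder key, not a real credential.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# Bearer-token authentication headers for JSON requests.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```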
Cognitive Actions Overview
Generate Refined Image with Inpaint
Description: This action generates a refined image by leveraging inpainting and img2img techniques. It allows for the specification of a mask, adjustment of various refinement parameters, and application of classifier-free guidance. Additionally, users can modify image dimensions, prompt strengths, and scheduler algorithms to achieve optimal output quality.
Category: Image Generation
Input
The input schema for the action includes several parameters:
- mask (string, optional): URI to the input mask for inpaint mode. Black areas will be preserved, and white areas will be inpainted.
- seed (integer, optional): Random seed for reproducibility.
- image (string, required): URI to the input image for img2img or inpaint mode.
- width (integer, optional): Output image width in pixels (default: 1024).
- height (integer, optional): Output image height in pixels (default: 1024).
- prompt (string, optional): Descriptive text for image generation (default: "An astronaut riding a rainbow unicorn").
- refine (string, optional): Refinement style (default: "no_refiner").
- loraScale (number, optional): LoRA scale affecting trained models (default: 0.6).
- scheduler (string, optional): Denoising strategy (default: "K_EULER").
- guidanceScale (number, optional): Scale for classifier-free guidance (default: 7.5).
- applyWatermark (boolean, optional): Determines if a watermark is applied (default: true).
- negativePrompt (string, optional): Specifies elements to exclude from the image.
- promptStrength (number, optional): Strength of prompt guidance when using img2img or inpaint (default: 0.8).
- numberOfOutputs (integer, optional): Number of images to generate (default: 1).
- refinementSteps (integer, optional): Number of refinement steps when using base_image_refiner.
- highNoiseFraction (number, optional): Fraction of noise during refining (default: 0.8).
- numInferenceSteps (integer, optional): Number of denoising steps (default: 50).
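Since image is the only required field, a minimal payload can omit everything else and rely on the documented defaults. The sketch below collects those defaults and overlays a minimal payload to show the effective settings; the merge logic is illustrative, not part of the API, and the image URL is a placeholder:

```python
# Documented defaults for the optional parameters (from the schema above).
DEFAULTS = {
    "width": 1024,
    "height": 1024,
    "prompt": "An astronaut riding a rainbow unicorn",
    "refine": "no_refiner",
    "loraScale": 0.6,
    "scheduler": "K_EULER",
    "guidanceScale": 7.5,
    "applyWatermark": True,
    "promptStrength": 0.8,
    "numberOfOutputs": 1,
    "highNoiseFraction": 0.8,
    "numInferenceSteps": 50,
}

# "image" is the only required field; the URI here is a placeholder.
minimal_payload = {"image": "https://example.com/input.png"}

# Effective settings: defaults overlaid with whatever the caller supplied.
effective = {**DEFAULTS, **minimal_payload}
print(effective["scheduler"])  # K_EULER
```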
Example Input:
{
  "width": 1024,
  "height": 1024,
  "prompt": "a cute TOK dog, in the style of Alphonse Mucha, art nouveau",
  "refine": "expert_ensemble_refiner",
  "loraScale": 0.74,
  "scheduler": "K_EULER",
  "guidanceScale": 7.5,
  "applyWatermark": true,
  "negativePrompt": "human, frame",
  "promptStrength": 0.8,
  "numberOfOutputs": 4,
  "highNoiseFraction": 0.94,
  "numInferenceSteps": 50
}
Output
The action typically returns an array of image URLs that point to the generated images.
Example Output:
[
  "https://assets.cognitiveactions.com/invocations/1f9826c2-d01b-465c-8e11-269f8dbadaae/6168136f-e5b7-406f-b81c-ccd1b4dba738.png",
  "https://assets.cognitiveactions.com/invocations/1f9826c2-d01b-465c-8e11-269f8dbadaae/635bf581-4b00-43d1-ab15-a6cfe081888a.png",
  "https://assets.cognitiveactions.com/invocations/1f9826c2-d01b-465c-8e11-269f8dbadaae/b816d7ef-42f7-4077-8461-854c925154e1.png",
  "https://assets.cognitiveactions.com/invocations/1f9826c2-d01b-465c-8e11-269f8dbadaae/33b6f272-48ce-412d-9877-b619e8d69c62.png"
]
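The returned URLs can be fetched like any ordinary HTTP resource. As a sketch, the helper below derives a local filename from each URL's last path segment; the actual download call with requests is left commented so the snippet runs offline:

```python
import os
from urllib.parse import urlparse

# Example output URLs copied from the response above.
image_urls = [
    "https://assets.cognitiveactions.com/invocations/1f9826c2-d01b-465c-8e11-269f8dbadaae/6168136f-e5b7-406f-b81c-ccd1b4dba738.png",
    "https://assets.cognitiveactions.com/invocations/1f9826c2-d01b-465c-8e11-269f8dbadaae/635bf581-4b00-43d1-ab15-a6cfe081888a.png",
]

def local_name(url: str) -> str:
    """Use the last path segment of the URL as the local filename."""
    return os.path.basename(urlparse(url).path)

for url in image_urls:
    filename = local_name(url)
    print(filename)
    # To actually download (requires network access):
    # import requests
    # with open(filename, "wb") as f:
    #     f.write(requests.get(url, timeout=30).content)
```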
Conceptual Usage Example (Python)
Here’s a conceptual Python code snippet illustrating how you might call the Cognitive Actions execution endpoint for this action:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "b89ae675-b58a-4055-937c-681a6dd05b51"  # Action ID for Generate Refined Image with Inpaint

# Construct the input payload based on the action's requirements
payload = {
    "width": 1024,
    "height": 1024,
    "prompt": "a cute TOK dog, in the style of Alphonse Mucha, art nouveau",
    "refine": "expert_ensemble_refiner",
    "loraScale": 0.74,
    "scheduler": "K_EULER",
    "guidanceScale": 7.5,
    "applyWatermark": True,
    "negativePrompt": "human, frame",
    "promptStrength": 0.8,
    "numberOfOutputs": 4,
    "highNoiseFraction": 0.94,
    "numInferenceSteps": 50,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
In this snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID for the "Generate Refined Image with Inpaint" action is set, and the input payload is constructed using the example provided. The endpoint URL is hypothetical and should be adjusted to match the actual endpoint in your implementation.
Conclusion
The brettimus/sdxl-lua Cognitive Actions offer a robust framework for developers looking to integrate image generation capabilities into their applications effortlessly. With the ability to fine-tune parameters such as guidance scale, prompt strength, and refinement methods, developers can create stunning visuals tailored to their specific needs. Explore these actions further to unlock the full potential of your applications!