Generate Stunning Stylized Images with joetm/camerabooth-openpose-style Actions

In image generation, the joetm/camerabooth-openpose-style API lets developers create stylized images from a pose photo and a reference artwork. It combines StableDiffusion with OpenPose pose conditioning and uses Real-ESRGAN to upscale the results, so a handful of parameters is enough to produce high-resolution, artistic renditions. This post walks through the Generate Styled Image with StableDiffusion and Real-ESRGAN action and shows how to integrate it into your applications.
Prerequisites
Before diving into the implementation, ensure you have the following:
- An API key for the Cognitive Actions platform to authenticate your requests.
- Basic knowledge of making HTTP requests and handling JSON data.
To authenticate, pass your API key in the request headers; the Python example later in this post sends it as a Bearer token in the Authorization header.
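A minimal sketch of building those headers, assuming the Bearer-token scheme used in the Python example later in this post. `build_headers` is a hypothetical helper name, not part of any SDK, and the environment variable name is likewise only an illustration.

```python
import os


def build_headers(api_key: str) -> dict[str, str]:
    """Return JSON request headers carrying the API key as a Bearer token.

    Assumption: the platform accepts `Authorization: Bearer <key>`, as in the
    usage example later in this post; confirm against the platform docs.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # Hypothetical variable name; keeps the key out of source code.
    print(build_headers(os.environ.get("COGNITIVE_ACTIONS_API_KEY", "demo-key")))
```

Reading the key from the environment instead of hard-coding it keeps it out of version control.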
Cognitive Actions Overview
Generate Styled Image with StableDiffusion and Real-ESRGAN
This action generates stylized images by combining an input pose image with an artwork reference. OpenPose extracts the body pose from the input image, StableDiffusion renders an artistic interpretation that follows that pose in the style of the artwork, and Real-ESRGAN then upscales the result to improve image quality.
Input
The input to this action is defined by the following schema:
- artwork (string, required): URL of the input artwork image.
- image (string, required): URL of the input pose image.
- code (string, optional): A keyphrase for participants; automatically filled.
- seed (integer, optional): Random seed for generation; defaults to 0.
- steps (integer, optional): Number of denoising steps; default is 50 (range: 1-100).
- prompt (string, optional): Input prompt; ignored if detectPrompt is true.
- clipMode (string, optional): Mode for the CLIP interrogator; default is "best".
- useUpscaler (boolean, optional): Indicates whether to upscale images; default is true.
- detectPrompt (boolean, optional): Whether the CLIP interrogator should detect the prompt automatically; default is true.
- guidanceScale (number, optional): Scale for classifier-free guidance; default is 8 (range: 1-50).
- upscaleFactor (number, optional): Factor by which to upscale images; default is 2 (range: 0-4).
- negativePrompt (string, optional): An optional negative prompt to influence generation.
- numberOfSamples (integer, optional): Number of images to generate; default is 1 (range: 1-4).
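The documented defaults and ranges can be enforced on the client before a request is sent. Below is a minimal sketch; `build_inputs`, `DEFAULTS`, and `RANGES` are hypothetical names, only fields whose defaults the schema states are filled in, and switching detectPrompt off when a prompt is supplied is a design choice of this sketch, not documented behavior of the action.

```python
from typing import Any

# Only the defaults the schema above states explicitly.
DEFAULTS: dict[str, Any] = {
    "seed": 0,
    "steps": 50,
    "clipMode": "best",
    "useUpscaler": True,
    "detectPrompt": True,
    "guidanceScale": 8,
    "upscaleFactor": 2,
    "numberOfSamples": 1,
}

# Inclusive numeric ranges taken from the schema above.
RANGES: dict[str, tuple[float, float]] = {
    "steps": (1, 100),
    "guidanceScale": (1, 50),
    "upscaleFactor": (0, 4),
    "numberOfSamples": (1, 4),
}


def build_inputs(artwork: str, image: str, **overrides: Any) -> dict[str, Any]:
    """Merge documented defaults with caller overrides and check the ranges.

    Hypothetical helper: the action itself may validate differently.
    """
    if not artwork or not image:
        raise ValueError("both 'artwork' and 'image' URLs are required")
    inputs = {**DEFAULTS, **overrides, "artwork": artwork, "image": image}
    for key, (low, high) in RANGES.items():
        if not low <= inputs[key] <= high:
            raise ValueError(f"{key}={inputs[key]} is outside [{low}, {high}]")
    # The schema says a manual prompt is ignored while detectPrompt is true,
    # so turn detection off unless the caller set detectPrompt explicitly.
    if inputs.get("prompt") and "detectPrompt" not in overrides:
        inputs["detectPrompt"] = False
    return inputs


if __name__ == "__main__":
    print(build_inputs(
        "https://example.com/artwork.jpg",  # placeholder URLs
        "https://example.com/pose.jpg",
        prompt="a child holding a dove",
        numberOfSamples=2,
    ))
```

Failing fast on an out-of-range value such as steps=0 gives a clearer error than a rejected API call.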
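Example Input (it also includes fields the schema above does not describe, such as boothType, timestamp, faceEnhancement, and the adapter weight and conditioning settings):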
{
"code": "",
"seed": 0,
"image": "https://replicate.delivery/pbxt/JdXiguR2wRjl03jOe4fLpfrEEewitrSFTStYxiBpM6Qa6Xqm/oliver-ragfelt-m79taQSsQIQ-unsplash.jpg",
"steps": 50,
"prompt": "",
"artwork": "https://replicate.delivery/pbxt/JdXihCVozhMdqBwECWdGRkbbgib7elGQnmJcheaupHfgKTIt/child-with-dove-1901.jpg",
"clipMode": "best",
"boothType": "public",
"timestamp": 0,
"useUpscaler": true,
"detectPrompt": true,
"guidanceScale": 8,
"upscaleFactor": 2,
"negativePrompt": "(((nsfw))), (((text))), (((words))), ((low quality)), worst quality, bad quality, ((bad art)), lowres, ((disfigured)), ((deformed)), ((mutilated)), glitch, ((distorted)), malformed, mutated, (((disfigured))), misaligned, poorly drawn, (((blurry))), ((blurred)), mutated, bad arms, ((extra limbs)), missing arms, missing legs, disconnected limbs, bad hands, ((poorly drawn hands)), ((poorly drawn face)), deformed hands, ((extra fingers)), ((extra legs)), fused fingers, (too many fingers), mutated hands, amputated limbs, no arms, extra arms, multiple arms, more than two legs, long neck, bad proportions, longbody, bad anatomy, missing fingers, extra digit, fewer digits, deformed eyes, poorly drawn eyes, cross-eye, ((cross-eyed)), bad skin, multiple, duplicated, by Bad Artist, monochrome, monotone, grayscale, b&w, sketches, speech bubble, signature, watermark, border, logo, ((morbid)), canvas frame, frame, 3d",
"conditioningTau": 1,
"faceEnhancement": true,
"numberOfSamples": 1,
"additionalPrompt": "",
"styleAdapterWeight": 1,
"styleConditioningTau": 0.8,
"openposeAdapterWeight": 1
}
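The example pins seed to 0. To explore several renditions of the same pose and artwork, one option is to send payloads that differ only in seed. A minimal sketch; `seed_variations` is a hypothetical helper, and it assumes (the post does not say) that a fixed seed makes the action's output repeatable.

```python
import copy
from typing import Any


def seed_variations(base: dict[str, Any], seeds: list[int]) -> list[dict[str, Any]]:
    """Return one deep copy of `base` per seed, changing only the 'seed' field."""
    variants = []
    for seed in seeds:
        payload = copy.deepcopy(base)  # leave the caller's dict untouched
        payload["seed"] = seed
        variants.append(payload)
    return variants


if __name__ == "__main__":
    base = {"seed": 0, "steps": 50, "guidanceScale": 8, "numberOfSamples": 1}
    for p in seed_variations(base, [0, 1, 2]):
        print(p["seed"], p["steps"])
```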
Output
The action returns an array of URLs pointing to the generated images. Each URL links to a stylized output based on the provided artwork and pose image. The example below lists two URLs, so do not assume the array length always equals numberOfSamples.
Example Output:
[
"https://assets.cognitiveactions.com/invocations/39c73e7c-8fc3-42fe-a854-b15e99bd2aee/d2f9d8e9-662d-413f-b154-2e937d5990cb.png",
"https://assets.cognitiveactions.com/invocations/39c73e7c-8fc3-42fe-a854-b15e99bd2aee/fdec6f40-e846-4c0e-bdca-922b21d87647.png"
]
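To keep the results, you can download each URL and name the file after the last path segment. A minimal sketch using only the standard library; `output_filenames` and `download_outputs` are hypothetical helpers, and it assumes the asset URLs can be fetched without extra authentication, which the post does not state.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def output_filenames(urls: list[str]) -> list[str]:
    """Use the last path segment of each output URL as a local filename."""
    return [posixpath.basename(urlparse(url).path) for url in urls]


def download_outputs(urls: list[str], dest: str = "outputs") -> list[Path]:
    """Download each generated image into `dest` and return the saved paths."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url, name in zip(urls, output_filenames(urls)):
        path = target / name
        # Stream the response body straight to disk.
        with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as fh:
            shutil.copyfileobj(resp, fh)
        saved.append(path)
    return saved


if __name__ == "__main__":
    urls = [
        "https://assets.cognitiveactions.com/invocations/39c73e7c-8fc3-42fe-a854-b15e99bd2aee/d2f9d8e9-662d-413f-b154-2e937d5990cb.png",
    ]
    print(output_filenames(urls))  # call download_outputs(urls) to fetch them
```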
Conceptual Usage Example (Python)
Here’s a conceptual example of how you might call this action using Python:
import requests
import json
# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint
action_id = "02f03c14-4905-4c41-a45e-27bb6b94df98" # Action ID for Generate Styled Image
# Construct the input payload based on the action's requirements
payload = {
    "code": "",
    "seed": 0,
    "image": "https://replicate.delivery/pbxt/JdXiguR2wRjl03jOe4fLpfrEEewitrSFTStYxiBpM6Qa6Xqm/oliver-ragfelt-m79taQSsQIQ-unsplash.jpg",
    "steps": 50,
    "prompt": "",
    "artwork": "https://replicate.delivery/pbxt/JdXihCVozhMdqBwECWdGRkbbgib7elGQnmJcheaupHfgKTIt/child-with-dove-1901.jpg",
    "clipMode": "best",
    "useUpscaler": True,
    "detectPrompt": True,
    "guidanceScale": 8,
    "upscaleFactor": 2,
    "negativePrompt": "(((nsfw))), (((text))), (((words))), ((low quality)), worst quality, bad quality, ((bad art)), lowres, ((disfigured)), ((deformed)), ((mutilated)), glitch, ((distorted)), malformed, mutated, (((disfigured))), misaligned, poorly drawn, (((blurry))), ((blurred)), mutated, bad arms, ((extra limbs)), missing arms, missing legs, disconnected limbs, bad hands, ((poorly drawn hands)), ((poorly drawn face)), deformed hands, ((extra fingers)), ((extra legs)), fused fingers, (too many fingers), mutated hands, amputated limbs, no arms, extra arms, multiple arms, more than two legs, long neck, bad proportions, longbody, bad anatomy, missing fingers, extra digit, fewer digits, deformed eyes, poorly drawn eyes, cross-eye, ((cross-eyed)), bad skin, multiple, duplicated, by Bad Artist, monochrome, monotone, grayscale, b&w, sketches, speech bubble, signature, watermark, border, logo, ((morbid)), canvas frame, frame, 3d",
    "conditioningTau": 1,
    "faceEnhancement": True,
    "numberOfSamples": 1,
    "additionalPrompt": "",
    "styleAdapterWeight": 1,
    "styleConditioningTau": 0.8,
    "openposeAdapterWeight": 1,
}
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300,  # Image generation can be slow; adjust as needed
    )
    response.raise_for_status()  # Raise an exception for 4xx/5xx status codes
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; covers json and requests decode errors
            print(f"Response body: {e.response.text}")
This snippet defines the action ID, builds the input payload according to the schema, and posts both to the execute endpoint with the API key in the Authorization header. The endpoint URL and request body structure are illustrative; replace them with the actual values for your Cognitive Actions setup.
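Because the response structure is only illustrative, it helps to isolate the step that pulls image URLs out of the parsed JSON. A minimal sketch; `extract_image_urls` is a hypothetical helper that accepts the bare list shown under Example Output and, as an unverified assumption, a dict wrapping that list under a hypothetical "output" key.

```python
from typing import Any


def extract_image_urls(result: Any) -> list[str]:
    """Return the image URLs from a parsed JSON response.

    Accepts a bare list of URLs (the Example Output shape) or a dict holding
    that list under a hypothetical 'output' key; check the real response.
    """
    if isinstance(result, dict):
        result = result.get("output", [])  # hypothetical key name
    if not isinstance(result, list):
        raise TypeError(f"unexpected response shape: {type(result).__name__}")
    # Keep only string entries that look like HTTP(S) URLs.
    return [
        item for item in result
        if isinstance(item, str) and item.startswith(("http://", "https://"))
    ]


if __name__ == "__main__":
    print(extract_image_urls(["https://example.com/a.png", None]))
```

Keeping this step in one function means only it needs to change once you know the real response format.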
Conclusion
The Generate Styled Image with StableDiffusion and Real-ESRGAN action from the joetm/camerabooth-openpose-style spec turns a pose reference and an artwork into stylized, upscaled images with a single API call. From here, experiment with parameters such as guidanceScale, steps, and upscaleFactor, or combine this action with other Cognitive Actions to build richer image workflows.