Create Stunning Visual Narratives with the camenduru/story-diffusion Action

In the world of content creation, the ability to generate captivating visuals from textual narratives can significantly enhance storytelling. The camenduru/story-diffusion spec provides developers with a powerful Cognitive Action called Generate Story with StoryDiffusion. This action uses consistent self-attention to generate long sequences of coherent images and videos, enabling creators to craft immersive stories with ease. With customizable options for styles, framing, and character attributes, the potential for unique storytelling is boundless.
Prerequisites
Before integrating this Cognitive Action, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic knowledge of JSON format and HTTP requests.
- Familiarity with Python and libraries like `requests` for making API calls.
Authentication typically involves including your API key in the request header.
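As a minimal sketch, a bearer-token header might look like the following (the exact scheme and header names are assumptions; confirm them against your Cognitive Actions account documentation):

```python
# Hypothetical bearer-token headers; the exact auth scheme depends on your account setup.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
```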
Cognitive Actions Overview
Generate Story with StoryDiffusion
The Generate Story with StoryDiffusion action is designed to transform textual inputs into vivid imagery. By selecting style templates and adjusting various parameters, users can control how their stories are visually represented.
Input
The input for this action requires the following fields:
- inputImage (string, required): A URL pointing to the input image.
- style (string, optional): The style template to apply. Options include 'Japanese Anime', 'Cinematic', 'Disney Character', etc. Default is 'Japanese Anime'.
- randomSeed (integer, optional): A seed for randomization, defaulting to 1.
- idImageCount (integer, optional): Number of ID images in the output, defaulting to 3.
- guidanceScale (integer, optional): Influence level during generation, defaulting to 5.
- numberOfSteps (integer, optional): Total sampling steps, defaulting to 50.
- generatedWidth (integer, optional): Output image width in pixels, defaulting to 768.
- generatedHeight (integer, optional): Output image height in pixels, defaulting to 768.
- selfAttention32 (number, optional): Attention intensity at 32x32 resolution, defaulting to 0.5.
- selfAttention64 (number, optional): Attention intensity at 64x64 resolution, defaulting to 0.5.
- comicDescription (string, optional): Descriptions of the comic frames, supplied as a single newline-separated string (one line per frame).
- comicStylesetType (string, optional): Typesetting style for comics, defaulting to 'Classic Comic Style'.
- styleStrengthRatio (integer, optional): Intensity of the reference image style as a percentage, defaulting to 20.
- characterControlType (string, optional): Control method for characters, defaulting to 'Using Ref Images'.
- characterDescription (string, optional): Description of the character's attributes.
- inputAdapterStrength (number, optional): Influence strength of the input adapter, defaulting to 0.5.
- negativePrompt (string, optional): Attributes to avoid in the output image.
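Since only `inputImage` is required, a small helper can merge user overrides onto the documented defaults before sending the request. This is an illustrative sketch, not part of the API; the `DEFAULTS` dictionary simply restates the default values listed above:

```python
# Documented defaults for the action's optional fields (from the field list above).
DEFAULTS = {
    "style": "Japanese Anime",
    "randomSeed": 1,
    "idImageCount": 3,
    "guidanceScale": 5,
    "numberOfSteps": 50,
    "generatedWidth": 768,
    "generatedHeight": 768,
    "selfAttention32": 0.5,
    "selfAttention64": 0.5,
    "comicStylesetType": "Classic Comic Style",
    "styleStrengthRatio": 20,
    "characterControlType": "Using Ref Images",
    "inputAdapterStrength": 0.5,
}

def build_payload(input_image, **overrides):
    """Illustrative helper: build an input payload from the required image URL
    plus any overrides, filling the remaining fields with their defaults."""
    if not input_image:
        raise ValueError("inputImage is required")
    return {**DEFAULTS, "inputImage": input_image, **overrides}
```

For example, `build_payload("https://example.com/ref.jpeg", guidanceScale=7)` changes only the guidance scale and keeps every other default.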
Example Input:
```json
{
  "style": "Japanese Anime",
  "inputImage": "https://replicate.delivery/pbxt/KqySXsVmWku71q5LZeNjgasK4oVRILdFPt9dKKCEYG5ZFVko/1%20%281%29.jpeg",
  "randomSeed": 1,
  "idImageCount": 3,
  "guidanceScale": 5,
  "numberOfSteps": 50,
  "generatedWidth": 768,
  "generatedHeight": 768,
  "selfAttention32": 0.5,
  "selfAttention64": 0.5,
  "comicDescription": "wake up in the bed\nhave breakfast\nis on the road, go to company\nwork in the company\nTake a walk next to the company at noon\nlying in bed at night",
  "comicStylesetType": "Classic Comic Style",
  "styleStrengthRatio": 20,
  "characterControlType": "Using Ref Images",
  "characterDescription": "a woman img, wearing a white T-shirt, blue loose hair",
  "inputAdapterStrength": 0.5,
  "negativePrompt": "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs"
}
```
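Because `comicDescription` is a single newline-separated string rather than a JSON array, it is convenient to keep the frame descriptions in a Python list and join them when building the payload:

```python
# Each list entry becomes one comic frame; the API expects them joined with "\n".
frames = [
    "wake up in the bed",
    "have breakfast",
    "is on the road, go to company",
]
comic_description = "\n".join(frames)
```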
Output
The output of this action is typically a list of URLs pointing to the generated images based on the narrative and parameters provided.
Example Output:
```json
[
  "https://assets.cognitiveactions.com/invocations/a03e661e-d7a4-4cca-9e21-d4bc792adf4a/fb750b59-285a-4939-8264-c8d9b764b4d9.png",
  "https://assets.cognitiveactions.com/invocations/a03e661e-d7a4-4cca-9e21-d4bc792adf4a/2c205129-b6ec-469a-9032-4dd3168c08c8.png",
  "https://assets.cognitiveactions.com/invocations/a03e661e-d7a4-4cca-9e21-d4bc792adf4a/11dfd1d2-934f-4c68-aa6e-25ca7c9f8fe5.png",
  ...
]
```
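Since the output is a flat list of direct image URLs, saving the generated frames locally is straightforward. The helper below is an illustrative sketch using the standard library (the filename pattern is an arbitrary choice, not part of the API):

```python
from urllib.request import urlopen

def download_images(urls, prefix="frame"):
    """Fetch each generated image URL and save it as a local PNG; returns the filenames."""
    saved = []
    for i, url in enumerate(urls):
        filename = f"{prefix}_{i:02d}.png"
        with urlopen(url) as resp, open(filename, "wb") as f:
            f.write(resp.read())
        saved.append(filename)
    return saved
```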
Conceptual Usage Example (Python)
Here’s a conceptual Python code snippet demonstrating how to invoke the Generate Story with StoryDiffusion action:
```python
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "8f361aa4-c17a-4b7e-b6e6-fcd884e4f418"  # Action ID for Generate Story with StoryDiffusion

# Construct the input payload based on the action's requirements
payload = {
    "style": "Japanese Anime",
    "inputImage": "https://replicate.delivery/pbxt/KqySXsVmWku71q5LZeNjgasK4oVRILdFPt9dKKCEYG5ZFVko/1%20%281%29.jpeg",
    "randomSeed": 1,
    "idImageCount": 3,
    "guidanceScale": 5,
    "numberOfSteps": 50,
    "generatedWidth": 768,
    "generatedHeight": 768,
    "selfAttention32": 0.5,
    "selfAttention64": 0.5,
    "comicDescription": "wake up in the bed\nhave breakfast\nis on the road, go to company\nwork in the company\nTake a walk next to the company at noon\nlying in bed at night",
    "comicStylesetType": "Classic Comic Style",
    "styleStrengthRatio": 20,
    "characterControlType": "Using Ref Images",
    "characterDescription": "a woman img, wearing a white T-shirt, blue loose hair",
    "inputAdapterStrength": 0.5,
    "negativePrompt": "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
```
In this example, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The action ID and payload structure are critical for successful execution.
Conclusion
The Generate Story with StoryDiffusion action opens the door to imaginative visual storytelling by combining narrative text with advanced image generation techniques. By customizing various parameters, developers can create unique narratives that resonate with their audiences. Explore the possibilities and consider integrating these Cognitive Actions into your applications to elevate your content creation process.