Unlocking Creative Potential: Integrate Image Generation with One Diffusion Actions

25 Apr 2025

In today's rapidly evolving tech landscape, the ability to generate high-quality images from textual descriptions or manipulate existing images can significantly enhance user engagement and creativity. The One Diffusion model, featured in the chenxwh/onediffusion specification, provides developers with powerful Cognitive Actions for versatile image synthesis. Whether you’re looking to create stunning visuals or enhance existing ones, these actions enable a wide array of tasks, from text-to-image generation to image inpainting. In this post, we’ll explore how to leverage these Cognitive Actions effectively in your applications.

Prerequisites

Before you dive into using the Cognitive Actions provided by the One Diffusion model, ensure you have the following:

  • An API key for the Cognitive Actions platform, which you will use to authenticate your requests.
  • Basic knowledge of sending HTTP requests using a programming language such as Python.

Authentication typically involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions endpoint.
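
As a minimal sketch, the headers might be constructed like this (the Bearer scheme and exact header names are assumptions based on common API conventions; check the platform's documentation for the authoritative format):

```python
import os

# Read the key from the environment rather than hard-coding it in source.
# The Bearer scheme is an assumption; verify against the platform's auth docs.
api_key = os.environ.get("COGNITIVE_ACTIONS_API_KEY", "YOUR_COGNITIVE_ACTIONS_API_KEY")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```

Keeping the key in an environment variable avoids accidentally committing it to version control.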

Cognitive Actions Overview

Generate Image with One Diffusion

The Generate Image with One Diffusion action utilizes the One Diffusion model to create images based on specified tasks. This action is particularly useful for developers looking to implement features like text-to-image generation or image inpainting.

Input

The action accepts a structured input defined by the following schema:

  • seed (integer, optional): Random seed for generation. Leave blank to randomize the seed.
  • task (string, required): The task to perform; supported values include text2image, deblurring, and image_inpainting. Default is text2image.
  • width (integer, optional): Output image width in pixels. Default is 1024.
  • height (integer, optional): Output image height in pixels. Default is 1024.
  • image1 (string, optional): URI of the first input image for image-to-image tasks.
  • image2 (string, optional): URI of the second input image for image-to-image tasks.
  • image3 (string, optional): URI of the third input image for image-to-image tasks.
  • prompt (string, required): A textual description guiding the generation process.
  • azimuth (string, optional): Comma-separated azimuth angles for multiview generation. Default is "0".
  • distance (string, optional): Comma-separated distances for multiview generation. Default is "1.5".
  • elevation (string, optional): Comma-separated elevation angles for multiview generation. Default is "0".
  • denoiseMask (string, optional): Denoise mask for output images.
  • focalLength (number, optional): Camera focal length for multiview generation. Default is 1.3887.
  • guidanceScale (number, optional): Scale factor for classifier-free guidance, between 1 and 20. Default is 4.
  • negativePrompt (string, optional): A description of elements to exclude from the generated output.
  • numInferenceSteps (integer, optional): Number of steps for the denoising process, between 1 and 500. Default is 50.
  • useInputImageSize (boolean, optional): When true, the output image matches the input image's dimensions. Default is false.

Example Input JSON:

{
  "task": "text2image",
  "width": 1024,
  "height": 1024,
  "prompt": "A bipedal black cat wearing a huge oversized witch hat, a wizard's robe, casting a spell, in an enchanted forest. The scene is filled with fireflies and moss on surrounding rocks and trees",
  "azimuth": "0",
  "distance": "1.5",
  "elevation": "0",
  "denoiseMask": "0",
  "focalLength": 1.3887,
  "guidanceScale": 4,
  "negativePrompt": "monochrome, greyscale, low-res, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation",
  "numInferenceSteps": 50,
  "useInputImageSize": false
}
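
The same schema covers the other task types. As a hypothetical sketch, an image_inpainting request might combine a source image, a mask, and a guiding prompt (the asset URLs below are placeholders, not real files):

```python
# Hypothetical input for an image_inpainting task, following the schema above.
# The image and mask URLs are illustrative placeholders.
inpainting_payload = {
    "task": "image_inpainting",
    "prompt": "Replace the sky with a dramatic sunset over mountains",
    "image1": "https://example.com/source-photo.png",   # image to edit
    "denoiseMask": "https://example.com/sky-mask.png",  # region to regenerate
    "useInputImageSize": True,  # keep the source image's dimensions
    "guidanceScale": 4,
    "numInferenceSteps": 50,
}
```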

Output

Upon successful execution, the action returns a JSON array containing one or more URLs pointing to the generated images:

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/fff8a645-5f07-4a80-b3af-722298cb4b6a/3258b8ad-6fc8-4895-95b8-7ed574222e30.png"
]
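
Since the result is a list of URLs, a typical follow-up step is to derive a local filename from each URL and download the image. Here is a small sketch; the download itself is commented out because it requires network access and the requests library:

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

# Example output from the action (an array of image URLs).
result = [
    "https://assets.cognitiveactions.com/invocations/fff8a645-5f07-4a80-b3af-722298cb4b6a/3258b8ad-6fc8-4895-95b8-7ed574222e30.png"
]

filenames = []
for url in result:
    # The last path segment of the URL is the image's filename.
    name = PurePosixPath(urlparse(url).path).name
    filenames.append(name)
    # To save locally (requires `import requests` and network access):
    # with open(name, "wb") as f:
    #     f.write(requests.get(url, timeout=60).content)
```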

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet demonstrating how to invoke the Generate Image with One Diffusion action using the provided input structure:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "8ab68e50-1e3e-438b-8e3c-967ba603e185"  # Action ID for Generate Image with One Diffusion

# Construct the input payload based on the action's requirements
payload = {
    "task": "text2image",
    "width": 1024,
    "height": 1024,
    "prompt": "A bipedal black cat wearing a huge oversized witch hat, a wizard's robe, casting a spell, in an enchanted forest. The scene is filled with fireflies and moss on surrounding rocks and trees",
    "azimuth": "0",
    "distance": "1.5",
    "elevation": "0",
    "denoiseMask": "0",
    "focalLength": 1.3887,
    "guidanceScale": 4,
    "negativePrompt": "monochrome, greyscale, low-res, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation",
    "numInferenceSteps": 50,
    "useInputImageSize": False
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical request structure
        timeout=300  # Image generation can take a while; avoid hanging indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID and input payload are specified, and the script sends a POST request to the hypothetical Cognitive Actions endpoint.
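
For repeated use, the payload construction can be factored into a small helper with schema defaults that callers override per request. This is a sketch: the defaults mirror the schema above, and the HTTP call that would consume the payload remains the hypothetical one shown earlier.

```python
def build_payload(prompt, **overrides):
    """Build a text2image payload with the schema's defaults, letting
    callers override any field (e.g. seed=42 for reproducible output)."""
    payload = {
        "task": "text2image",
        "width": 1024,
        "height": 1024,
        "guidanceScale": 4,
        "numInferenceSteps": 50,
        "prompt": prompt,
    }
    payload.update(overrides)  # caller-supplied fields win over defaults
    return payload

# Example: a reproducible, taller render.
payload = build_payload("A lighthouse at dusk", seed=42, height=1536)
```

Centralizing the defaults in one place keeps individual call sites short and makes it easy to adjust shared settings like numInferenceSteps across an application.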

Conclusion

The Generate Image with One Diffusion action provides a robust toolset for developers aiming to incorporate advanced image synthesis capabilities into their applications. By utilizing this action, you can create stunning visuals, enrich user experiences, and explore endless creative possibilities. As you experiment with these Cognitive Actions, consider the various tasks and parameters available to tailor the output to your needs. Happy coding!