Generate Stunning Bird Images with the superhighfives/birds-and-flowers Cognitive Actions

23 Apr 2025

Image generation has advanced to the point where developers can create detailed visuals from textual prompts. The superhighfives/birds-and-flowers spec provides a Cognitive Action that uses a fine-tuned SDXL model to generate images of birds from user-defined prompts. The model draws on datasets from the British Library's free archive, which is meant to improve the accuracy and diversity of the images produced.

This blog post covers the Generate Bird Image Using SDXL Fine-Tune action: what it accepts, what it returns, and how you can integrate it into your applications.

Prerequisites

Before you get started with the Cognitive Actions, ensure you have the following:

  • An API key for accessing the Cognitive Actions platform.
  • Basic knowledge of making HTTP requests and handling JSON data in your application.
  • Familiarity with Python, to follow the conceptual code examples provided.

To authenticate, you will generally pass your API key in the headers of your requests. The conceptual example later in this post sends it as a Bearer token in the Authorization header.
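
As a minimal sketch of that header setup, the helper below reads the key from an environment variable instead of hard-coding it. The variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions made for illustration; the Bearer scheme simply mirrors the conceptual example further down and is not confirmed platform behavior.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token.

    The environment variable name is an assumption for this sketch.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key found; pass api_key or set COGNITIVE_ACTIONS_API_KEY."
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it less likely to end up in version control by accident.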

Cognitive Actions Overview

Generate Bird Image Using SDXL Fine-Tune

The Generate Bird Image Using SDXL Fine-Tune action allows developers to create vivid images of birds based on descriptive prompts. This action falls under the image-generation category and is designed for creative applications, art generation, and more.

Input

The action accepts a structured input defined by the following schema:

  • mask (string, optional): URI of the input mask for inpaint mode.
  • seed (integer, optional): Seed value for randomness; if omitted, a random seed is used.
  • image (string, optional): URI of the input image for img2img or inpaint processing.
  • width (integer, default: 1024): Width of the output image in pixels.
  • height (integer, default: 1024): Height of the output image in pixels.
  • prompt (string, required): Descriptive text for the image generation.
  • scheduler (string, default: "K_EULER"): Algorithm for scheduling generation steps.
  • refineStyle (string, default: "no_refiner"): Style for image refinement.
  • modelWeights (string, optional): Model weights for generation.
  • guidanceScale (number, default: 7.5): Intensity of classifier-free guidance.
  • applyWatermark (boolean, default: true): Whether to apply a watermark to the images.
  • negativePrompt (string, optional): Elements to minimize in the generated image.
  • promptStrength (number, default: 0.8): Influence of the prompt relative to the input image in img2img or inpaint processing.
  • numberOfOutputs (integer, default: 1): Number of images to generate (1-4).
  • refinementSteps (integer, optional): Number of refinement steps for base image refinement.
  • highNoiseFraction (number, default: 0.8): Fraction of noise for the expert ensemble refiner.
  • loraAdjustmentScale (number, default: 0.6): Scale factor for LoRA adjustments.
  • disableSafetyChecker (boolean, default: false): Disables safety checks on generated images.
  • numberOfInferenceSteps (integer, default: 50): Number of steps in the denoising process.
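
One way to work with this schema is to start from the documented defaults and override only what you need. The sketch below does that and checks the two constraints stated in the list: prompt is required, and numberOfOutputs must be between 1 and 4. The names DEFAULTS, OPTIONAL_FIELDS, and build_payload are hypothetical helpers, not part of the platform, and any server-side validation may be stricter.

```python
# Defaults copied from the schema listed above.
DEFAULTS = {
    "width": 1024,
    "height": 1024,
    "scheduler": "K_EULER",
    "refineStyle": "no_refiner",
    "guidanceScale": 7.5,
    "applyWatermark": True,
    "promptStrength": 0.8,
    "numberOfOutputs": 1,
    "highNoiseFraction": 0.8,
    "loraAdjustmentScale": 0.6,
    "disableSafetyChecker": False,
    "numberOfInferenceSteps": 50,
}

# Optional fields for which the schema lists no default.
OPTIONAL_FIELDS = {
    "mask", "seed", "image", "modelWeights", "negativePrompt", "refinementSteps",
}


def build_payload(prompt, **overrides):
    """Merge overrides into the documented defaults and run basic checks."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "prompt": prompt}
    if not 1 <= payload["numberOfOutputs"] <= 4:
        raise ValueError("numberOfOutputs must be between 1 and 4")
    return payload
```

For example, build_payload("A bird flying in front of the sun in the style of TOK", numberOfOutputs=3, seed=42) returns a dictionary with the defaults filled in, three outputs requested, and a fixed seed.
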

Example Input

{
  "width": 1024,
  "height": 1024,
  "prompt": "A bird flying in front of the sun in the style of TOK",
  "scheduler": "K_EULER",
  "refineStyle": "no_refiner",
  "guidanceScale": 7.5,
  "applyWatermark": true,
  "negativePrompt": "",
  "promptStrength": 0.8,
  "numberOfOutputs": 3,
  "highNoiseFraction": 0.8,
  "loraAdjustmentScale": 0.6,
  "numberOfInferenceSteps": 50
}

Output

Upon successful execution, the action returns a list of URLs pointing to the generated images. Here’s an example of the output you might expect:

[
  "https://assets.cognitiveactions.com/invocations/4dbc45da-b3d6-4625-a090-2c9f6eee47fb/886e1e44-2048-4b06-ac1e-5b3a460f5fe3.png",
  "https://assets.cognitiveactions.com/invocations/4dbc45da-b3d6-4625-a090-2c9f6eee47fb/44f2e85a-d02a-430b-aae1-f5fae16bf2e1.png",
  "https://assets.cognitiveactions.com/invocations/4dbc45da-b3d6-4625-a090-2c9f6eee47fb/6b305728-5ceb-4424-af05-5f4cb9ec67fe.png"
]
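
Since the output is a plain list of URLs, saving the images locally takes only a few lines. The following is a sketch using the Python standard library only. The helper names and the outputs directory are illustrative, and it assumes the asset URLs can be fetched without extra authentication, which the source does not confirm.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path segment of the URL, e.g. '<uuid>.png'."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"Cannot derive a file name from {url!r}")
    return name


def download_images(urls, dest_dir="outputs"):
    """Download each URL into dest_dir and return the local file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, filename_from_url(url))
        # Stream the response body to disk instead of loading it into memory.
        with urllib.request.urlopen(url, timeout=60) as resp, open(target, "wb") as fh:
            shutil.copyfileobj(resp, fh)
        saved.append(target)
    return saved
```

Calling download_images(result) on the list returned by the action would write one file per generated image, named after the final segment of each URL.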

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet illustrating how to call this action using a hypothetical endpoint:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "1395a081-a804-4ebb-a60e-a3663131cee3" # Action ID for Generate Bird Image Using SDXL Fine-Tune

# Construct the input payload based on the action's requirements
payload = {
    "width": 1024,
    "height": 1024,
    "prompt": "A bird flying in front of the sun in the style of TOK",
    "scheduler": "K_EULER",
    "refineStyle": "no_refiner",
    "guidanceScale": 7.5,
    "applyWatermark": True,
    "negativePrompt": "",
    "promptStrength": 0.8,
    "numberOfOutputs": 3,
    "highNoiseFraction": 0.8,
    "loraAdjustmentScale": 0.6,
    "numberOfInferenceSteps": 50
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Avoid hanging indefinitely; adjust to expected generation time
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Body was not valid JSON; covers every JSON decode error requests can raise
            print(f"Response body: {e.response.text}")

In this code snippet, replace the placeholder API key with your own and the hypothetical endpoint URL with the real one. The payload object follows the action's input schema, and action_id identifies the action to execute.
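
Because the payload is an ordinary dictionary, it is straightforward to generate variations of it for experimentation. The sketch below builds one payload per combination of prompt and guidanceScale while holding seed fixed, so that differences between outputs come from the changed parameters rather than from randomness. The function name sweep_payloads and the seed value are illustrative assumptions; each resulting payload would be sent with the same request shown above.

```python
from itertools import product


def sweep_payloads(base_payload, prompts, guidance_scales, seed=1234):
    """Yield one payload per (prompt, guidanceScale) pair, sharing a fixed seed."""
    for prompt, scale in product(prompts, guidance_scales):
        # Copy the base payload so each variant is independent of the others.
        yield {**base_payload, "prompt": prompt, "guidanceScale": scale, "seed": seed}


base = {"width": 1024, "height": 1024, "numberOfOutputs": 1}
variants = list(
    sweep_payloads(
        base,
        prompts=[
            "A heron standing in reeds in the style of TOK",
            "A wren on a branch in the style of TOK",
        ],
        guidance_scales=[5.0, 7.5],
    )
)
print(len(variants))  # 2 prompts x 2 scales = 4 payloads
```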

Conclusion

The Generate Bird Image Using SDXL Fine-Tune action from the superhighfives/birds-and-flowers spec provides developers with a robust tool for generating creative bird images from textual prompts. By leveraging this action, you can enhance your applications with unique visual content, whether for artistic purposes or functional implementations.

Next steps could include experimenting with different prompts, tweaking the input parameters for diverse outputs, or integrating this action into larger projects that require dynamic image generation. Happy coding!