Generate Stunning Images with BLIP-Diffusion Cognitive Actions

24 Apr 2025

In today's digital landscape, the ability to generate and edit images programmatically has become an essential capability for developers. The BLIP-Diffusion Cognitive Actions empower you to harness advanced AI-driven image generation techniques, allowing for zero-shot subject-driven text-to-image creation and editing. By integrating these actions into your applications, you can enhance creativity and produce stunning visuals with ease. Let's dive into how you can utilize these powerful tools.

Prerequisites

Before you get started, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic familiarity with making API calls and handling JSON data.

Authentication typically involves passing your API key in the headers of your requests, allowing you to securely access the Cognitive Actions functionalities.
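As a minimal sketch, assuming the Bearer-token scheme used in the usage example later in this article (confirm the exact scheme against your platform's documentation), the headers can be built once and reused across requests:

```python
# Reusable request headers for the Cognitive Actions API.
# The Bearer-token scheme here is an assumption matching the
# usage example below; check your platform docs to confirm it.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```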

Cognitive Actions Overview

Generate and Edit Images with BLIP-Diffusion

The Generate and Edit Images with BLIP-Diffusion action allows developers to create and modify images based on textual prompts and conditioning images. This action is optimized for speed and offers a high degree of customizability by leveraging ControlNet for enhanced controllability.

Input

The input for this action requires a structured JSON object that includes several fields:

  • conditioningImage (required): A URI pointing to a conditioning image used in the image generation process, typically a canny edge image.
  • styleImage (required): A URI pointing to a reference style image used to condition the image generation.
  • seed (optional): An integer for initializing the random number generator. Leave blank for a random seed.
  • prompt (optional): A string guiding the image generation. Defaults to "on a marble table".
  • guidanceScale (optional): A number that adjusts the influence of the prompt on the image generation (1 to 20). Default is 7.5.
  • controlNetType (optional): Specifies the type of control network to be used ('canny' or 'hed'). Defaults to 'canny'.
  • negativePrompt (optional): A comma-separated string of undesirable qualities to avoid (e.g., "over-exposure, blurry").
  • numInferenceSteps (optional): Number of denoising steps during image generation (1 to 500). Default is 25.
  • styleSubjectCategory (optional): Defines the category of the subject for styling. Defaults to 'flower'.
  • targetSubjectCategory (optional): Specifies the category of the target subject to be generated. Defaults to 'teapot'.

Example Input:

{
  "seed": 10,
  "prompt": "on a marble table",
  "styleImage": "https://replicate.delivery/pbxt/KNc7eIC5UqBH5GxAUu3i5WWgQooBWg1WQmKlZyFczvaftVw5/flower.jpg",
  "guidanceScale": 7.5,
  "controlNetType": "canny",
  "negativePrompt": "over-exposure, under-exposure, saturated, duplicate, out of frame, lowres, cropped, worst quality, low quality, jpeg artifacts, morbid, mutilated, ugly, bad anatomy, bad proportions, deformed, blurry",
  "conditioningImage": "https://replicate.delivery/pbxt/KNc7dtQmRYksrDn4eC1qGrtlBJIkHHmk4S7OSMT8UAYMHRul/kettle.jpg",
  "numInferenceSteps": 25,
  "styleSubjectCategory": "flower",
  "targetSubjectCategory": "teapot"
}
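Since only conditioningImage and styleImage are required, a minimal request can omit everything else and rely on the documented defaults (prompt "on a marble table", guidanceScale 7.5, controlNetType "canny", numInferenceSteps 25, and so on). A sketch of such a payload:

```python
# Minimal payload: only the two required fields are set; the action
# falls back to its documented defaults for all other parameters.
minimal_payload = {
    "conditioningImage": "https://replicate.delivery/pbxt/KNc7dtQmRYksrDn4eC1qGrtlBJIkHHmk4S7OSMT8UAYMHRul/kettle.jpg",
    "styleImage": "https://replicate.delivery/pbxt/KNc7eIC5UqBH5GxAUu3i5WWgQooBWg1WQmKlZyFczvaftVw5/flower.jpg",
}
```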

Output

The action typically returns a URI to the generated image.

Example Output:

https://assets.cognitiveactions.com/invocations/d92648bf-0b08-436d-9789-ac3d79e9b362/d11d7d43-dcd7-490d-891a-3ac12ff574ea.png
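Once you have the returned URI, you will usually want to fetch the image and store it locally. The following stdlib-only sketch (the naming convention of reusing the URI's last path segment is an assumption, not part of the API contract) downloads the file under its original name:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve

def filename_from_uri(image_uri: str) -> str:
    """Use the last path segment of the returned URI as the local file name."""
    return PurePosixPath(urlparse(image_uri).path).name

def download_image(image_uri: str) -> str:
    """Fetch the generated image and save it under its original name."""
    dest = filename_from_uri(image_uri)
    urlretrieve(image_uri, dest)  # stdlib download; swap in requests if preferred
    return dest
```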

Conceptual Usage Example (Python)

Here's a conceptual Python code snippet demonstrating how to invoke the Generate and Edit Images with BLIP-Diffusion action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "2202dd96-11dc-4365-a1ac-8ac1601cba59"  # Action ID for Generate and Edit Images with BLIP-Diffusion

# Construct the input payload based on the action's requirements
payload = {
    "seed": 10,
    "prompt": "on a marble table",
    "styleImage": "https://replicate.delivery/pbxt/KNc7eIC5UqBH5GxAUu3i5WWgQooBWg1WQmKlZyFczvaftVw5/flower.jpg",
    "guidanceScale": 7.5,
    "controlNetType": "canny",
    "negativePrompt": "over-exposure, under-exposure, saturated, duplicate, out of frame, lowres, cropped, worst quality, low quality, jpeg artifacts, morbid, mutilated, ugly, bad anatomy, bad proportions, deformed, blurry",
    "conditioningImage": "https://replicate.delivery/pbxt/KNc7dtQmRYksrDn4eC1qGrtlBJIkHHmk4S7OSMT8UAYMHRul/kettle.jpg",
    "numInferenceSteps": 25,
    "styleSubjectCategory": "flower",
    "targetSubjectCategory": "teapot"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID and input payload are structured according to the parameters specified for the BLIP-Diffusion action. The endpoint URL and request structure serve as a template for your integration.

Conclusion

The BLIP-Diffusion Cognitive Actions offer powerful capabilities for generating and editing images tailored to your specifications. By integrating these actions into your applications, you can unlock creativity and produce high-quality visuals efficiently. Whether you're enhancing user interfaces, creating unique content, or automating design tasks, these tools can significantly streamline your development process. Explore these possibilities and take your applications to the next level!