Generate Image Prompts with the CLIP Interrogator Actions

23 Apr 2025

The lucataco/clip-interrogator API offers Cognitive Actions that streamline the creative process of generating image prompts. Built on OpenAI’s CLIP and Salesforce’s BLIP, the service synthesizes optimized text prompts that closely describe an input image, making it a useful tool for developers integrating text-to-image functionality into their applications. With fast inference, you can quickly generate effective prompts for models such as Stable Diffusion.

Prerequisites

To get started with the Cognitive Actions from the CLIP Interrogator, you will need:

  • An API key for the Cognitive Actions platform, passed in the request headers for authentication.
  • Basic familiarity with RESTful APIs and JSON formatting.

When making API calls, you typically include your API key in the request headers as follows:

Authorization: Bearer YOUR_COGNITIVE_ACTIONS_API_KEY
Content-Type: application/json

Cognitive Actions Overview

Generate Image Prompts Using CLIP Interrogator

The Generate Image Prompts Using CLIP Interrogator action allows you to synthesize optimized text prompts from a given image. This action is categorized under text generation, and it is particularly useful for improving the creative output of text-to-image models.

Input

The input for this action requires the following fields:

  • image (required): The URI of the input image that needs processing.
  • mode (optional): A string that selects the processing mode. One of:
    • best: Highest quality; takes 10-20 seconds.
    • classic: A balance of quality and speed.
    • fast: Quickest response; takes 1-2 seconds.
    • negative: Generates a negative prompt (descriptors to steer a model away from).
  • clipModelName (optional): Specifies the CLIP model to use. The options include:
    • ViT-L-14/openai
    • ViT-H-14/laion2b_s32b_b79k
    • ViT-bigG-14/laion2b_s39b_b160k

Example Input:

{
  "mode": "fast",
  "image": "https://replicate.delivery/pbxt/JVpnDt9nXuAnqBaXFPH8JbLrkU7JxQIoAGrHFwRWnFYqI7Ad/replicate-prediction-lyehbrdbrdztdi7ggx63lmhkgm.png",
  "clipModelName": "ViT-bigG-14/laion2b_s39b_b160k"
}
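Since mode and clipModelName accept only fixed sets of values, a client-side check can catch typos before a request is sent. Below is a minimal sketch; build_payload is a hypothetical helper of my own, not part of the API:

```python
# Allowed values taken from the action's input schema
VALID_MODES = {"best", "classic", "fast", "negative"}
VALID_CLIP_MODELS = {
    "ViT-L-14/openai",
    "ViT-H-14/laion2b_s32b_b79k",
    "ViT-bigG-14/laion2b_s39b_b160k",
}

def build_payload(image, mode=None, clip_model_name=None):
    """Build and validate an input payload for the action.

    Only `image` is required; optional fields are included when provided.
    """
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if clip_model_name is not None and clip_model_name not in VALID_CLIP_MODELS:
        raise ValueError(f"unknown CLIP model: {clip_model_name!r}")
    payload = {"image": image}
    if mode is not None:
        payload["mode"] = mode
    if clip_model_name is not None:
        payload["clipModelName"] = clip_model_name
    return payload
```

Failing fast on an invalid value saves a round trip to the API and surfaces the mistake at the point where the payload is constructed.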

Output

Upon successful execution, the action typically returns a text prompt that describes the image in detail. An example of the expected output is:

"painting of a turtle swimming in the ocean with a blue sky in the background, illustrative art, turtle, michael angelo inspired, world-bearing turtle, highly detailed illustration.”, 4k artwork, realistic illustration, highly detailed digital painting, vibrant digital painting, [ 4 k digital art, 4k art, hyperrealistic illustration, high detail illustration, vibrant realistic
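As the example shows, the returned prompt is a comma-separated list of descriptors and can contain stray punctuation. If you want to reuse or recombine the descriptors, a small post-processing step helps; the following is a sketch of my own, not part of the API:

```python
def split_prompt(prompt):
    """Split an interrogator prompt into cleaned, deduplicated descriptors."""
    seen = set()
    terms = []
    for raw in prompt.split(","):
        # Trim whitespace plus stray quotes, brackets, and periods at the edges
        term = raw.strip().strip('"[].').strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms
```

The cleaned list can then be filtered or reordered before being joined back into a prompt for a text-to-image model.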

Conceptual Usage Example (Python)

Here’s how you could conceptually implement this action using Python. The code snippet shows how to construct the API request for executing the action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "15a27263-0837-4fd0-abda-e4a4ff0cf13f"  # Action ID for Generate Image Prompts Using CLIP Interrogator

# Construct the input payload based on the action's requirements
payload = {
    "mode": "fast",
    "image": "https://replicate.delivery/pbxt/JVpnDt9nXuAnqBaXFPH8JbLrkU7JxQIoAGrHFwRWnFYqI7Ad/replicate-prediction-lyehbrdbrdztdi7ggx63lmhkgm.png",
    "clipModelName": "ViT-bigG-14/laion2b_s39b_b160k"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely on a stalled connection
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this snippet, replace the API key, endpoint, and action ID with your own values. The input payload matches the action's schema, the request body wraps it in the structure expected by the hypothetical execution endpoint, and the response is printed as formatted JSON.
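Network calls to an inference API can fail transiently (timeouts, 5xx responses), especially for the slower best mode. A common pattern is to retry with exponential backoff; the helper below is a generic sketch, not part of the Cognitive Actions API:

```python
import time

def execute_with_retry(call, max_attempts=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff.

    `call` should raise an exception on transient failures (for example,
    after `response.raise_for_status()` on a 5xx status). The last
    exception is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            # 1s, 2s, 4s, ... between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))
```

You could wrap the requests.post call from the example above in a lambda and pass it to this helper, keeping the retry policy separate from the request logic.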

Conclusion

The CLIP Interrogator's Generate Image Prompts action empowers developers to create rich, detailed prompts from images, enhancing creativity and efficiency in text-to-image applications. By integrating this action, you can streamline your creative workflows and produce high-quality outputs in a matter of seconds. Consider exploring further actions and capabilities of the CLIP Interrogator to elevate your projects even more!