Generate Engaging Stories from Images: A Guide to MiniGPT-4 Cognitive Actions

24 Apr 2025

Generating human-like text from non-text inputs such as images is one of the more practical capabilities of modern AI models. The MiniGPT-4 model published by daanelson is available as a Cognitive Action that combines image analysis with text generation. It can describe images, craft narratives, and turn website mockups into HTML. By calling this pre-built Cognitive Action, developers can add these capabilities to their applications and make them more interactive and engaging.

Prerequisites

Before you begin integrating the MiniGPT-4 Cognitive Action, ensure you have the following:

  • An API key for the Cognitive Actions platform. This is required to authenticate your requests.
  • Familiarity with making HTTP requests and handling JSON data in your programming environment.

Authentication typically involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions.
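
As a minimal sketch of that step, the helper below reads the key from an environment variable instead of hard-coding it in source. The variable name COGNITIVE_ACTIONS_API_KEY and the Bearer scheme are assumptions for illustration; check the platform's documentation for the exact header it expects.

```python
import os


def build_headers(env_var="COGNITIVE_ACTIONS_API_KEY"):
    """Build request headers from an API key stored in the environment.

    The environment variable name and the Bearer scheme are assumptions,
    not confirmed details of the Cognitive Actions platform.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of the source file means it is less likely to end up in version control by accident.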

Cognitive Actions Overview

Generate Text from Image with Prompt

Description: This action leverages MiniGPT-4 to generate text by analyzing an input image along with a specified prompt. It is designed for tasks such as describing images, crafting stories, or generating HTML from mockups.

Category: text-generation

Input

The input schema for this action accepts the following fields; only image and prompt are required:

  • image (string, required): URI of the image to be analyzed.
  • prompt (string, required): The text prompt guiding the generation.
  • topP (number, optional): Proportion of probability mass to sample from (default: 0.9).
  • temperature (number, optional): Controls randomness in token generation (default: 1).
  • maximumLength (integer, optional): The total maximum length of input and output combined in tokens (default: 4000).
  • numberOfBeams (integer, optional): Number of beams used in beam search decoding (default: 3).
  • maximumNewTokens (integer, optional): Maximum number of new tokens to be generated (default: 3000).
  • repetitionPenalty (number, optional): Adjusts penalty for word repetition in output (default: 1).
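
The field list above can be turned into a small payload builder. This is a sketch, not part of any official SDK: the function name build_inputs is hypothetical, the defaults are copied from the list, and the range checks (topP between 0 and 1, positive integer counts) are assumptions about what the service will accept.

```python
# Defaults copied from the field list above.
DEFAULTS = {
    "topP": 0.9,
    "temperature": 1,
    "maximumLength": 4000,
    "numberOfBeams": 3,
    "maximumNewTokens": 3000,
    "repetitionPenalty": 1,
}


def build_inputs(image, prompt, **overrides):
    """Return an input payload with defaults applied and basic checks done."""
    if not image or not prompt:
        raise ValueError("image and prompt are both required")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")

    inputs = {"image": image, "prompt": prompt, **DEFAULTS, **overrides}

    # Assumed sanity checks; the service may enforce different limits.
    if not 0 < inputs["topP"] <= 1:
        raise ValueError("topP must be in the range (0, 1]")
    for name in ("maximumLength", "numberOfBeams", "maximumNewTokens"):
        value = inputs[name]
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return inputs
```

For example, build_inputs("https://example.com/llama.png", "Tell a story", temperature=1.32, numberOfBeams=5) returns a payload with those two overrides and the documented defaults for everything else. Rejecting unknown keys catches typos such as top_p before a request is sent.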

Example Input:

{
  "topP": 0.9,
  "image": "https://replicate.delivery/pbxt/IqG1MbemhULihtfr62URRZbI29XtcPsnOYASrTDQ6u5oSqv9/llama_13b.png",
  "prompt": "This llama's name is Dave. Write me a story about how Dave found his skateboard.",
  "temperature": 1.32,
  "maximumLength": 4000,
  "numberOfBeams": 5,
  "maximumNewTokens": 3000,
  "repetitionPenalty": 1
}

Output

The action typically returns a generated narrative based on the input image and prompt. Here’s an example of what the output might look like:

Example Output:

"Once upon a time, there was a llama named Dave who lived in the city. Dave loved to explore the city on his skateboard, but he had a hard time finding one that fit him. One day, he saw a skateboard in a store window and knew it was meant to be his. He saved up his money and bought the skateboard. From that day on, Dave could be seen cruising around the city on his new skateboard, wearing his sunglasses and having the time of his life."
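
The example above is a single JSON string, but the exact response envelope is not documented here. The sketch below normalizes a decoded response into one string under a few assumptions: the result may be a plain string, a list of text chunks, or an object that wraps the text under a key such as "output" or "result". Both key names are hypothetical.

```python
def extract_text(result):
    """Normalize a decoded JSON response into a single story string.

    Handles a plain string, a list of text chunks, or a dict wrapping
    either one under an assumed key ("output" or "result").
    """
    if isinstance(result, str):
        return result.strip()
    if isinstance(result, list):
        return "".join(str(part) for part in result).strip()
    if isinstance(result, dict):
        for key in ("output", "result"):  # hypothetical wrapper keys
            if key in result:
                return extract_text(result[key])
    raise ValueError(f"Unrecognized response shape: {type(result).__name__}")
```

Raising on an unrecognized shape, rather than returning an empty string, makes a changed response format visible immediately.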

Conceptual Usage Example (Python)

Here’s how you might call the MiniGPT-4 Cognitive Action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "868c74c6-b91f-4eb5-98ca-e56582272801" # Action ID for Generate Text from Image with Prompt

# Construct the input payload based on the action's requirements
payload = {
    "topP": 0.9,
    "image": "https://replicate.delivery/pbxt/IqG1MbemhULihtfr62URRZbI29XtcPsnOYASrTDQ6u5oSqv9/llama_13b.png",
    "prompt": "This llama's name is Dave. Write me a story about how Dave found his skateboard.",
    "temperature": 1.32,
    "maximumLength": 4000,
    "numberOfBeams": 5,
    "maximumNewTokens": 3000,
    "repetitionPenalty": 1
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=300 # Without a timeout, requests can wait indefinitely; adjust as needed
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Covers JSON decode errors from any JSON backend requests uses
            print(f"Response body: {e.response.text}")

To run this snippet, replace the placeholder API key with your own and substitute the real endpoint; the URL and the request body structure shown here are hypothetical. The action ID and input payload match the requirements of the "Generate Text from Image with Prompt" action.
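
If you call the action from several places, the same request can be wrapped in a reusable function. The sketch below makes the same assumptions as the snippet above (hypothetical endpoint, hypothetical request body with action_id and inputs). The function name run_action and the injectable post parameter are illustrative additions that let you exercise the function without a network call.

```python
EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # hypothetical endpoint
ACTION_ID = "868c74c6-b91f-4eb5-98ca-e56582272801"


def run_action(api_key, inputs, post=None, timeout=300):
    """Send the inputs to the action and return the decoded JSON response.

    `post` defaults to requests.post; pass a stand-in callable with the
    same keyword arguments to test without touching the network.
    """
    if post is None:
        import requests  # imported lazily so a stand-in can replace it
        post = requests.post

    response = post(
        EXECUTE_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"action_id": ACTION_ID, "inputs": inputs},  # hypothetical structure
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response.json()
```

A call such as run_action(api_key, {"image": image_url, "prompt": "Describe this image."}) would then return the decoded response, leaving error handling to the caller.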

Conclusion

The MiniGPT-4 Cognitive Action lets developers build applications that interpret images and generate contextual narratives from them. Potential use cases include educational tools, storytelling apps, and automated content creation platforms. Dive in and explore the creative potential that MiniGPT-4 offers!