Classify YouTube Thumbnails Easily with LLAVA-13B Spotter Creator

22 Apr 2025

Integrating machine learning capabilities into your applications can significantly enhance user experience and functionality. The LLAVA-13B Spotter Creator offers a set of Cognitive Actions for analyzing and classifying YouTube thumbnails. Built on a fine-tuned LLAVA-13B-Vicuna model, it gives developers insight into thumbnail composition, color schemes, and emotional cues, while pre-built actions save integration time and effort.

Prerequisites

Before you start integrating the LLAVA-13B Spotter Creator Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform, which will be used to authenticate your requests.
  • Familiarity with making API calls, particularly using JSON format for sending and receiving data.

Authentication typically involves passing the API key in the headers of your requests, ensuring secure access to the Cognitive Actions services.
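As a minimal sketch, the headers described above can be assembled with a small helper. The Bearer-token convention shown here is an assumption based on common API practice; confirm the exact header names against the platform's own documentation.

```python
# Hypothetical helper: build authenticated request headers for the
# Cognitive Actions platform. The Bearer scheme is an assumption.
def build_headers(api_key: str) -> dict:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```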

Cognitive Actions Overview

Classify YouTube Thumbnails

The Classify YouTube Thumbnails action leverages the LLAVA-13B-Vicuna model to analyze YouTube thumbnails, categorizing them based on their composition and emotional tone. This action is particularly useful for content creators looking to optimize their thumbnails for better engagement.

Input

The input for this action requires the following fields:

  • image (required): A URI link to the input image.
  • prompt (required): A detailed directive for analyzing the image’s composition.
  • maxTokens (optional): Defines the upper limit for tokens generated in the response (default is 1024).
  • temperature (optional): Controls the randomness of sampling; lower values produce more focused, deterministic output (default is 0.2).
  • topPercentage (optional): The cumulative probability threshold for nucleus (top-p) sampling (default is 1).

Here’s an example of the JSON payload required to invoke this action:

{
  "image": "https://replicate.delivery/pbxt/LQuGwyreQl8fgdynNHDqVpJeEgIQ8R3megfqRxIjSG9I4gWl/W4DnuQOtA8E.jpg",
  "prompt": "Analyze this YouTube thumbnail in detail and classify it based on the composition. Be concise and avoid jumping to conclusions.\n\n        Categories:\n        - Centralized: Main subject is prominently placed in the center.\n        - Wide Shot: Shows a broad scene with background and/or multiple subjects.\n        - Close-up: Zoomed in on a single subject or detail.\n        - Split Screen: Image is divided into sections showing different scenes or elements.\n        - Text Overlay: Large text or titles are layered over the image.\n        - Miscellaneous: Doesn't fit any of the above categories.\n\n        Think Step-by-Step:\n        - Identify the main focus: What is the most prominent element in the thumbnail?\n        - Consider the perspective: Is it a close view or a broad scene?\n        - Look for divisions or text: Is the image split or overlaid with text?\n\n        Question: Based on your analysis, what is the composition style of the thumbnail?",
  "maxTokens": 1024,
  "temperature": 0.2,
  "topPercentage": 1
}
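Because the prompt embeds the full category list, it can help to assemble it programmatically so the categories live in one place. The sketch below mirrors the category names and definitions from the payload above; the exact prompt wording is up to you.

```python
# Category names and definitions, mirroring the JSON payload above.
CATEGORIES = {
    "Centralized": "Main subject is prominently placed in the center.",
    "Wide Shot": "Shows a broad scene with background and/or multiple subjects.",
    "Close-up": "Zoomed in on a single subject or detail.",
    "Split Screen": "Image is divided into sections showing different scenes or elements.",
    "Text Overlay": "Large text or titles are layered over the image.",
    "Miscellaneous": "Doesn't fit any of the above categories.",
}

def build_prompt(categories: dict) -> str:
    # Render each category as a "- Name: description" bullet line.
    category_lines = "\n".join(
        f"- {name}: {desc}" for name, desc in categories.items()
    )
    return (
        "Analyze this YouTube thumbnail in detail and classify it based on "
        "the composition. Be concise and avoid jumping to conclusions.\n\n"
        f"Categories:\n{category_lines}\n\n"
        "Question: Based on your analysis, what is the composition style "
        "of the thumbnail?"
    )
```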

Output

The output of this action is an array of token strings that, concatenated, form the classification label. For instance, the output could look like this:

[
  "Split ",
  "Screen"
]

The tokens concatenate to "Split Screen", indicating that the model has classified the thumbnail as a split-screen composition.
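Since the label can arrive split across multiple tokens, a small post-processing step joins the array elements and strips stray whitespace:

```python
# Join the token array returned by the action into a single label,
# trimming any leading/trailing whitespace left over from tokenization.
def label_from_tokens(tokens: list) -> str:
    return "".join(tokens).strip()

label = label_from_tokens(["Split ", "Screen"])  # "Split Screen"
```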

Conceptual Usage Example (Python)

To call the Classify YouTube Thumbnails action, you can use the following Python code snippet:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "6bdb52a6-29ab-471d-944f-a00ed56bc45a" # Action ID for Classify YouTube Thumbnails

# Construct the input payload based on the action's requirements
payload = {
    "image": "https://replicate.delivery/pbxt/LQuGwyreQl8fgdynNHDqVpJeEgIQ8R3megfqRxIjSG9I4gWl/W4DnuQOtA8E.jpg",
    "prompt": "Analyze this YouTube thumbnail in detail and classify it based on the composition. Be concise and avoid jumping to conclusions.\n\n        Categories:\n        - Centralized: Main subject is prominently placed in the center.\n        - Wide Shot: Shows a broad scene with background and/or multiple subjects.\n        - Close-up: Zoomed in on a single subject or detail.\n        - Split Screen: Image is divided into sections showing different scenes or elements.\n        - Text Overlay: Large text or titles are layered over the image.\n        - Miscellaneous: Doesn't fit any of the above categories.\n\n        Think Step-by-Step:\n        - Identify the main focus: What is the most prominent element in the thumbnail?\n        - Consider the perspective: Is it a close view or a broad scene?\n        - Look for divisions or text: Is the image split or overlaid with text?\n\n        Question: Based on your analysis, what is the composition style of the thumbnail?",
    "maxTokens": 1024,
    "temperature": 0.2,
    "topPercentage": 1
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=30 # Fail fast instead of hanging indefinitely on network issues
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # response.json() raises a JSONDecodeError subclass of ValueError
            print(f"Response body: {e.response.text}")

In this code snippet, replace the placeholder values with your own API key and endpoint. The action ID and the structured input payload are specified to ensure that the request is formatted correctly. The code handles potential errors gracefully and prints the results of the classification.
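If you need to classify many thumbnails, the same request structure can be generated per URL. The sketch below reuses the hypothetical "action_id"/"inputs" envelope from the snippet above; each payload can then be sent in a loop with the same headers.

```python
# Sketch: build one request payload per thumbnail URL, reusing the
# hypothetical envelope structure from the single-request example.
def build_batch_payloads(image_urls: list, prompt: str, action_id: str) -> list:
    return [
        {
            "action_id": action_id,
            "inputs": {
                "image": url,
                "prompt": prompt,
                "maxTokens": 1024,
                "temperature": 0.2,
                "topPercentage": 1,
            },
        }
        for url in image_urls
    ]
```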

Conclusion

Integrating the LLAVA-13B Spotter Creator Cognitive Action into your application allows you to automate the analysis of YouTube thumbnails efficiently. By utilizing the powerful classification capabilities of this action, developers can enhance content creation, optimize viewer engagement, and streamline workflows. Explore further possibilities with this action and consider how it can fit into your next project!