Create Video Datasets Effortlessly with zsxkib Cognitive Actions

24 Apr 2025

The ability to create and manage video datasets is crucial for training modern AI models. The zsxkib/create-video-dataset specification offers a Cognitive Action designed to streamline this process: it prepares video datasets for Hunyuan-Video LoRA fine-tuning by processing videos from various sources, generating high-quality captions, and packaging everything neatly for AI training.

This blog post will guide you through the capabilities of this action, including how to integrate it into your applications.

Prerequisites

Before you start using the Cognitive Actions, ensure you have:

  • An API key for the Cognitive Actions platform.
  • Familiarity with making HTTP requests.
  • A development environment set up with Python and the requests library.

Authentication typically involves passing your API key in the request headers to access the Cognitive Actions service.
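As a minimal sketch of that pattern (the exact header scheme is an assumption here; check the platform's auth docs), the request headers might look like:

```python
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"  # replace with your real key

# Hypothetical Bearer-token scheme; the later usage example assumes the same
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```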

Cognitive Actions Overview

Create Video Dataset with Auto-Captioning

The Create Video Dataset with Auto-Captioning action simplifies dataset preparation by generating captions automatically, making it particularly useful for developers building comprehensive AI training datasets.

Input

The input for this action is a single JSON object with the following fields:

  • videoUrl (string): The URL of the video to process.
  • videoFile (string): A local video file to process; provide this or videoUrl.
  • startTime (number): Start time in seconds for processing (default is 0).
  • endTime (number): End time in seconds for processing (default is 0).
  • quality (string): Select from "fast", "balanced", or "high" (default is "balanced").
  • captionStyle (string): Choose from "minimal", "detailed", or "custom" (default is "detailed").
  • customCaption (string): A custom caption if using a custom style.
  • automaticCaptioning (boolean): Enable AI-generated captions (default is true).
  • triggerPhrase (string): A phrase to prepend to captions (default is "TOK").
  • numberOfScenes (integer): Specify the number of scenes to extract (default is 4).
  • previewOnly (boolean): Generate scene previews without the full dataset (default is false).
  • detectionMode (string): Scene detection method (default is "content").
  • maximumSceneLength (number): Maximum length of a scene in seconds (default is 10).
  • minimumSceneLength (number): Minimum length of a scene in seconds (default is 1).
  • skipIntroduction (boolean): Skip the first 10 seconds of the video (default is false).
  • targetFramesPerSecond (number): Specify the target frame rate (default is 24).
  • automaticCaptionPrefix (string): Text to prepend to auto-generated captions.
  • automaticCaptionSuffix (string): Text to append to auto-generated captions.
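Several of these fields constrain each other (one video source is required, scene lengths must be consistent, and the enum fields accept a fixed set of values). A small client-side check can surface mistakes before a request is sent. The helper below is a hypothetical sketch based on the field descriptions above, not part of the API; the service applies its own validation.

```python
# Allowed values taken from the field descriptions above
ALLOWED_QUALITY = {"fast", "balanced", "high"}
ALLOWED_CAPTION_STYLE = {"minimal", "detailed", "custom"}

def validate_dataset_request(req: dict) -> list[str]:
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    if not req.get("videoUrl") and not req.get("videoFile"):
        problems.append("either videoUrl or videoFile is required")
    end = req.get("endTime", 0)
    if end and req.get("startTime", 0) >= end:
        problems.append("startTime must be less than endTime")
    if req.get("minimumSceneLength", 1) > req.get("maximumSceneLength", 10):
        problems.append("minimumSceneLength exceeds maximumSceneLength")
    if req.get("quality", "balanced") not in ALLOWED_QUALITY:
        problems.append("quality must be one of fast, balanced, high")
    if req.get("captionStyle", "detailed") not in ALLOWED_CAPTION_STYLE:
        problems.append("captionStyle must be one of minimal, detailed, custom")
    return problems
```

For instance, a payload with `startTime` of 50 and `endTime` of 40 would be flagged before the request ever leaves your machine.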

Here's a practical example of the input JSON payload:

{
  "endTime": 40,
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "startTime": 10,
  "triggerPhrase": "RICKROLL",
  "automaticCaptioning": true,
  "automaticCaptionPrefix": "a video of RICKROLL, "
}

Output

Upon successful execution, the action returns a link to a zip file containing the processed video dataset. An example output might look like this:

[
  "https://assets.cognitiveactions.com/invocations/b1d75c52-096d-49f5-bfc9-7f76d238baf0/95a63e7e-84e7-4985-bdbc-eb5713f2236d.zip"
]
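Since the result is a URL pointing at a zip archive, a typical follow-up step is downloading and unpacking it. A minimal sketch using the requests library and the standard zipfile module (the two helper names are illustrative, not part of the API):

```python
import io
import zipfile

import requests

def fetch_dataset(zip_url: str) -> bytes:
    """Download the result archive returned by the action."""
    resp = requests.get(zip_url, timeout=60)
    resp.raise_for_status()
    return resp.content

def extract_dataset(zip_bytes: bytes, dest_dir: str = "dataset") -> list[str]:
    """Unpack the archive into dest_dir and return the extracted member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

You would pass the URL from the action's output to `fetch_dataset` and feed the bytes to `extract_dataset` to get the dataset files on disk.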

Conceptual Usage Example (Python)

Here's how you can call the Create Video Dataset with Auto-Captioning action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "af025122-6deb-48ee-9b6e-6d1c6cdba69c" # Action ID for Create Video Dataset with Auto-Captioning

# Construct the input payload based on the action's requirements
payload = {
    "endTime": 40,
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "startTime": 10,
    "triggerPhrase": "RICKROLL",
    "automaticCaptioning": True,
    "automaticCaptionPrefix": "a video of RICKROLL, "
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=120 # Avoid hanging indefinitely if the service is slow
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key and structure the payload according to the action's input fields described above. The code sends the request and prints either the result or the details of any error that occurs.

Conclusion

The Create Video Dataset with Auto-Captioning action from the zsxkib specification enhances the ability to generate and manage video datasets efficiently, making it an invaluable tool for developers working with AI training. By integrating this action into your applications, you streamline the process of preparing high-quality video datasets, allowing you to focus more on innovation and less on manual data preparation.

Explore how you can leverage this action in your projects, and consider how it could fit into your broader AI development initiatives!