Create Video Datasets Effortlessly with zsxkib Cognitive Actions

24 Apr 2025
Generating video datasets with automated captioning can significantly streamline machine learning workflows, particularly for finetuning tasks. The zsxkib/create-video-dataset spec offers a Cognitive Action that prepares video datasets complete with automatic captioning: it accepts video files sourced from URLs or local uploads, splits them into scenes, and organizes the results for training.

Prerequisites

Before using this Cognitive Action, make sure you have an API key for the Cognitive Actions platform. The key authenticates your requests; it is typically passed as a bearer token in the request headers.
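As a minimal sketch of that authentication pattern, the snippet below reads the key from an environment variable (the variable name `COGNITIVE_ACTIONS_API_KEY` is illustrative, not mandated by the platform) and builds the bearer-token headers used throughout this post:

```python
import os

# Illustrative: read the API key from an environment variable rather than
# hard-coding it in source. The variable name here is an assumption.
api_key = os.environ.get("COGNITIVE_ACTIONS_API_KEY", "YOUR_COGNITIVE_ACTIONS_API_KEY")

# Standard bearer-token headers for a JSON REST API.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```

Keeping the key out of source files makes it easier to rotate and keeps it out of version control.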

Cognitive Actions Overview

Create Video Dataset with Auto-Captioning

The Create Video Dataset with Auto-Captioning action allows developers to prepare video datasets enriched with captions generated automatically by the Qwen-VL model. This is particularly useful for projects that require high-quality, well-structured video data.

  • Category: video-processing

Input

The following schema defines the parameters you can use to invoke this action:

  • videoUrl (string, required): The URL of the video to process. For example, https://www.youtube.com/watch?v=dQw4w9WgXcQ.
  • videoFile (string, optional): The URI of a local video file to process; ignored when videoUrl is provided.
  • startTime (number, optional): The start time in seconds for processing (default is 0).
  • endTime (number, optional): The end time in seconds for processing (default is 0, which means until the video ends).
  • autoCaption (boolean, optional): Enables automatic caption generation (default is true).
  • captionStyle (string, optional): Defines the caption style (options include "minimal", "detailed", or "custom"; default is "detailed").
  • customCaption (string, optional): Required if captionStyle is set to "custom" or if autoCaption is false.
  • skipIntro (boolean, optional): If true, skips the first 10 seconds of the video (default is false).
  • numberOfScenes (integer, optional): Number of scenes to extract (default is 4).
  • maximumSceneLength (number, optional): The maximum length of a scene in seconds (default is 10).
  • minimumSceneLength (number, optional): The minimum length of a scene in seconds (default is 1).
  • targetFramesPerSecond (number, optional): The target frame rate for processing (default is 24 fps).
  • triggerWord (string, optional): A specific word included in captions (default is "TOK").
  • autoCaptionPrefix (string, optional): Text to prepend to auto-generated captions.
  • autoCaptionSuffix (string, optional): Text to append to auto-generated captions.
  • previewOnly (boolean, optional): If true, only generates scene previews without creating a full dataset (default is false).
  • detectionMode (string, optional): Scene detection method (default is "content").
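The schema above carries a few cross-field rules (customCaption is required when captionStyle is "custom" or autoCaption is false; endTime of 0 means "until the end"; scene-length bounds must be consistent). A small client-side check like the sketch below, written against those stated rules with the schema's defaults, can catch mistakes before you spend an invocation. The helper name `validate_inputs` is illustrative:

```python
def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of problems found in an input payload, based on the
    parameter rules in the schema above. Defaults mirror the schema."""
    problems = []

    # customCaption is required if captionStyle is "custom" or autoCaption is false.
    auto_caption = inputs.get("autoCaption", True)
    caption_style = inputs.get("captionStyle", "detailed")
    if (caption_style == "custom" or not auto_caption) and not inputs.get("customCaption"):
        problems.append("customCaption is required when captionStyle is 'custom' "
                        "or autoCaption is false")

    # endTime of 0 means "until the video ends"; otherwise it must follow startTime.
    start = inputs.get("startTime", 0)
    end = inputs.get("endTime", 0)
    if end != 0 and end <= start:
        problems.append("endTime must be 0 (until the end) or greater than startTime")

    # Scene-length bounds must be consistent.
    if inputs.get("minimumSceneLength", 1) > inputs.get("maximumSceneLength", 10):
        problems.append("minimumSceneLength cannot exceed maximumSceneLength")

    return problems
```

For instance, `validate_inputs({"autoCaption": False})` reports the missing customCaption, while a payload that only sets videoUrl passes cleanly.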

Example Input:

{
    "endTime": 40,
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "startTime": 10,
    "autoCaption": true,
    "triggerWord": "RICKROLL",
    "autoCaptionPrefix": "a video of RICKROLL, "
}

Output

The action typically returns a downloadable link to a zip file containing the processed video dataset. Here’s a sample output:

[
    "https://assets.cognitiveactions.com/invocations/be90d877-5ff5-4bab-b254-01fd3c602369/d6c567b8-e8b2-427c-85bc-201663e5788c.zip"
]

Conceptual Usage Example (Python)

Here’s a conceptual Python snippet to illustrate how to call this action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "6e684f70-8ec8-4163-a708-faec0bc790a2"  # Action ID for Create Video Dataset with Auto-Captioning

# Construct the input payload based on the action's requirements
payload = {
    "endTime": 40,
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "startTime": 10,
    "autoCaption": True,
    "triggerWord": "RICKROLL",
    "autoCaptionPrefix": "a video of RICKROLL, "
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # body was not valid JSON
            print(f"Response body: {e.response.text}")

In this snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID and input payload are structured according to the specifications provided.

Conclusion

The Create Video Dataset with Auto-Captioning action from the zsxkib/create-video-dataset spec automates two of the most tedious parts of preparing video data: generating captions and structuring the dataset. Integrating it into your applications can meaningfully shorten the path from raw footage to training-ready data.

Explore further use cases or consider combining this action with other Cognitive Actions for even more powerful workflows!