Create Video Datasets with Auto-Captioning Using zsxkib Actions

In the realm of video processing, creating datasets that are both comprehensive and easy to navigate is crucial. The zsxkib/create-video-dataset API offers a powerful Cognitive Action for developers looking to automate the creation of video datasets with auto-captioning capabilities. This action simplifies the process of generating high-quality captions for video content, making it an essential tool for developers involved in video analysis and machine learning. In this article, we will dive into how to effectively integrate this action into your applications.
Prerequisites
Before you can start using the Cognitive Actions from the zsxkib API, ensure you have the following:
- An API key for the Cognitive Actions platform to authenticate your requests.
- A working understanding of JSON and HTTP requests, as you'll be constructing payloads to interact with the API.
To authenticate, you would typically include your API key in the headers of your HTTP requests.
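For instance, here is a minimal sketch of such headers in Python (the bearer-token scheme is an assumption based on common practice; check the platform's documentation for the exact format):

```python
# Hypothetical authentication headers; substitute your real API key.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",  # bearer-token auth, a common convention
    "Content-Type": "application/json",    # payloads are JSON
}
```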
Cognitive Actions Overview
Create Video Dataset with Auto-Captioning
This action allows you to easily create video datasets with automatic captioning using QWEN-VL for Hunyuan-Video LoRA finetuning. It processes videos from URLs or local files, generating high-quality captions and packaging everything into a training-ready format.
Input
The input schema for this action consists of several fields, allowing flexibility in processing your videos:
- videoUrl (string, required): The URL of the video to process (e.g., a YouTube link).
- videoFile (string, optional): URI for the video file to be processed. Ignored if a video URL is provided.
- startTime (number, optional): Start time in seconds for video processing (default is 0).
- endTime (number, optional): End time in seconds for video processing. Leave empty to process until startTime + duration.
- duration (number, optional): Duration in seconds to process the video (default is 30).
- autoCaption (boolean, optional): Automatically generate captions using AI (default is true).
- customCaption (string, optional): Your own custom caption, used when autoCaption is set to false.
- captionPrompt (string, optional): Custom prompt for AI captioning (default focuses on main actions and visuals).
- triggerWord (string, optional): Designated trigger word to be included in the captions (default is "TOK").
- autoCaptionPrefix (string, optional): Text to prefix to each auto-generated caption.
- autoCaptionSuffix (string, optional): Text to suffix to each auto-generated caption.
- numberOfSegments (integer, optional): Number of segments to split the video into (default is 4).
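To see how the time parameters interact, here is a small sketch of how a clip defined by startTime, endTime, and duration might be split into numberOfSegments uniform segments. This is an illustration of the defaults described above, not the API's actual segmentation logic:

```python
def segment_bounds(start_time=0.0, end_time=None, duration=30.0, number_of_segments=4):
    """Hypothetical illustration: split the processed clip into uniform segments.

    If end_time is omitted, the clip runs from start_time to start_time + duration,
    mirroring the defaults listed above.
    """
    if end_time is None:
        end_time = start_time + duration
    step = (end_time - start_time) / number_of_segments
    return [
        (start_time + i * step, start_time + (i + 1) * step)
        for i in range(number_of_segments)
    ]

# Matching the example input below (start 10 s, end 40 s, 3 segments):
print(segment_bounds(start_time=10, end_time=40, number_of_segments=3))
# → [(10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
```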
Example Input:
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "startTime": 10,
  "endTime": 40,
  "duration": 10,
  "autoCaption": true,
  "triggerWord": "RICKROLL",
  "captionPrompt": "Describe this video clip briefly, focusing on the main action and visual elements.",
  "numberOfSegments": 3,
  "autoCaptionPrefix": "a video of RICKROLL, "
}
Output
The action typically returns a URL pointing to a ZIP file containing the processed video dataset with captions.
Example Output:
https://assets.cognitiveactions.com/invocations/b26ea903-4d1f-43e7-ada7-1fbe10448b93/44a90a23-4f56-4551-952d-be996ce54340.zip
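Once you have the returned URL, you can fetch and unpack the dataset. Here is a minimal sketch; the exact archive contents (video segments plus matching caption files) are an assumption about the training-ready format:

```python
import io
import zipfile

import requests  # third-party: pip install requests


def extract_dataset(zip_bytes: bytes, dest_dir: str) -> list[str]:
    """Unpack the dataset archive and return its member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def download_dataset(url: str, dest_dir: str = "video_dataset") -> list[str]:
    """Fetch the ZIP returned by the action and extract it locally."""
    resp = requests.get(url, timeout=120)
    resp.raise_for_status()
    return extract_dataset(resp.content, dest_dir)

# names = download_dataset("<the .zip URL returned by the action>")
```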
Conceptual Usage Example (Python)
Here's a conceptual example of how you might call the Create Video Dataset with Auto-Captioning action using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "7c4d4e7a-8d24-40a4-a235-4c224e184f91"  # Action ID for Create Video Dataset with Auto-Captioning

# Construct the input payload based on the action's requirements
payload = {
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "startTime": 10,
    "endTime": 40,
    "duration": 10,
    "autoCaption": True,
    "triggerWord": "RICKROLL",
    "captionPrompt": "Describe this video clip briefly, focusing on the main action and visual elements.",
    "numberOfSegments": 3,
    "autoCaptionPrefix": "a video of RICKROLL, "
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical request structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # body was not valid JSON
            print(f"Response body: {e.response.text}")
In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The payload dictionary defines the specifics of your video processing request: the video URL, time settings, and captioning preferences.
Conclusion
The Create Video Dataset with Auto-Captioning action from the zsxkib API is a powerful tool for developers looking to streamline video dataset creation. By automating the captioning process, developers can focus on building and fine-tuning their machine learning models without getting bogged down in manual dataset preparation. As you explore this action further, consider how it can integrate into your existing workflows and enhance your applications’ capabilities. Happy coding!