Streamline Your Audio Dataset Preparation with the Whisper Train Preprocessor

Training a robust speech model starts with a well-prepared dataset. The daanelson/whisper-train-preprocessor offers a Cognitive Action that preprocesses datasets specifically for fine-tuning the Whisper model. It takes your paired audio and transcription data and turns it into a dataset ready for fine-tuning Whisper on your own speech recognition tasks.
Prerequisites
Before getting started with the Whisper Train Preprocessor, ensure you have:
- An API key for the Cognitive Actions platform.
- A basic understanding of JSON and how to work with RESTful APIs.
- URLs to your audio and transcription files, formatted appropriately based on the action's requirements.
Authentication typically involves passing your API key in the request headers, allowing secure access to the Cognitive Actions.
Cognitive Actions Overview
Run Whisper Dataset Preprocessing
Description: This action preprocesses audio and transcription data into a dataset for fine-tuning the Whisper model. It accepts either tarballs of paired audio and transcription text files or a JSONL file containing URLs to audio files and their respective transcriptions.
Category: audio-processing
Input
The input for this action can vary based on the source of your data. Below is the schema along with an example:
- audioFiles: URL to a tarball file containing a list of audio files. (Optional)
- textFiles: URL to a tarball file containing a list of text transcriptions. (Optional)
- jsonData: URL to a JSONL file in which each line is an object with 'audio' and 'sentence' keys, holding the audio URL and its transcription. (Listed as required, although the description above presents it as the alternative to the two tarballs.)
Example Input:
{
"jsonData": "https://replicate.delivery/pbxt/J9l68gak580pnNO4rMjuhnq9tXG757tVlSkgPveEArFWJbhV/parsed.txt"
}
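If your data is not yet in JSONL form, the file described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each line is a standalone JSON object with exactly the 'audio' and 'sentence' keys named in the schema; the helper name to_jsonl and the example.com URLs are hypothetical placeholders, not part of the action.

```python
import json


def to_jsonl(pairs):
    """Serialize (audio_url, sentence) pairs as JSONL: one JSON object per line."""
    lines = []
    for audio_url, sentence in pairs:
        if not audio_url or not sentence:
            raise ValueError("each record needs a non-empty audio URL and sentence")
        # ensure_ascii=False keeps non-English transcriptions human-readable
        lines.append(json.dumps({"audio": audio_url, "sentence": sentence}, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Hypothetical records; replace with your own audio URLs and transcriptions.
records = [
    ("https://example.com/audio/clip-001.wav", "Hello world."),
    ("https://example.com/audio/clip-002.wav", "Fine-tuning needs paired data."),
]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Save the resulting text to a file, host it somewhere the action can reach, and pass that URL as jsonData.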
Output
Upon successful execution, the action typically returns a URL pointing to the location of the processed dataset. Here's an example of the expected output:
Example Output:
https://assets.cognitiveactions.com/invocations/e263beee-06df-482f-919f-e48585c1e89b/585efe05-58b7-481d-98e2-ce7a42e55f8d.gz
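Once you have the output URL, you will probably want a local copy of the processed dataset. The following is a rough sketch using only the Python standard library; the function names local_name_for and download_dataset are hypothetical, and it assumes the URL can be fetched without extra authentication, which may not hold on your account.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_name_for(url: str) -> str:
    """Return the last path segment of the URL, used as the local file name."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"Cannot derive a file name from {url!r}")
    return name


def download_dataset(url: str, dest_dir: str = ".") -> Path:
    """Stream the processed dataset into dest_dir and return the local path."""
    target = Path(dest_dir) / local_name_for(url)
    with urllib.request.urlopen(url, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return target
```

Calling download_dataset(output_url) with the URL returned by the action would write the .gz file to the current directory.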
Conceptual Usage Example (Python)
Here’s how you might call the Whisper Dataset Preprocessing action using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "89f16c28-e657-4821-ad54-9bbde20e5dc8"  # Action ID for Run Whisper Dataset Preprocessing

# Construct the input payload based on the action's requirements
payload = {
    "jsonData": "https://replicate.delivery/pbxt/J9l68gak580pnNO4rMjuhnq9tXG757tVlSkgPveEArFWJbhV/parsed.txt"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id indicates the specific action you wish to execute. The payload is structured according to the required input schema, ensuring proper data preparation.
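If you switch between the two data sources, a small helper can keep the payload consistent. This is only a sketch under stated assumptions: it follows the action description in treating the JSONL file and the tarball pair as alternatives, and rejecting a mix of both is a choice made here rather than documented behavior. The function name build_inputs and the example.com URLs are hypothetical; check the schema on the platform before relying on it.

```python
def build_inputs(json_data=None, audio_files=None, text_files=None):
    """Build the action's inputs from a JSONL URL or a pair of tarball URLs."""
    if json_data and (audio_files or text_files):
        # Assumption of this sketch: the two sources are mutually exclusive.
        raise ValueError("pass either jsonData or the two tarballs, not both")
    if json_data:
        return {"jsonData": json_data}
    if audio_files and text_files:
        return {"audioFiles": audio_files, "textFiles": text_files}
    raise ValueError("provide jsonData, or both audioFiles and textFiles")


# Hypothetical tarball URLs, for illustration only.
payload = build_inputs(
    audio_files="https://example.com/data/audio.tar.gz",
    text_files="https://example.com/data/text.tar.gz",
)
print(payload)
```

The returned dictionary can be used as the payload in the request shown earlier.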
Conclusion
The Run Whisper Dataset Preprocessing action takes the preprocessing step off your hands when fine-tuning Whisper. Point it at your paired audio and transcription data, and it returns a processed dataset you can feed into your fine-tuning workflow.
Ready to dive deeper? Experiment with different audio and transcription datasets, and explore the possibilities of Whisper's capabilities in your projects!