Transcribe Audio Effortlessly with vaibhavs10/incredibly-fast-whisper Actions

In the rapidly evolving world of machine learning and natural language processing, the ability to transcribe audio quickly and accurately is invaluable. The vaibhavs10/incredibly-fast-whisper API offers developers a powerful tool through its Cognitive Actions designed for high-speed audio transcription. Leveraging the capabilities of Whisper Large v3, this API can transcribe 150 minutes of audio in just 100 seconds (roughly 90 times faster than real time), making it an ideal solution for applications needing swift and precise speech-to-text conversion.
Prerequisites
Before you can start using the Cognitive Actions, you'll need a few essentials:
- API Key: Ensure you have your API key ready for authentication when making requests to the Cognitive Actions platform.
- Internet Access: The API processes audio files hosted online, so any file you submit must be reachable at a public URL.
Authentication is typically handled by including the API key in the request headers, allowing you to securely access the service.
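As a quick sketch, building those request headers might look like the following. The Bearer scheme and header names are assumptions based on common API conventions; consult the platform's documentation for the exact format.

```python
# Hypothetical helper that builds the headers for a Cognitive Actions request.
# The Bearer scheme is an assumption; verify against the platform's docs.
def build_auth_headers(api_key):
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_auth_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```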
Cognitive Actions Overview
Transcribe Audio with Whisper
The Transcribe Audio with Whisper action provides exceptional speed in converting audio files into text. This action is categorized under speech-to-text and is perfect for developers looking to integrate transcription functionalities into their applications.
Input
The input for this action requires the following fields:
- audio: (Required) URI of the audio file to be processed.
- task: (Optional) Specifies the task to perform, either transcribe or translate, defaulting to transcribe.
- language: (Optional) The spoken language in the audio, defaulting to None for automatic detection.
- batchSize: (Optional) The number of parallel batches to compute, defaulting to 24.
- timestamp: (Optional) Level of detail for timestamps, either chunk or word, defaulting to chunk.
- diariseAudio: (Optional) Boolean indicating whether to use audio diarization, defaulting to false.
- huggingFaceToken: (Optional) Hugging Face token used to enable audio diarization.
Example input structure:
{
  "task": "transcribe",
  "audio": "https://replicate.delivery/pbxt/Js2Fgx9MSOCzdTnzHQLJXj7abLp3JLIG3iqdsYXV24tHIdk8/OSR_uk_000_0050_8k.wav",
  "batchSize": 64
}
Output
The output from this action typically includes:
- text: The transcribed text from the audio.
- chunks: An array of objects, each containing transcribed text segments along with their respective timestamps.
Example output structure:
{
  "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit...",
  "chunks": [
    {
      "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit...",
      "timestamp": [0, 29.72]
    },
    {
      "text": " with a mild wab. The room was crowded with a wild mob...",
      "timestamp": [29.72, 38.98]
    },
    {
      "text": " honour. She blushed when he gave her a white orchid...",
      "timestamp": [38.98, 48.52]
    }
  ]
}
Conceptual Usage Example (Python)
Here’s a conceptual example of how you might interact with the Cognitive Actions API to transcribe audio using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "191bcad6-9cca-4c50-a2d6-2d7a3c596c17"  # Action ID for Transcribe Audio with Whisper

# Construct the input payload based on the action's requirements
payload = {
    "task": "transcribe",
    "audio": "https://replicate.delivery/pbxt/Js2Fgx9MSOCzdTnzHQLJXj7abLp3JLIG3iqdsYXV24tHIdk8/OSR_uk_000_0050_8k.wav",
    "batchSize": 64
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id corresponds to the "Transcribe Audio with Whisper" action, and the payload is structured according to the required input schema.
Conclusion
The vaibhavs10/incredibly-fast-whisper Cognitive Actions offer a robust solution for developers seeking efficient audio transcription capabilities. With the ability to process audio swiftly and accurately, these actions can enhance various applications, from content creation to accessibility tools. As a next step, consider exploring the integration of this API into your projects, or experiment with different parameters to optimize performance based on your specific use cases. Happy coding!