Effortlessly Generate Subtitles from Audio with Whisper

In today's digital landscape, accessibility and engagement are paramount. With the Whisper Subtitles service, developers can leverage advanced AI technology to automatically generate subtitles from audio files, enhancing user experience and making content more accessible to diverse audiences. This service utilizes OpenAI's Whisper model, known for its multilingual speech recognition and translation capabilities, allowing for subtitles in various languages and formats.
Common use cases for this service include adding subtitles to podcasts, videos, webinars, and online courses, or providing translations for international audiences. By automating the subtitle generation process, developers can save significant time and resources while improving content reach and comprehension.
Prerequisites
To get started with Whisper Subtitles, you'll need an API key for the Cognitive Actions service and a basic understanding of making API calls.
Generate Subtitles from Audio
The "Generate Subtitles from Audio" action is designed to transcribe audio files into subtitle formats like SRT or VTT. This action is particularly useful for developers looking to enhance video and audio content with accurate, machine-generated subtitles.
Input Requirements: To use this action, you need to provide:
- audioPath: A valid URI of the audio file you want to transcribe, for example https://example.com/audiofile.wav.
- format: The subtitle format you prefer, either 'srt' or 'vtt'. The default is 'vtt'.
- modelName: The Whisper model you wish to use for transcription, with options ranging from 'tiny' to 'large'. The default is 'base'.
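The inputs above can be assembled and validated before making the call. This is a minimal sketch; the intermediate model names (small, medium) are the standard Whisper sizes between 'tiny' and 'large' and are an assumption about what this service accepts:

```python
# Allowed values per the documented inputs; intermediate model sizes are
# assumed to follow the standard Whisper model lineup.
ALLOWED_FORMATS = {"srt", "vtt"}
ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large"}

def build_inputs(audio_path, fmt="vtt", model_name="base"):
    """Return a payload dict for the action, rejecting invalid options early."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if model_name not in ALLOWED_MODELS:
        raise ValueError(f"modelName must be one of {sorted(ALLOWED_MODELS)}")
    return {"audioPath": audio_path, "format": fmt, "modelName": model_name}
```

Failing fast on a typo like `"str"` instead of `"srt"` saves a round trip to the API.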
Expected Output: The action will return:
- The generated subtitles in the specified format, containing time-stamped text.
- The language of the audio, detected automatically.
- The transcribed text from the audio, providing a textual representation of the spoken content.
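A common next step is writing the returned subtitles to a file so a video player can load them. This sketch assumes the response exposes the three documented outputs under the field names "subtitles", "language", and "text"; the exact response schema is an assumption, not confirmed by the service:

```python
def save_subtitles(result, path="output.vtt"):
    """Write the time-stamped subtitle text from the response to a file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(result["subtitles"])
    return path

# Hand-made example result, mirroring the documented outputs:
example_result = {
    "subtitles": "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nHello world\n",
    "language": "en",
    "text": "Hello world",
}
```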
Use Cases for this specific action:
- Video Content Creation: Automatically generate subtitles for YouTube videos or online courses, improving accessibility for viewers who are deaf or hard of hearing.
- Multilingual Support: Create subtitles in different languages, making your audio content available to a broader audience.
- Podcast Accessibility: Enhance podcast episodes with subtitles, allowing listeners to follow along or reference specific parts of the audio easily.
import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "0021bbed-029f-4c33-96c9-0b75bcad416e"  # Action ID for: Generate Subtitles from Audio

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
    "format": "vtt",
    "audioPath": "https://replicate.delivery/mgxm/3a4f8158-eb09-4430-9a7d-efc811cd5572/micro-machines.wav",
    "modelName": "base"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body (non-JSON): {e.response.text}")
print("------------------------------------------------")
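Once you have the generated subtitles, you may want to work with the cues programmatically, for example to jump to a specific moment in the audio. This is a minimal sketch for the VTT timestamp layout (HH:MM:SS.mmm), assuming the service returns standard WebVTT text:

```python
import re

# Match a WebVTT cue timing line, e.g. "00:00:00.000 --> 00:00:02.500"
VTT_CUE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")

def cue_times(vtt_text):
    """Return a list of (start, end) timestamp pairs found in VTT text."""
    return VTT_CUE.findall(vtt_text)

sample = "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nHello world\n"
```

Note that SRT uses a comma as the millisecond separator (00:00:00,000), so a different pattern would be needed for 'srt' output.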
Conclusion
The Whisper Subtitles service streamlines the process of generating subtitles from audio, saving developers time while enhancing the accessibility and reach of their content. Whether for educational videos, podcasts, or multilingual productions, this service offers a powerful solution to meet diverse audience needs. Start integrating Whisper Subtitles into your projects today and unlock new possibilities for engaging and accessible content.