Effortless Subtitle Generation with Whisper for Your Audio Files

25 Apr 2025

Accessible content is increasingly expected, not optional. The "Whisper Downloadable Subtitles" service lets developers generate downloadable subtitles for audio files using OpenAI's Whisper speech-recognition model. It can produce plain-text (.txt) and SubRip (.srt) subtitle files, and it can optionally translate the transcription into English. This makes it useful for content creators and developers who need to serve hearing-impaired users or audiences who speak other languages.

Imagine an educational platform that delivers audio lessons in multiple languages or a video production team that needs accurate subtitles for their international audience. By utilizing the Whisper Downloadable Subtitles service, developers can easily integrate subtitle generation into their applications, ensuring that their content reaches a broader audience while saving time and effort.

Prerequisites

Before diving into the implementation, ensure you have a Cognitive Actions API key and a basic understanding of making API calls.
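The Python example in this post hard-codes a placeholder key. A safer pattern is to read the key from the environment at startup. The sketch below assumes a variable named `COGNITIVE_ACTIONS_API_KEY`; that name and the `load_api_key` helper are our own illustrative choices, not something the service requires.

```python
import os


def load_api_key(var_name: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical variable name)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail fast with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Set the {var_name} environment variable before calling the API.")
    return key
```

After running `export COGNITIVE_ACTIONS_API_KEY=...` in your shell, `COGNITIVE_ACTIONS_API_KEY = load_api_key()` replaces the hard-coded placeholder.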

Generate Downloadable Subtitles with Whisper

The "Generate Downloadable Subtitles with Whisper" action transcribes an audio file with Whisper and returns the result as downloadable subtitle files, optionally translated into English, helping to remove accessibility and language barriers in audio content.

Input Requirements

To utilize this action, you need to provide the following input:

  • audioFileUri: A URI pointing to the audio file that is accessible over the internet (e.g., https://example.com/audiofile.wav).
  • whisperModel: Select the Whisper model you wish to use for processing the audio. Options include "tiny," "base," "small," "medium," and "large," with "base" as the default.
  • subtitleFormat: Choose the format for the generated subtitles. Options include "None," "txt," "srt," or "All," with "None" as the default.
  • translateToEnglish: A boolean flag indicating whether to translate the transcription to English. By default, this is set to false.
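Because the service only accepts the options listed above, it can help to validate inputs on the client before spending an API call. The helper below is a hypothetical sketch: its defaults mirror the documented defaults, and the requirement that `audioFileUri` be an http(s) URL is our assumption based on "accessible over the internet". Note that the default subtitle format is the string "None", not Python's `None`.

```python
from typing import Any, Dict

# Allowed values as listed in the input description
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}
SUBTITLE_FORMATS = {"None", "txt", "srt", "All"}  # "None" is a string option here


def build_subtitle_payload(
    audio_file_uri: str,
    whisper_model: str = "base",
    subtitle_format: str = "None",
    translate_to_english: bool = False,
) -> Dict[str, Any]:
    """Build the action's input payload, rejecting values outside the documented options."""
    if not audio_file_uri.startswith(("http://", "https://")):
        # Assumption: the service must be able to fetch the file over HTTP(S)
        raise ValueError("audioFileUri must be a publicly reachable http(s) URL")
    if whisper_model not in WHISPER_MODELS:
        raise ValueError(f"whisperModel must be one of {sorted(WHISPER_MODELS)}")
    if subtitle_format not in SUBTITLE_FORMATS:
        raise ValueError(f"subtitleFormat must be one of {sorted(SUBTITLE_FORMATS)}")
    return {
        "audioFileUri": audio_file_uri,
        "whisperModel": whisper_model,
        "subtitleFormat": subtitle_format,
        "translateToEnglish": bool(translate_to_english),
    }
```

For example, `build_subtitle_payload("https://example.com/audiofile.wav", subtitle_format="All", translate_to_english=True)` produces the same shape of payload used in the full request example.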

Expected Output

Upon successful execution, the action returns:

  • srt_file: A link to the generated .srt subtitle file.
  • txt_file: A link to the generated .txt subtitle file.
  • translation: The translated text in English if translation is enabled.
  • transcription: The original transcription of the audio in its detected language.
  • detected_language: The language detected from the audio.
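Once you have the JSON result, a common next step is to save the subtitle files locally. The sketch below assumes the fields listed above appear at the top level of the result with exactly those names, and that a file link is missing or empty when its format was not requested; the real response may wrap them (for example inside an output object), so adjust the lookups accordingly. Both helper names are hypothetical, and the standard-library `urllib` is used so the sketch has no extra dependencies.

```python
import urllib.request
from pathlib import Path
from typing import Any, Dict


def subtitle_links(result: Dict[str, Any]) -> Dict[str, str]:
    """Return {extension: url} for the subtitle files present in the result."""
    links = {}
    for key, ext in (("srt_file", "srt"), ("txt_file", "txt")):
        url = result.get(key)
        if url:  # assumed absent or empty when that format was not requested
            links[ext] = url
    return links


def download_subtitles(result: Dict[str, Any], out_dir: str = ".", stem: str = "subtitles") -> Dict[str, Path]:
    """Download each available subtitle file to out_dir/<stem>.<ext> and return the saved paths."""
    saved = {}
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for ext, url in subtitle_links(result).items():
        target = Path(out_dir) / f"{stem}.{ext}"
        with urllib.request.urlopen(url, timeout=60) as resp:
            target.write_bytes(resp.read())
        saved[ext] = target
    return saved
```

The text fields can be read directly, e.g. `result.get("detected_language")` and `result.get("translation") or result.get("transcription")`.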

Use Cases for this Specific Action

  • Educational Platforms: Automatically generate subtitles for audio lessons, making them accessible to non-native speakers and hearing-impaired learners.
  • Video Production: Create accurate subtitles for videos to enhance viewer engagement and comprehension, especially for international audiences.
  • Podcasting: Provide downloadable subtitles for podcast episodes, allowing listeners to follow along or reference content easily.
  • Content Localization: Translate audio content into English, facilitating broader reach and understanding for global audiences.
The following Python example calls the action with a sample audio file, requesting all subtitle formats and an English translation:

```python
import json

import requests

# Replace with your actual Cognitive Actions API key.
# Avoid hard-coding it in production; load it from the environment or a secrets manager.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical; use the execution endpoint documented for your account
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

# Action ID for: Generate Downloadable Subtitles with Whisper
action_id = "66dedb6e-4352-4914-8998-fd4b4290f837"

# Input payload based on the action's requirements (Python uses True, not JSON's true)
payload = {
    "audioFileUri": "https://replicate.delivery/mgxm/f2ee54de-6356-4a7d-82da-78f6057e3ccb/OSR_fr_000_0045_8k.wav",
    "whisperModel": "base",
    "subtitleFormat": "All",
    "translateToEnglish": True,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other headers the Cognitive Actions API requires
}

# Request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload,
}

print("--- Calling Cognitive Action: Generate Downloadable Subtitles with Whisper ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=300,  # transcription can be slow; this limit is an arbitrary example value
    )
    response.raise_for_status()  # Raise an exception for 4xx or 5xx status codes

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers json/simplejson decode errors raised by Response.json()
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")
```
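If you requested "srt" or "All", the downloaded .srt file uses the standard SubRip layout: a numeric index line, a timing line such as `00:00:01,000 --> 00:00:04,500`, one or more text lines, and a blank line between cues. The minimal parser below is our own sketch for post-processing that file (for example, to re-time or search captions); it skips malformed blocks and does not handle every SubRip variant, such as styling tags.

```python
import re
from dataclasses import dataclass
from typing import List

# SubRip timing line, e.g. "00:00:01,000 --> 00:00:04,500"
_TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float  # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SubRip text into cues: index line, timing line, then text lines."""
    cues = []
    # Normalise line endings and drop a UTF-8 byte-order mark if present
    normalised = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks rather than failing the whole file
        match = _TIME_RE.search(lines[1])
        if not match:
            continue
        g = match.groups()
        text = "\n".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0]), _to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
    return cues
```

For instance, `parse_srt(open("subtitles.srt", encoding="utf-8").read())` returns a list of `Cue` objects with start and end times in seconds.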

Conclusion

The Whisper Downloadable Subtitles service makes it straightforward to add subtitle generation to applications that handle audio. A single API call transcribes the audio, detects its language, returns .txt and .srt subtitle files, and can translate the result into English, so your content can reach people who rely on captions or speak other languages. If you publish audio or video content, try the service on a few representative files and compare Whisper model sizes to find the right balance between accuracy and processing time for your use case.