Effortless Subtitle Generation with stayallive/whisper-subtitles Cognitive Actions

25 Apr 2025

In the world of multimedia content, accessibility and comprehension are paramount. The stayallive/whisper-subtitles API gives developers a way to generate subtitles directly from audio files. Built on OpenAI's Whisper speech-recognition models, this set of Cognitive Actions produces subtitles in the widely used .srt and .vtt formats and lets you configure voice activity detection, the audio language, and which Whisper model to run. That makes it straightforward to add captions to your applications and improve their accessibility.

Prerequisites

Before diving into the integration of Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic knowledge of making API calls and handling JSON data.
  • Familiarity with Python programming is beneficial for implementing the provided code examples.

To authenticate your requests, you typically pass the API key in a request header; the Python example in this post sends it as a Bearer token in the Authorization header.
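A minimal sketch of building those headers without hard-coding the key, assuming the key is stored in an environment variable named COGNITIVE_ACTIONS_API_KEY (a hypothetical name chosen for illustration) and that the platform expects the Bearer scheme used in the Python example in this post. Check both assumptions against the platform's own documentation.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    Both the variable name and the Bearer scheme are assumptions for illustration.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which matters once the code leaves your machine.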

Cognitive Actions Overview

Generate Subtitles from Audio

Description: This action generates subtitles in .srt and .vtt formats from audio files using OpenAI's Whisper models. It allows you to specify options such as voice activity detection, language, and model choice.

Category: automatic-subtitle-generation

Input

The input schema for this action accepts the following fields (only audioPath is required):

  • audioPath (required): The URI of the audio file for which subtitles are to be generated.
  • language (optional): Specifies the language of the audio. Defaults to English ('en').
  • modelName (optional): Specifies the Whisper model to use for subtitle generation. Defaults to 'small'. The example below uses 'small.en', Whisper's English-only variant of the small model.
  • voiceActivityDetectionFilter (optional): Indicates whether to enable voice activity detection (VAD) to filter out non-speech sections of the audio. Defaults to true.
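A small helper for assembling this payload, shown as a sketch: the function name build_subtitle_inputs is hypothetical, while the field names come from the schema above. Optional fields are simply omitted when not set, so the action can apply its own documented defaults ('en', 'small', VAD enabled) rather than the client duplicating them.

```python
from typing import Optional


def build_subtitle_inputs(
    audio_path: str,
    language: Optional[str] = None,
    model_name: Optional[str] = None,
    vad_filter: Optional[bool] = None,
) -> dict:
    """Build the action's input payload, including only the fields the caller sets."""
    if not audio_path:
        raise ValueError("audioPath is required")
    payload = {"audioPath": audio_path}
    # Leave unset optional fields out so the server-side defaults apply.
    if language is not None:
        payload["language"] = language
    if model_name is not None:
        payload["modelName"] = model_name
    if vad_filter is not None:
        payload["voiceActivityDetectionFilter"] = vad_filter
    return payload
```

For instance, build_subtitle_inputs(url, language="en", model_name="small.en") yields the same three fields as the example input.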

Example Input:

{
  "language": "en",
  "audioPath": "https://replicate.delivery/pbxt/IrETbKtxjksIYsBNRXynBGKpMxCYQzvSjsgsk3XMqp9NkvWc/preamble.wav",
  "modelName": "small.en"
}

Output

Upon successful execution, the action returns the following output:

  • preview: A brief preview of the generated subtitles.
  • srt_file: A link to the generated .srt file.
  • vtt_file: A link to the generated .vtt file.
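If you pass these results around your application, a typed wrapper can catch a malformed response early. The sketch below assumes you already hold a dict containing exactly the three documented fields; if the platform wraps the output in a larger envelope, extract that part first. SubtitleResult and from_output are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleResult:
    """Typed view of the action's output fields (preview, srt_file, vtt_file)."""

    preview: str
    srt_file: str
    vtt_file: str

    @classmethod
    def from_output(cls, output: dict) -> "SubtitleResult":
        # Fail loudly if any documented field is missing from the response.
        missing = [k for k in ("preview", "srt_file", "vtt_file") if k not in output]
        if missing:
            raise KeyError(f"missing output fields: {missing}")
        return cls(
            preview=output["preview"],
            srt_file=output["srt_file"],
            vtt_file=output["vtt_file"],
        )
```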

Example Output:

{
  "preview": "We, the people of the United States, in order to form a more perfect union...",
  "srt_file": "https://assets.cognitiveactions.com/invocations/fa01f3ae-0a38-4db7-bea6-0a529058788a/417e2827-5513-40a9-bf03-0cad57a39797.srt",
  "vtt_file": "https://assets.cognitiveactions.com/invocations/fa01f3ae-0a38-4db7-bea6-0a529058788a/7d132e73-09d5-4ff0-b3ac-1cb3a15c2135.vtt"
}
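The file behind srt_file uses the standard SubRip layout: an optional cue number, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, then the caption text, with cues separated by blank lines. If you want to work with individual cues rather than just serve the file, a minimal parser could look like this. parse_srt is a hypothetical helper, not part of the API, and it does not interpret styling tags.

```python
import re
from typing import List, Tuple

# An SRT timestamp looks like HH:MM:SS,mmm
_TS = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
_TIMING = re.compile(_TS + r"\s*-->\s*" + _TS)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Parse SRT text into (start_seconds, end_seconds, caption) tuples."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        # Locate the timing line (normally the second line, after the cue number).
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                start = _to_seconds(*g[:4])
                end = _to_seconds(*g[4:])
                caption = "\n".join(lines[i + 1:]).strip()
                cues.append((start, end, caption))
                break
    return cues
```

After fetching the file (for example with requests.get(url).text, where url is the srt_file link), pass its contents to parse_srt to get a list of timed captions.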

Conceptual Usage Example (Python)

Here’s how you can invoke the "Generate Subtitles from Audio" action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "6b4c1ce0-b2b4-4f0d-ab36-3ea15d59ff2c" # Action ID for Generate Subtitles from Audio

# Construct the input payload based on the action's requirements
payload = {
    "language": "en",
    "audioPath": "https://replicate.delivery/pbxt/IrETbKtxjksIYsBNRXynBGKpMxCYQzvSjsgsk3XMqp9NkvWc/preamble.wav",
    "modelName": "small.en"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=300, # Illustrative value: transcription can take a while, but never wait forever
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # requests' JSON decode errors subclass ValueError
            print(f"Response body: {e.response.text}")

In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id corresponds to the "Generate Subtitles from Audio" action, and the payload is structured based on the required input. The endpoint URL and request structure are illustrative; adjust them according to your actual API specifications.
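Once the call succeeds, you will usually want local copies of the subtitle files. The sketch below uses only the standard library and assumes the dict you pass in holds the srt_file and vtt_file fields directly; if the platform nests them inside a larger response, extract that part first. The helper names and the 30-second timeout are hypothetical choices.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, e.g. '<id>.srt'."""
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return name


def download_subtitles(output: dict, dest_dir: str = ".") -> list:
    """Download the .srt and .vtt files named in the action output; return local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for key in ("srt_file", "vtt_file"):
        url = output[key]
        target = os.path.join(dest_dir, filename_from_url(url))
        # Stream the response body to disk rather than loading it all into memory.
        with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as fh:
            shutil.copyfileobj(resp, fh)
        saved.append(target)
    return saved
```

A call such as download_subtitles(result, "subtitles") would then save both files under a subtitles/ directory, named after the last segment of each link.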

Conclusion

By integrating the stayallive/whisper-subtitles Cognitive Actions into your applications, you can generate .srt and .vtt subtitles from audio files with a single API call, making your content more accessible to a wider audience. Whether you're building a video platform, an educational tool, or another multimedia application, these actions offer a straightforward route to automatic subtitle generation. Start exploring the possibilities today and elevate your content's accessibility!