Effortlessly Convert Audio to SRT with douwantech/faster-whisper Cognitive Actions

22 Apr 2025

Converting audio into readable text is important for accessibility and content creation. The douwantech/faster-whisper API offers a Cognitive Action that converts audio files into SRT (SubRip Subtitle) format. This pre-built action handles subtitle creation for audio content, making it easier for developers to integrate audio transcription into their applications.

Prerequisites

Before getting started with the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • A publicly accessible audio file in a supported format (e.g., MP3).

Authentication is typically done by passing your API key in the request headers, for example as a Bearer token in the Authorization header.

Cognitive Actions Overview

Convert Audio to SRT

Purpose:
The "Convert Audio to SRT" action transforms audio files into SRT subtitle format, making it easy to create subtitles for audio content.

Category:
Audio Transcription

Input

The input required for this action is defined in the schema below:

  • audioFile (required): A URI pointing to the audio input file. The URI must be publicly accessible and well formed, including the scheme (e.g., https://example.com/file.mp3).

Example Input:

{
  "audioFile": "https://replicate.delivery/pbxt/L9QEjk0XBMhzAvMewy8cjuOACT8asXTvUY2JsayxNY5WEP8q/audio.mp3"
}
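
Because the action expects a publicly accessible, well-formed URI, it can help to check the value locally before sending a request. The following is a minimal sketch, not part of the API: the helper name build_payload is hypothetical, and the check only looks at the shape of the URI (an http or https scheme and a host). It does not confirm that the file exists or that its format is supported.

```python
from urllib.parse import urlparse


def build_payload(audio_url: str) -> dict:
    """Return the action's input payload, or raise ValueError.

    Hypothetical helper: it only verifies that the URI has an
    http(s) scheme and a host. It makes no network request.
    """
    parsed = urlparse(audio_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"audioFile must be an absolute http(s) URI, got: {audio_url!r}"
        )
    return {"audioFile": audio_url}
```

Calling build_payload("https://example.com/file.mp3") returns a dictionary with the single audioFile key, while a relative path such as "file.mp3" raises ValueError before any request is made.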

Output

Upon successful execution, this action typically returns a URI to the generated SRT file.

Example Output:

https://assets.cognitiveactions.com/invocations/60f20a3e-2f25-4304-be65-3388bc12703c/2a140eb7-a199-4eea-be18-06663448d845.srt
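
Once the SRT file behind that URI has been downloaded, its contents are plain text in the standard SubRip layout: a numeric index, a "start --> end" timestamp line, one or more lines of text, and a blank line between cues. The sketch below parses that layout into simple records. The names Cue and parse_srt are illustrative, not part of the API, and the code assumes the file follows the standard layout; malformed blocks are skipped rather than repaired.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a "." separator is also tolerated)
_TIMESTAMP = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def _to_seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into a list of cues, skipping malformed blocks."""
    cues = []
    # Normalise Windows line endings, then split on blank lines
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = _TIMESTAMP.search(lines[1])
        if match is None:
            continue
        parts = match.groups()
        cues.append(
            Cue(
                index=int(lines[0]),
                start=_to_seconds(*parts[:4]),
                end=_to_seconds(*parts[4:]),
                text="\n".join(lines[2:]).strip(),
            )
        )
    return cues
```

The resulting list can then be used to search the transcript, shift timings, or re-export the subtitles in another format.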

Conceptual Usage Example (Python)

The following Python code snippet demonstrates how a developer might invoke the "Convert Audio to SRT" action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "18d906f8-a6b2-4cce-a748-97ec8e7beab2"  # Action ID for Convert Audio to SRT

# Construct the input payload based on the action's requirements
payload = {
    "audioFile": "https://replicate.delivery/pbxt/L9QEjk0XBMhzAvMewy8cjuOACT8asXTvUY2JsayxNY5WEP8q/audio.mp3"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely; adjust for long audio files
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON (covers json and simplejson decode errors)
            print(f"Response body: {e.response.text}")

In this code:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action_id variable is set to the ID of the "Convert Audio to SRT" action.
  • The payload contains the single required input, audioFile, set to the URI of the audio file.
  • The endpoint URL and the request body structure are hypothetical; check the platform documentation for the actual values.
  • The JSON response from the action is printed to the console.

Conclusion

The "Convert Audio to SRT" Cognitive Action from the douwantech/faster-whisper API simplifies the process of creating subtitles for audio files. With this action, developers can improve accessibility and content engagement in their applications. Possible next steps include integrating the action into video processing workflows or offering subtitle-based accessibility features to your users.