Convert Audio to Text with Holywalley STT BE CTC Cognitive Actions

23 Apr 2025

Converting speech to text is a core feature of many applications, from transcription services to voice-controlled interfaces. The Holywalley STT BE CTC Cognitive Actions give developers a way to add speech-to-text to their applications. They are built on the NVIDIA STT BE Conformer CTC Large model, a Conformer-based speech recognition model for Belarusian, which is why the sample output later in this article is Belarusian text. This article shows how to use the Perform Speech-to-Text Conversion action provided by this API.

Prerequisites

Before you begin using the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Access to the internet to send requests to the API.

The authentication process typically involves passing your API key in the request headers, allowing you to access the Cognitive Actions securely.
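
As a minimal sketch of header-based authentication, the helper below reads the key from an environment variable instead of hard-coding it. The variable name COGNITIVE_ACTIONS_API_KEY and the Bearer scheme are assumptions carried over from the conceptual example in this article, so check the platform's documentation for the exact header format.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key.

    Assumption: the platform accepts a Bearer token in the Authorization
    header, as in this article's conceptual example.
    """
    # Fall back to an environment variable (hypothetical name) when no key is passed.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError("Set COGNITIVE_ACTIONS_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control.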

Cognitive Actions Overview

Perform Speech-to-Text Conversion

The Perform Speech-to-Text Conversion action takes the URI of an audio file and returns its transcription as text. It is categorized under speech-to-text services and suits applications that need transcripts of spoken content.

Input

To invoke this action, you need to provide the following input:

  • Required Field:
    • audioFileUri: A valid URI pointing to an audio file in a supported format.

Here’s a practical example of the JSON payload required for this action:

{
  "audioFileUri": "https://replicate.delivery/pbxt/KVAyOIfjNxM9M6v3aXPGtRnvI6momTc7bZ6Q0BPsgNiELWpr/pahonia.wav"
}
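
Because audioFileUri is the only required field, it can help to check it before sending a request. The sketch below is one possible client-side check, not part of the API: the build_payload helper is hypothetical, and the restriction to http and https URIs is an assumption. It does not verify that the file exists or that its format is supported.

```python
from urllib.parse import urlparse


def build_payload(audio_file_uri):
    """Build the action input after a basic sanity check on the URI."""
    parsed = urlparse(audio_file_uri)
    # Require an http(s) scheme and a host (assumed restriction, see lead-in).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URI: {audio_file_uri!r}")
    return {"audioFileUri": audio_file_uri}
```

Calling build_payload with the sample URI above returns the same JSON object shown in the example, while a string without a scheme raises ValueError before any network call is made.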

Output

Upon successful execution, the action returns the transcribed text from the audio file. Here’s an example of the output you can expect:

толькі ў сэрцы трывожным пачую за краіну радзімую жах успомню вострую браму святую і ваякаў на грозных канях у белай пене праносяцца коні рвуцца ммкнуцца і цяжка хрыпяць старадаўняй літоўскай пагоніі не разбіць не спыніць не стрымаць

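The output is a single string containing the transcribed text. As the sample shows, the transcript is lowercase and unpunctuated, and it may contain small recognition errors (for example, the doubled letter in "ммкнуцца"), so review it where exact wording matters.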

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet demonstrating how to call this Cognitive Action using a hypothetical execution endpoint:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "9621df27-9b03-4a4d-9ce8-8acbe3c69506"  # Action ID for Perform Speech-to-Text Conversion

# Construct the input payload based on the action's requirements
payload = {
    "audioFileUri": "https://replicate.delivery/pbxt/KVAyOIfjNxM9M6v3aXPGtRnvI6momTc7bZ6Q0BPsgNiELWpr/pahonia.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely; tune for long audio files
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    # ensure_ascii=False keeps Cyrillic text readable instead of \uXXXX escapes
    print(json.dumps(result, indent=2, ensure_ascii=False))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON (covers json and simplejson decode errors)
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The payload variable contains the necessary input for the action, specifically the URI of the audio file.
  • The JSON response from the API, which carries the transcribed text, is pretty-printed with json.dumps.
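
To use the transcript rather than just print the response, you need to pull the string out of the decoded JSON. The source does not document the response shape, so the sketch below assumes it is either the bare string or an object holding the string under a hypothetical "output" key. Adjust it once you have inspected a real response.

```python
def extract_transcript(result):
    """Pull the transcript string out of a decoded JSON response.

    Assumption: the response is either the bare string or an object that
    holds it under an "output" key; neither shape is confirmed by the source.
    """
    if isinstance(result, str):
        return result.strip()
    if isinstance(result, dict) and isinstance(result.get("output"), str):
        return result["output"].strip()
    # Fail loudly instead of guessing at an unknown structure.
    raise ValueError("Unexpected response shape; inspect the raw JSON")
```

Raising on an unknown shape surfaces a mismatch early instead of passing an empty transcript downstream.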

Conclusion

The Holywalley STT BE CTC Cognitive Actions provide a straightforward way to add speech-to-text functionality to your applications. The Perform Speech-to-Text Conversion action needs only an audio file URI and returns the transcript as a string. From here, consider exploring use cases such as real-time transcription, accessibility features, or voice-controlled interactions.