Transform Text into Audio with smaerdlatigid/stable-audio Cognitive Actions

22 Apr 2025

In today's digital landscape, turning text into high-quality audio can enhance user engagement and accessibility. The smaerdlatigid/stable-audio spec provides a Cognitive Action that converts a textual description of a sound into an audio clip. This pre-built action simplifies adding text-to-audio generation to your applications, so you can deliver audio content through a single API call.

Prerequisites

Before you dive into using the Cognitive Actions, ensure you have the following ready:

  • An API key for the Cognitive Actions platform, which you’ll use to authenticate your requests.
  • Basic familiarity with making HTTP requests, ideally in Python, since the conceptual example below uses Python's requests library.

Authentication typically works by passing your API key in a request header; the example in this article sends it as a Bearer token in the Authorization header.
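A minimal sketch of assembling those headers in Python. It assumes the key is stored in an environment variable named COGNITIVE_ACTIONS_API_KEY (a hypothetical name chosen for illustration) and that the platform accepts a Bearer token, which is what the conceptual example in this article also assumes. Keeping the key out of source code makes it harder to leak accidentally.

```python
import os


def build_auth_headers(api_key=None):
    """Return request headers for the Cognitive Actions API.

    Assumptions: the key falls back to the hypothetical COGNITIVE_ACTIONS_API_KEY
    environment variable and is sent as a Bearer token.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError("Set COGNITIVE_ACTIONS_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    print(build_auth_headers("demo-key")["Authorization"])  # Bearer demo-key
```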

Cognitive Actions Overview

Generate Audio From Text

Description: This action generates audio clips from textual descriptions using a Stable Audio model. Developers can adjust the number of processing steps, the audio duration, and the guidance strength to customize the output.

  • Category: Text-to-Speech

Input

The input for this action is structured as follows:

{
  "modelSteps": 120,
  "imagePrompt": "A gentle rainfall with distant thunder",
  "configurationValue": 7,
  "totalDurationSeconds": 60
}
  • modelSteps (integer, default: 120): The number of processing steps the model performs.
  • imagePrompt (string, default: "A gentle rainfall with distant thunder"): The text description of the desired audio. Despite the field name, it describes a sound, not an image.
  • configurationValue (number, default: 7): The guidance strength, which controls how closely the generated audio follows the prompt.
  • totalDurationSeconds (integer, default: 60): The length of the generated audio clip, in seconds.
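The defaults above can be captured in a small helper that fills in any field you don't override. This is a sketch under assumptions: the helper name build_payload is hypothetical, and the type checks are illustrative because the valid ranges for these parameters are not documented here.

```python
# Documented defaults for the "Generate Audio From Text" action.
DEFAULTS = {
    "modelSteps": 120,
    "imagePrompt": "A gentle rainfall with distant thunder",
    "configurationValue": 7,
    "totalDurationSeconds": 60,
}


def build_payload(prompt=None, **overrides):
    """Return a new input payload: the defaults updated with any overrides.

    Hypothetical helper; the checks below are illustrative, not documented limits.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown input field(s): {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}  # new dict, so DEFAULTS is never mutated
    if prompt is not None:
        # Despite its name, imagePrompt holds the description of the audio.
        payload["imagePrompt"] = prompt
    for field in ("modelSteps", "totalDurationSeconds"):
        if not isinstance(payload[field], int) or payload[field] <= 0:
            raise ValueError(f"{field} must be a positive integer")
    if not isinstance(payload["configurationValue"], (int, float)):
        raise ValueError("configurationValue must be a number")
    return payload


# Example: a 30-second clip with a custom prompt; other fields keep their defaults.
payload = build_payload("Waves crashing on a rocky shore", totalDurationSeconds=30)
print(payload)
```

Rejecting unknown keys catches typos such as "steps" early, before a request is spent on a payload the action would not understand.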

Output

The action returns a JSON array containing the URL of the generated audio file. Here’s an example of the output you can expect:

[
  "https://assets.cognitiveactions.com/invocations/1a5d2d7f-9dd6-4488-bf53-2bd5c07029e1/57988a64-7329-4cf1-a3ce-0ca37a5d78a4.wav"
]

This URL can be used to access the generated audio clip directly.
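A sketch of turning that output into a local file using only the Python standard library. It assumes the parsed response is the bare array shown above; the helper names are hypothetical, and if the platform wraps the array in a larger JSON object you would need to extract it first.

```python
import os
import urllib.parse
import urllib.request


def audio_url_from_result(result):
    """Return the first audio URL from the action's output.

    The documented output is a JSON array of URL strings; accepting a bare
    string as well is a defensive assumption.
    """
    if isinstance(result, str):
        return result
    if isinstance(result, list) and result and isinstance(result[0], str):
        return result[0]
    raise ValueError(f"Unexpected output shape: {result!r}")


def filename_from_url(url):
    """Use the last path segment of the URL (e.g. '<id>.wav') as the file name."""
    return os.path.basename(urllib.parse.urlparse(url).path)


def download_audio(url, directory="."):
    """Save the generated clip into `directory` and return the local path."""
    path = os.path.join(directory, filename_from_url(url))
    urllib.request.urlretrieve(url, path)  # streams the response body to `path`
    return path
```

With the example output above, download_audio(audio_url_from_result(result)) would save the clip as 57988a64-7329-4cf1-a3ce-0ca37a5d78a4.wav in the current directory.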

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet demonstrating how to call this action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "37df06a7-04f2-4784-9711-e5c14b2ef881" # Action ID for Generate Audio From Text

# Construct the input payload based on the action's requirements
payload = {
    "modelSteps": 120,
    "imagePrompt": "A gentle rainfall with distant thunder",
    "configurationValue": 7,
    "totalDurationSeconds": 60
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # non-JSON body; requests' JSONDecodeError subclasses ValueError
            print(f"Response body: {e.response.text}")

In this snippet:

  • Update COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The payload sets the action's four input fields, here to their default values.
  • The action_id is set to the ID for the "Generate Audio From Text" action.
  • The request sends the action ID and the payload to the Cognitive Actions endpoint; both the endpoint URL and the request body structure are marked as hypothetical in the code.
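The same call can be packaged as a reusable function. This sketch keeps the hypothetical endpoint and request body from the example, adds a request timeout (the 300-second value is an illustrative assumption, since generation time is not documented), and accepts an optional post callable so it can be exercised without network access. The function name execute_action is hypothetical.

```python
def execute_action(action_id, inputs, api_key,
                   url="https://api.cognitiveactions.com/actions/execute",
                   timeout=300, post=None):
    """Send one action request and return the parsed JSON result.

    Assumptions: the endpoint URL and the {"action_id", "inputs"} body shape are
    the hypothetical ones from the conceptual example; `timeout` is in seconds.
    """
    if post is None:
        import requests  # imported lazily so a stub `post` works without requests
        post = requests.post
    response = post(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"action_id": action_id, "inputs": inputs},
        timeout=timeout,  # fail instead of hanging if the service never responds
    )
    response.raise_for_status()  # surface 4xx/5xx errors as exceptions
    return response.json()
```

In the example above, this would be called as result = execute_action(action_id, payload, COGNITIVE_ACTIONS_API_KEY), with the same try/except block wrapped around it.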

Conclusion

The smaerdlatigid/stable-audio Cognitive Actions offer a straightforward way to turn text descriptions into engaging audio content. By using the "Generate Audio From Text" action, developers can add generated audio to their applications, making them more accessible and enjoyable for users.

As you explore this action, consider how it can fit into your next project, whether for creating personalized audio guides, enhancing storytelling applications, or improving accessibility features. Happy coding!