Transform Text into Audio with smaerdlatigid/stable-audio Cognitive Actions

Generating high-quality audio from text can improve user engagement and accessibility. The smaerdlatigid/stable-audio spec provides a Cognitive Action that lets developers convert textual descriptions, such as "A gentle rainfall with distant thunder", into audio clips. This pre-built action simplifies adding text-to-audio generation to your applications, so you can deliver audio content without building the model pipeline yourself.
Prerequisites
Before you dive into using the Cognitive Actions, ensure you have the following ready:
- An API key for the Cognitive Actions platform, which you’ll use to authenticate your requests.
- Basic knowledge of making HTTP requests, especially in Python, as we will provide a conceptual code example.
Authentication typically involves passing your API key in the request headers, allowing access to the Cognitive Actions services securely.
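As a minimal sketch of the header-based authentication described above, the helper below reads the key from an environment variable and builds the request headers. The variable name COGNITIVE_ACTIONS_API_KEY and the helper build_auth_headers are illustrative assumptions, and the Bearer scheme is assumed to match the usage example later in this article; check your platform documentation for the exact header format.

```python
import os


def build_auth_headers(api_key: str) -> dict:
    """Return request headers carrying the API key as a Bearer token (assumed scheme)."""
    if not api_key:
        raise ValueError("API key is empty; set COGNITIVE_ACTIONS_API_KEY first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Hypothetical environment variable name; the placeholder fallback only keeps the sketch runnable.
api_key = os.environ.get("COGNITIVE_ACTIONS_API_KEY", "YOUR_COGNITIVE_ACTIONS_API_KEY")
headers = build_auth_headers(api_key)
```

Keeping the key in the environment rather than in source code avoids committing it to version control by accident.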
Cognitive Actions Overview
Generate Audio From Text
Description: This action transforms textual descriptions into audio clips using a stable audio model. Developers can adjust parameters such as processing steps, audio duration, and guiding strength for customized audio output.
- Category: Text-to-Speech
Input
The input for this action is structured as follows:
{
  "modelSteps": 120,
  "imagePrompt": "A gentle rainfall with distant thunder",
  "configurationValue": 7,
  "totalDurationSeconds": 60
}
- modelSteps (integer, default: 120): The number of processing steps the model should perform.
- imagePrompt (string, default: "A gentle rainfall with distant thunder"): The text description of the desired audio output.
- configurationValue (number, default: 7): The guiding strength for the model, which influences the fidelity of the generated audio.
- totalDurationSeconds (integer, default: 60): The total duration in seconds for the model to run.
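The parameters above can be assembled with a small helper that fills in the documented defaults. This is only a sketch: the function name build_audio_payload and its argument names are hypothetical, and the basic checks (non-empty prompt, positive integers) are assumptions, since the valid ranges for each field are not stated here.

```python
def build_audio_payload(prompt, steps=120, guidance=7, duration_seconds=60):
    """Assemble the action input, falling back to the documented default values."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    # Assumed sanity checks; the platform's real limits are not documented in this article.
    if not isinstance(steps, int) or steps <= 0:
        raise ValueError("steps must be a positive integer")
    if not isinstance(duration_seconds, int) or duration_seconds <= 0:
        raise ValueError("duration_seconds must be a positive integer")
    if not isinstance(guidance, (int, float)):
        raise ValueError("guidance must be a number")
    return {
        "modelSteps": steps,
        "imagePrompt": prompt,
        "configurationValue": guidance,
        "totalDurationSeconds": duration_seconds,
    }


# Example: a shorter clip that keeps the default steps and guiding strength.
payload = build_audio_payload("Waves crashing on a rocky shore", duration_seconds=30)
```

Validating locally catches obvious mistakes before a request is sent, though the service remains the final authority on what it accepts.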
Output
The action returns a URL pointing to the generated audio file. Here’s an example of the output you can expect:
[
  "https://assets.cognitiveactions.com/invocations/1a5d2d7f-9dd6-4488-bf53-2bd5c07029e1/57988a64-7329-4cf1-a3ce-0ca37a5d78a4.wav"
]
This URL can be used to access the generated audio clip directly.
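To keep a local copy of the clip, the returned URL can be downloaded with the standard library. The sketch below assumes the output is a list of URLs as in the example above and that the file is reachable without extra headers; the helper names filename_from_url and download_first_clip are hypothetical.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of the URL to use as the local file name."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"No file name found in URL: {url}")
    return name


def download_first_clip(output: list, target_dir: str = ".") -> Path:
    """Download the first URL in the action's output list and return the local path."""
    if not output:
        raise ValueError("The action returned no audio URLs")
    url = output[0]
    destination = Path(target_dir) / filename_from_url(url)
    # Assumption: the asset URL needs no authentication headers.
    with urllib.request.urlopen(url, timeout=60) as response:
        destination.write_bytes(response.read())
    return destination
```

Calling download_first_clip(result) with the list returned by the action would save the .wav file into the current directory.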
Conceptual Usage Example (Python)
Here’s a conceptual Python code snippet demonstrating how to call this action:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "37df06a7-04f2-4784-9711-e5c14b2ef881"  # Action ID for Generate Audio From Text

# Construct the input payload based on the action's requirements
payload = {
    "modelSteps": 120,
    "imagePrompt": "A gentle rainfall with distant thunder",
    "configurationValue": 7,
    "totalDurationSeconds": 60
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON; show the raw text instead
            print(f"Response body: {e.response.text}")
In this snippet:
- Update COGNITIVE_ACTIONS_API_KEY with your actual API key.
- The payload is constructed based on the action's requirements.
- The action_id is set to the ID for the "Generate Audio From Text" action.
- The request sends the action ID and the payload to the Cognitive Actions endpoint.
Conclusion
The smaerdlatigid/stable-audio Cognitive Actions offer a straightforward way to convert text into engaging audio content. By utilizing the "Generate Audio From Text" action, developers can enhance their applications with unique audio experiences, making them more accessible and enjoyable for users.
As you explore this action, consider how it can fit into your next project, whether for creating personalized audio guides, enhancing storytelling applications, or improving accessibility features. Happy coding!