Transform Text to Speech with ttsds/parlertts_mini_expresso Actions

Converting text into natural-sounding speech lets applications interact with users by voice. The Parlertts Mini Expresso model, part of the ttsds/parlertts_mini_expresso specification, gives developers a way to run text-to-speech (TTS) predictions. By using pre-built Cognitive Actions, you can integrate TTS capabilities into your applications and add voice interaction to the user experience.
Prerequisites
Before you get started, ensure you have the following:
- An API key for the Cognitive Actions platform, which you will use for authentication.
- Basic knowledge of JSON format and how to structure API requests.
Authentication generally involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions services.
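As a minimal sketch of the header-based authentication described above, the helper below builds the request headers from an API key. The Bearer scheme matches the usage example later in this article; the environment variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions made for illustration, not part of the platform's documented interface.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers that carry the API key as a Bearer token.

    The key is taken from the argument, or from the (assumed) environment
    variable COGNITIVE_ACTIONS_API_KEY when no argument is given.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided; pass one or set COGNITIVE_ACTIONS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# Example: build headers from an explicit placeholder key
headers = build_headers("YOUR_COGNITIVE_ACTIONS_API_KEY")
```

Reading the key from the environment keeps it out of source control; passing it explicitly is convenient for quick experiments.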
Cognitive Actions Overview
Execute Text-to-Speech Prediction
The Execute Text-to-Speech Prediction action is designed to convert text input into audio output using the Parlertts Mini Expresso model. This action allows for additional enhancements like speaker context, making the generated speech sound more natural and tailored.
- Category: text-to-speech
Input
The input for this action is structured as follows:
{
  "text": "Your text here",
  "prompt": "Optional prompt for context",
  "textReference": "Optional secondary text for context",
  "speakerReference": "URI to reference audio file"
}
- Required:
  - text (string): The primary input string to convert to speech. For example: "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good."
- Optional:
  - prompt (string): Additional context or instructions related to the text. Defaults to an empty string.
  - textReference (string): A secondary input to complement the main text. Example: "and keeping eternity before the eyes, though much."
  - speakerReference (string, URI): A URI pointing to a reference audio file for speaker or voice context. Example: "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
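The field rules above can be sketched as a small payload builder that enforces the required text field and includes optional fields only when they are supplied. The function name build_tts_input and its validation rules (non-empty text, http or https URI for the speaker reference) are illustrative assumptions; the platform may validate inputs differently.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_tts_input(
    text: str,
    prompt: str = "",
    text_reference: Optional[str] = None,
    speaker_reference: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the action input, omitting optional fields that are not set."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' is required and must be a non-empty string")

    payload: Dict[str, Any] = {"text": text, "prompt": prompt}

    if text_reference is not None:
        payload["textReference"] = text_reference

    if speaker_reference is not None:
        # Assumed check: the reference audio must be an http(s) URI
        parsed = urlparse(speaker_reference)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("'speakerReference' must be an http(s) URI")
        payload["speakerReference"] = speaker_reference

    return payload


# Example using the sample values from the field descriptions
example = build_tts_input(
    "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    text_reference="and keeping eternity before the eyes, though much.",
)
print(json.dumps(example, indent=2))
```

Validating locally catches a missing text field or a malformed URI before any request is sent.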
Output
Upon successfully executing the action, you will receive a URL linking to the generated audio file. An example output looks like this:
"https://assets.cognitiveactions.com/invocations/43f48776-94c0-4f5a-871c-f0230688be86/4856ce2e-7959-4467-a736-5536b38a8d4b.wav"
This URL will point to the audio file created from your text input.
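Once you have that URL, saving the audio locally could look like the sketch below. It uses only the Python standard library and assumes the URL can be fetched with a plain GET request without extra authentication, which the source does not confirm. The helper names filename_from_url and download_audio and the fallback file name speech.wav are hypothetical.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "speech.wav") -> str:
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_audio(url: str, dest_dir: str = ".") -> str:
    """Stream the audio file at `url` into `dest_dir` and return the local path."""
    path = os.path.join(dest_dir, filename_from_url(url))
    # Assumption: the URL is publicly readable with a simple GET request
    with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return path
```

Calling download_audio with the URL returned by the action would write the .wav file into the current directory and return its path.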
Conceptual Usage Example (Python)
Below is a conceptual Python code snippet demonstrating how to call the Execute Text-to-Speech Prediction action. This example focuses on structuring the input JSON payload correctly.
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "f1babd35-0ebf-4fef-bfe2-1c78c60dc2d8"  # Action ID for Execute Text-to-Speech Prediction

# Construct the input payload based on the action's requirements
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "prompt": "",
    "textReference": "and keeping eternity before the eyes, though much.",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet, you'll need to replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The payload variable is constructed based on the required and optional fields defined in the action's input schema.
Conclusion
The Execute Text-to-Speech Prediction action from the ttsds/parlertts_mini_expresso specification lets developers integrate speech synthesis into their applications and engage users through natural voice interaction. Consider exploring use cases such as creating audio content, building voice assistants, or improving accessibility features in your applications. Happy coding!