Transforming Text into Speech with Parler TTS Cognitive Actions

Converting text into speech makes applications more accessible and more engaging. The Parler TTS Cognitive Actions let developers add text-to-speech to an application through a pre-built action, with a choice of model variant ('mini' or 'large') and audio format ('wav' or 'mp3'). Because the action is pre-built, you do not have to host or run a speech model yourself.
Prerequisites
To use the Parler TTS Cognitive Actions, you need an API key for authentication. Send this key in the request headers of every call. You also need an environment that can make HTTP requests, such as Python with the requests library installed.
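As a minimal sketch of the authentication step, the helper below reads the API key from an environment variable and builds the request headers. The variable name COGNITIVE_ACTIONS_API_KEY and the helper build_headers are assumptions made for illustration; the bearer-token scheme mirrors the conceptual Python example later in this article and may differ from what your account actually requires.

```python
import os

# Hypothetical environment variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "COGNITIVE_ACTIONS_API_KEY"


def build_headers(api_key=None):
    """Return HTTP headers that carry the API key as a bearer token.

    The key is taken from the argument if given, otherwise from the
    environment, so it never has to be hard-coded in source files.
    """
    key = api_key or os.environ.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key in the environment rather than in the script makes it less likely to be committed to version control by accident.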
Cognitive Actions Overview
Convert Text to Speech
The Convert Text to Speech action is designed to transform input text into audio speech, providing options to select between 'mini' and 'large' model variants based on your performance needs. You can also choose the output format as either 'wav' or 'mp3'.
- Category: text-to-speech
Input
The input for this action consists of several parameters defined in the schema:
- model (string, optional): Specifies the model variant to use, either 'mini' or 'large'. Defaults to 'mini'.
- prompt (string, required): The text that will be converted to speech. The schema lists a generic greeting as the default value, but you should always supply your own text.
- description (string, optional): Describes the characteristics of the speaker's voice, including expressiveness, speech pace, pitch, and recording quality.
- outputFormat (string, optional): The audio format of the output file, either 'wav' or 'mp3'. Defaults to 'wav'.
Example Input:
{
"model": "large",
"prompt": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up.",
"description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up.",
"outputFormat": "wav"
}
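The parameter rules listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the function build_payload and its argument names are hypothetical, and the allowed values and defaults are taken only from the parameter list in this section.

```python
# Allowed values as described in the input schema above.
ALLOWED_MODELS = {"mini", "large"}
ALLOWED_FORMATS = {"wav", "mp3"}


def build_payload(prompt, description=None, model="mini", output_format="wav"):
    """Validate the inputs and return a dict shaped like the example input."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be non-empty text")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {"model": model, "prompt": prompt, "outputFormat": output_format}
    # description is optional, so it is only sent when the caller provides it.
    if description is not None:
        payload["description"] = description
    return payload
```

For example, build_payload("Hello there") fills in the 'mini' model and 'wav' format, while build_payload("Hello there", model="huge") raises a ValueError before any network call is made.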
Output
When the action is successfully executed, it returns a URL to the generated audio file.
Example Output:
https://assets.cognitiveactions.com/invocations/2d7fe78d-dc00-4ae4-9dfc-15e4ca25e942/42175b3c-3086-4efd-92fa-ec28cdf25d3f.wav
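Once you have the returned URL, you will usually want to save the audio locally. The sketch below uses only the Python standard library; the helper names filename_from_url and download_audio are hypothetical, and it assumes the URL can be fetched with a plain GET request without extra authentication, which this article does not confirm.

```python
import os
import posixpath
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(audio_url):
    """Return the last path segment of the URL, e.g. '<id>.wav'."""
    name = posixpath.basename(urlparse(audio_url).path)
    if not name:
        raise ValueError(f"URL has no file name: {audio_url}")
    return name


def download_audio(audio_url, dest_dir="."):
    """Stream the audio file to dest_dir and return the local path."""
    target = os.path.join(dest_dir, filename_from_url(audio_url))
    # Copy in chunks so large files are not held in memory all at once.
    with urlopen(audio_url, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Deriving the file name from the URL keeps the extension ('wav' or 'mp3') in step with the outputFormat you requested.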
Conceptual Usage Example (Python)
Here’s how you might call the Convert Text to Speech action using a conceptual Python code snippet:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "44c2679c-fde4-4629-bef8-19c17c5342c6"  # Action ID for Convert Text to Speech

# Construct the input payload based on the action's requirements
payload = {
    "model": "large",
    "prompt": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up.",
    "description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up.",
    "outputFormat": "wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=30  # Avoid waiting forever if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; show the raw text instead
            print(f"Response body: {e.response.text}")
In this snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The action_id identifies the Convert Text to Speech action, and the payload follows the input schema described above. The endpoint URL and the shape of the request body are hypothetical, so treat the snippet as an illustration of the request flow and check both against your account's documentation before using it.
Conclusion
The Parler TTS Cognitive Actions give you a pre-built way to add text-to-speech to your applications, making content more accessible and engaging. Possible use cases include voiceovers for videos, spoken content for visually impaired users, and interactive voice response systems.