Transform Your Text into Speech with ttsds/parlertts_tiny_1_0 Cognitive Actions

Adding voice output to an application can noticeably improve the user experience. The ttsds/parlertts_tiny_1_0 Cognitive Actions give developers a pre-built way to convert text into speech using text-to-speech technology. Because the action is exposed as a simple request and response, you can focus on building your application rather than on the details of speech synthesis.
Prerequisites
Before you dive into the integration of Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform, which you'll need for authentication.
- Basic familiarity with making HTTP requests and handling JSON data.
To authenticate, you will typically include the API key in the request headers.
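As a minimal sketch of that step, the helper below builds the request headers. It assumes the Bearer scheme used in the conceptual Python example later in this article; the function name build_auth_headers and the environment variable COGNITIVE_ACTIONS_API_KEY are illustrative choices, not names defined by the platform.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers that carry the API key as a Bearer token."""
    # Fall back to an environment variable; the variable name is an assumption.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY", "")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; pass it explicitly if your deployment stores secrets elsewhere.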
Cognitive Actions Overview
Convert Text to Speech
The Convert Text to Speech action is designed to transform a given text input into spoken word. This action allows for optional context and speaker reference input, making it versatile for various applications.
Input
The input for this action requires a JSON object with the following fields:
- text (required): The main body of text to process. This must be a non-empty string.
- prompt (optional): Additional context or instructions for processing the text. Defaults to an empty string if not provided.
- textReference (optional): Supplementary text that may provide additional context.
- speakerReference (optional): A URI pointing to an audio example representing the speaker's reference.
Here’s an example of a valid input JSON payload:
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "prompt": "",
  "textReference": "and keeping eternity before the eyes, though much.",
  "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
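The field rules above can be checked on the client before sending a request. The following is a sketch only: build_tts_input is a hypothetical helper, and rejecting whitespace-only text is a stricter assumption than the documented "non-empty string" rule.

```python
def build_tts_input(text, prompt="", text_reference=None, speaker_reference=None):
    """Build the input object for the Convert Text to Speech action."""
    # 'text' is required and must be a non-empty string.
    # Treating whitespace-only text as empty is an assumption of this sketch.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")

    # 'prompt' defaults to an empty string, matching the documented default.
    payload = {"text": text, "prompt": prompt}

    # Optional fields are included only when the caller supplies them.
    if text_reference is not None:
        payload["textReference"] = text_reference
    if speaker_reference is not None:
        payload["speakerReference"] = speaker_reference
    return payload
```

Calling build_tts_input("Hello there") yields a payload containing only text and an empty prompt, while the optional references are added when supplied.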
Output
Upon successful execution, the action returns a URL linking to the generated audio file of the spoken text. For example:
https://assets.cognitiveactions.com/invocations/d25509fc-e87e-4827-95ac-202c6c90b6df/5e41c33f-d222-47ca-9399-493bda8cfebc.wav
This output can then be used to play the audio directly in your application or to provide a download link.
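For instance, a small sketch using only the Python standard library could save the returned file locally. It assumes the audio URL can be fetched with a plain GET and no extra authentication, which the description above does not confirm; filename_from_url and download_audio are hypothetical helper names.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="speech.wav"):
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_audio(url, dest_dir="."):
    """Download the generated audio and return the path it was saved to."""
    dest = os.path.join(dest_dir, filename_from_url(url))
    # Assumes the URL is directly downloadable without credentials.
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as fh:
        fh.write(resp.read())
    return dest
```

In a web application you would more likely pass the URL straight to an audio player or a download link instead of saving the file on the server.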
Conceptual Usage Example (Python)
Here’s a conceptual example of how you might call the Convert Text to Speech action using Python:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "e5daa6f8-386b-4072-958a-f33ce16c4f7e"  # Action ID for Convert Text to Speech

# Construct the input payload based on the action's requirements
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "prompt": "",
    "textReference": "and keeping eternity before the eyes, though much.",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the server does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    # e.response is None for connection-level failures such as timeouts
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised when the body is not valid JSON
            print(f"Response body: {e.response.text}")
In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID for the Convert Text to Speech action is included, and the input payload is structured according to the requirements outlined above. The endpoint URL is illustrative, so adjust it as necessary based on your actual implementation.
Conclusion
Integrating the Convert Text to Speech action from the ttsds/parlertts_tiny_1_0 spec lets developers add voice capabilities to their applications with a single request. Using the input schema and example above, you can generate spoken content that enhances user interaction. Whether you're building a chat application, an educational tool, or any platform that needs voice integration, this action is a practical starting point. Consider how you could use it in your next project.