Generate Speech in Multiple Languages with Indic Parler-TTS Cognitive Actions

Applications increasingly need to support regional languages. The sruthiselvaraj/indicparlertts API addresses this through its Cognitive Actions, which convert text into speech across 21 languages: 20 Indic languages and English. With Indic Parler-TTS, developers can add high-quality text-to-speech to their applications for regional languages and dialects.
Prerequisites
Before diving into the integration of these Cognitive Actions, ensure you have the following:
- An API key for accessing the Indic Parler-TTS platform.
- A basic understanding of JSON and how to make API calls.
- Familiarity with Python for implementing the conceptual code examples.
Authentication typically involves passing your API key in the request headers, allowing you to securely access the Cognitive Actions.
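As a minimal sketch of the header-based authentication described above, the helper below reads the key from an environment variable and builds the request headers. The variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions chosen for illustration; the Bearer scheme follows the conceptual example later in this article.

```python
import os


def build_headers(api_key=None):
    """Build request headers that carry the API key as a Bearer token.

    The environment variable name is an assumption, not a documented setting.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError(
            "Set COGNITIVE_ACTIONS_API_KEY or pass api_key explicitly"
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; passing api_key explicitly is convenient in tests.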
Cognitive Actions Overview
Generate Indic Speech Using Parler-TTS
This action allows you to convert text input into speech using a pretrained model tailored for Indic languages. It provides flexibility in voice attributes, enabling a personalized audio output that can cater to various user preferences.
- Category: Text-to-Speech
Input
The input for this action requires the following fields:
- prompt (required): The text you want to be converted into speech. For optimal results, make sure the text is clear and concise.
- description (required): Detailed attributes of the desired voice, such as gender, pitch, accent, speaking pace, environment, audio clarity, and tone.
- outputFile (optional): The filename for the output audio file, defaulting to output.wav. Ensure the filename ends with the correct audio file extension.
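The field rules above can be checked on the client before a request is sent. The sketch below is one possible way to do that; build_inputs and ALLOWED_EXTENSIONS are hypothetical names, and restricting the extension to .wav is an assumption based on the only format this article shows.

```python
import os

# Assumption: only .wav appears in this article; extend the set if the
# service accepts other audio formats.
ALLOWED_EXTENSIONS = {".wav"}


def build_inputs(prompt, description, output_file="output.wav"):
    """Return the input payload after checking the documented constraints."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if not description or not description.strip():
        raise ValueError("description is required")
    extension = os.path.splitext(output_file)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(
            f"outputFile must end with one of {sorted(ALLOWED_EXTENSIONS)}"
        )
    return {
        "prompt": prompt.strip(),
        "description": description.strip(),
        "outputFile": output_file,
    }
```

Failing early with a ValueError avoids spending an API call on a payload that is missing a required field.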
Example Input:
{
  "prompt": "This is the best time of my life, Bartley,' she said happily",
  "outputFile": "output.wav",
  "description": "A male speaker with a low-pitched voice speaks with a British accent at a fast pace in a small, confined space with very clear audio and an animated tone."
}
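The description in the example input is free text, but it follows a regular pattern that can be assembled from the individual voice attributes. The helper below is a hypothetical convenience, not part of the API; the sentence template is simply copied from the example, and other phrasings may work equally well.

```python
def describe_voice(gender, pitch, accent, pace, environment, clarity, tone):
    """Compose a voice description from individual attributes.

    The template mirrors the example input in this article; it is an
    illustration, not a format the service is known to require.
    """
    # Pick "a" or "an" so the tone reads naturally ("an animated tone").
    tone_article = "an" if tone[:1].lower() in "aeiou" else "a"
    return (
        f"A {gender} speaker with a {pitch}-pitched voice speaks with "
        f"a {accent} accent at a {pace} pace in {environment} with "
        f"{clarity} audio and {tone_article} {tone} tone."
    )


description = describe_voice(
    gender="male",
    pitch="low",
    accent="British",
    pace="fast",
    environment="a small, confined space",
    clarity="very clear",
    tone="animated",
)
```

Building the string from named attributes makes it easier to expose voice settings as separate options in an application.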
Output
The action typically returns a URL pointing to the generated audio file.
Example Output:
https://assets.cognitiveactions.com/invocations/95681a8a-493d-4b9e-babf-a9cd28dcb989/45a48184-0532-4924-be1c-53344544329f.wav
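Once you have the URL, the audio can be saved locally. The sketch below uses only the Python standard library; filename_from_url and download_audio are hypothetical helper names, and it assumes the URL can be fetched with a plain GET request without extra authentication.

```python
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default="output.wav"):
    """Use the last path segment of the URL as the local filename."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_audio(url, dest=None, timeout=30):
    """Stream the audio file at `url` to disk and return the local path."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    dest = dest or filename_from_url(url)
    # Assumption: the asset URL is publicly readable.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest, "wb") as audio_file:
            shutil.copyfileobj(response, audio_file)
    return dest
```

Calling download_audio with the returned URL writes the file to the current directory under the name taken from the URL, unless dest is given.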
Conceptual Usage Example (Python)
Here’s how you might call this action using Python:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

# Action ID for Generate Indic Speech Using Parler-TTS
action_id = "35a59fa8-9651-499b-a28b-34ad5dc32a7e"

# Construct the input payload based on the action's requirements
payload = {
    "prompt": "This is the best time of my life, Bartley,' she said happily",
    "outputFile": "output.wav",
    "description": "A male speaker with a low-pitched voice speaks with a British accent at a fast pace in a small, confined space with very clear audio and an animated tone."
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
This snippet shows how to structure the input payload and send the request to the Cognitive Actions endpoint. Replace the placeholder API key with your own and adjust the endpoint URL as needed.
Conclusion
By integrating the Indic Parler-TTS Cognitive Action into your applications, you can easily provide multilingual text-to-speech capabilities that enhance user experiences, particularly for speakers of Indic languages. Whether you're building a personal assistant, an educational app, or an accessibility tool, this action can significantly improve the functionality of your application. Start exploring and experimenting with these powerful features today!