Transform Your Applications with Indic Text-to-Speech Using Cognitive Actions

Integrating Text-to-Speech (TTS) capabilities into your applications can significantly enhance user engagement and accessibility. The ttsds/parlertts_indic API provides developers with powerful Cognitive Actions specifically designed to generate high-quality speech synthesis for Indic languages. By leveraging these pre-built actions, you can seamlessly convert text into speech, customize voice outputs, and provide a more interactive user experience.
Prerequisites
To get started with the Cognitive Actions, you will need:
- An API key for the Cognitive Actions platform to authenticate your requests.
- Basic knowledge of JSON and API calls.
Authentication is typically done by passing your API key in the request headers, for example as a Bearer token in the Authorization header. The key identifies your account on every request, so keep it out of source control and client-side code.
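As a minimal sketch of the header-based authentication described above, the helper below reads the key from an environment variable instead of hard-coding it. The variable name COGNITIVE_ACTIONS_API_KEY and the function name build_headers are assumptions made for illustration; the Bearer scheme simply mirrors the usage example later in this article, so check your platform's documentation for the exact header it expects.

```python
import os


def build_headers(api_key=None):
    """Return request headers that carry the API key as a Bearer token.

    Falls back to the COGNITIVE_ACTIONS_API_KEY environment variable
    (an assumed name) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError(
            "No API key found: pass api_key or set COGNITIVE_ACTIONS_API_KEY."
        )
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key in the environment means the same script can run locally and in CI without the secret ever appearing in the code.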
Cognitive Actions Overview
Generate Indic TTS
Purpose:
The "Generate Indic TTS" action allows developers to convert text into spoken audio for Indic languages. You can customize the speech output using optional prompts and voice references, ensuring a natural and contextually relevant audio experience.
Category: Text-to-Speech
Input:
The input for this action is structured as a JSON object with the following fields:
- text (required): The main content to be converted into speech.
  Example: "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good."
- prompt (optional): A string that can be used to guide or customize the text processing.
  Example: "" (empty string).
- textReference (optional): A snippet of text that provides context for the main text.
  Example: "and keeping eternity before the eyes, though much."
- speakerReference (optional): A URL linking to an audio file that serves as a reference for the speaker's voice.
  Example: "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
Here’s how the complete input JSON might look:
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "prompt": "",
  "textReference": "and keeping eternity before the eyes, though much.",
  "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
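The input structure above can also be assembled and checked in code before it is sent. The following is a sketch under stated assumptions: build_tts_inputs is a hypothetical helper, and it assumes the service accepts a payload that simply omits optional fields the caller did not supply. The field names come from the list above.

```python
from urllib.parse import urlparse

# Optional fields documented for the "Generate Indic TTS" action.
OPTIONAL_FIELDS = ("prompt", "textReference", "speakerReference")


def build_tts_inputs(text, **optional):
    """Build the input object, validating the fields described above."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' is required and must be a non-empty string")

    unknown = set(optional) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"Unknown field(s): {sorted(unknown)}")

    speaker = optional.get("speakerReference")
    if speaker is not None and urlparse(speaker).scheme not in ("http", "https"):
        raise ValueError("'speakerReference' must be an http(s) URL")

    inputs = {"text": text}
    # Include only the optional fields the caller actually supplied.
    for name in OPTIONAL_FIELDS:
        if optional.get(name) is not None:
            inputs[name] = optional[name]
    return inputs
```

Calling build_tts_inputs("some text", prompt="") returns a dictionary with just the text and prompt keys, which can then be serialized with json.dumps. Catching a missing text field or a misspelled key locally avoids a round trip that would only end in an error response.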
Output:
Upon successful execution, the action returns a URL pointing to the generated audio file.
Example Output: "https://assets.cognitiveactions.com/invocations/ac2f5a8d-3ee5-4f47-871c-149781cfae22/c195f58e-f31c-477d-8a2b-f6f0cddac070.wav"
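Since the output is a URL rather than the audio itself, a typical next step is to download the file. The sketch below uses only the Python standard library; filename_from_url and download_audio are hypothetical helper names, and the code assumes the returned URL can be fetched with a plain GET request without extra authentication, which the source does not confirm.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_from_url(audio_url, default="output.wav"):
    """Use the last path segment of the URL as the local filename."""
    name = posixpath.basename(urlparse(audio_url).path)
    return name or default


def download_audio(audio_url, dest_dir="."):
    """Stream the audio at audio_url into dest_dir and return the local path."""
    dest_path = Path(dest_dir) / filename_from_url(audio_url)
    # Copy in chunks so large audio files are never held fully in memory.
    with urllib.request.urlopen(audio_url, timeout=60) as response:
        with open(dest_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return dest_path
```

For the example output above, filename_from_url yields c195f58e-f31c-477d-8a2b-f6f0cddac070.wav, so each invocation's audio lands in its own file.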
Conceptual Usage Example (Python):
Here’s a conceptual example of how you might invoke the "Generate Indic TTS" action using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "d3c9ebad-9005-4f74-a873-ada896efc4b3"  # Action ID for Generate Indic TTS

# Construct the input payload based on the action's requirements
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "prompt": "",
    "textReference": "and keeping eternity before the eyes, though much.",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid waiting indefinitely; adjust to your needs
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id is specified for the "Generate Indic TTS" action, and the input payload is structured according to the requirements. The endpoint URL and JSON structure are illustrative.
Conclusion
The ttsds/parlertts_indic Cognitive Actions offer a robust solution for integrating text-to-speech capabilities in Indic languages into your applications. With the ability to customize text prompts and speaker references, you can create a more engaging user experience. Start leveraging these actions today to enhance your applications with dynamic audio content!