Transforming Text to Speech with Indic Parler-TTS Cognitive Actions

24 Apr 2025

Text-to-speech can make an application noticeably more accessible and engaging. The Indic Parler-TTS Cognitive Actions give developers a straightforward way to convert text into high-quality speech in 21 languages: 20 Indic languages and English. The actions rely on a pretrained model for speech synthesis, which makes them a good fit for multilingual applications.

Prerequisites

Before getting started, ensure you have the following:

  • An API key for the Indic Parler-TTS Cognitive Actions platform.
  • Basic knowledge of JSON structure and Python for testing the integration.
  • Familiarity with making API calls in Python.

Authentication typically involves passing your API key in the headers of your requests, allowing you to securely access the Cognitive Actions.
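
As a minimal sketch of that authentication step, the helper below builds the request headers from an API key read from the environment. The Bearer scheme mirrors the conceptual example later in this post; the environment variable name COGNITIVE_ACTIONS_API_KEY and the helper name build_headers are assumptions made for illustration, not part of the platform.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return request headers carrying the API key as a Bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }


# COGNITIVE_ACTIONS_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("COGNITIVE_ACTIONS_API_KEY", "")
if api_key:
    headers = build_headers(api_key)
    print("Headers ready:", sorted(headers))
else:
    print("Set COGNITIVE_ACTIONS_API_KEY before calling the API")
```

Reading the key from the environment keeps it out of source control, which is usually preferable to hard-coding it in the script.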

Cognitive Actions Overview

Convert Text to Indic Speech

The Convert Text to Indic Speech action converts text into speech using a pretrained model. It suits applications that need to read content aloud in multiple languages, for example to improve accessibility or user engagement.

Input

The action takes two input properties:

  • textPrompt: A string representing the text content that will be converted to speech.
  • voiceDescription: A string that provides a detailed description of the desired voice characteristics such as accent, pitch, tone, and pace.

Example Input:

{
  "textPrompt": "'This is the best time of my life, Bartley,' she said happily",
  "voiceDescription": "A male speaker with a low-pitched voice speaks with a British accent at a fast pace in a small, confined space with very clear audio and an animated tone."
}
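
The two properties can be assembled and checked in Python before sending. The sketch below is illustrative only: describe_voice and build_payload are hypothetical helpers, and the sentence template is just one way to phrase a voice description. Only the textPrompt and voiceDescription keys come from the action's input described above.

```python
import json


def describe_voice(gender: str, pitch: str, accent: str, pace: str, tone: str) -> str:
    """Compose a free-text voice description from a few characteristics.

    The wording is an illustrative template, not a format required by the action.
    """
    return (
        f"A {gender} speaker with a {pitch}-pitched voice speaks at a {pace} pace. "
        f"The accent is {accent} and the tone is {tone}."
    )


def build_payload(text_prompt: str, voice_description: str) -> dict:
    """Validate both inputs and return the payload with the documented keys."""
    for name, value in (("textPrompt", text_prompt), ("voiceDescription", voice_description)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return {"textPrompt": text_prompt, "voiceDescription": voice_description}


payload = build_payload(
    "Welcome to the application.",
    describe_voice("male", "low", "British", "fast", "animated"),
)
# ensure_ascii=False keeps Indic scripts readable when the payload is printed.
print(json.dumps(payload, indent=2, ensure_ascii=False))
```

Validating locally catches empty inputs early, before a request is spent on them.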

Output

Upon successful execution, the action returns a URL pointing to the generated audio file, which can be used to play or download the speech output.

Example Output:

https://assets.cognitiveactions.com/invocations/fb97b8a4-6a2e-45b0-9263-94246516cb72/f5ad1408-2722-4eb3-8049-32ea28917673.wav
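
Once you have the URL, saving the audio locally takes only a few lines. The sketch below uses the standard library and assumes the URL can be fetched without extra authentication, which this post does not confirm; filename_from_url and download_audio are hypothetical helper names.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_url(audio_url: str, default: str = "speech.wav") -> str:
    """Use the last path segment of the URL as the local file name."""
    name = PurePosixPath(urlparse(audio_url).path).name
    return name or default


def download_audio(audio_url: str, dest_dir: str = ".") -> Path:
    """Stream the audio file to dest_dir and return the local path."""
    target = Path(dest_dir) / filename_from_url(audio_url)
    with urllib.request.urlopen(audio_url, timeout=30) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target


url = (
    "https://assets.cognitiveactions.com/invocations/"
    "fb97b8a4-6a2e-45b0-9263-94246516cb72/f5ad1408-2722-4eb3-8049-32ea28917673.wav"
)
print(filename_from_url(url))  # f5ad1408-2722-4eb3-8049-32ea28917673.wav
# Call download_audio(url) when you are ready to fetch the file.
```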

Conceptual Usage Example (Python)

Here’s a conceptual example of how you might use the Convert Text to Indic Speech action in a Python application:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "84b39603-2c59-42dd-97f4-7d43e86001d5"  # Action ID for Convert Text to Indic Speech

# Construct the input payload based on the action's requirements
payload = {
    "textPrompt": "'This is the best time of my life, Bartley,' she said happily",
    "voiceDescription": "A male speaker with a low-pitched voice speaks with a British accent at a fast pace in a small, confined space with very clear audio and an animated tone."
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=30,  # Fail instead of hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON (covers json and simplejson decode errors)
            print(f"Response body: {e.response.text}")

In this example, the developer defines the action ID, builds the input payload according to the action's specification, and posts both to the execute endpoint. On success the script prints the JSON response, which should include the URL of the generated audio; on failure it prints the error along with the HTTP status and response body when they are available.

Conclusion

Integrating the Indic Parler-TTS Cognitive Actions adds text-to-speech to your application with a single API call, improving accessibility and user engagement. With support for 21 languages and voice characteristics controlled through a plain-text description, the actions cover a wide range of multilingual use cases.

Consider exploring additional use cases, such as creating educational applications, enhancing voice assistants, or developing multimedia content that requires audio narration. By leveraging these capabilities, you can create richer, more interactive experiences for your users.