Transform Your Text into Speech with ParlerTTS Mini Cognitive Actions

24 Apr 2025

In app development, audio interaction has become an increasingly common way to improve user experience. The ParlerTTS Mini 1.0 Cognitive Actions give developers a way to add text-to-speech to their applications: they convert text into natural-sounding speech, which makes content more engaging and more accessible. This article walks through how to use the Process Text-to-Speech Using ParlerTTS Mini action.

Prerequisites

Before you can start using the ParlerTTS Mini Cognitive Actions, make sure you have the following:

  • API Key: An API key for the Cognitive Actions platform is essential for authentication. You will typically pass this key in the headers of your API requests.
  • Endpoint Access: Ensure you have access to the Cognitive Actions endpoint where the actions are hosted.
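
As a minimal sketch of the API key prerequisite, the helper below reads the key from an environment variable and builds the request headers. The Bearer scheme matches the usage example later in this article; the environment variable name and the function names are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical environment variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "COGNITIVE_ACTIONS_API_KEY"


def load_api_key(env=os.environ) -> str:
    """Read the key from the environment instead of hard-coding it in source."""
    return env.get(API_KEY_ENV_VAR, "")


def build_headers(api_key: str) -> dict:
    """Return headers that carry the API key as a Bearer token (assumed scheme)."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty; set it before calling the API.")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control, and failing early on an empty key gives a clearer error than a rejected request would.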

Cognitive Actions Overview

Process Text-to-Speech Using ParlerTTS Mini

The Process Text-to-Speech Using ParlerTTS Mini action is designed to convert input text into speech using the ParlerTTS Mini model. This action not only processes the main text but also allows for additional guidance through prompts, reference texts, or by specifying a speaker's voice via URI.

Input

The input for this action consists of the following fields:

  • text (required): The main text body to be processed. This field must be a string.
  • prompt (optional): A text prompt that guides the processing of the main text. Defaults to an empty string if not provided.
  • textReference (optional): A string that serves as a point of reference within the text, providing context or highlighting specific sections.
  • speakerReference (optional): A URI linking to an audio file that serves as a reference for a speaker's voice.

Example Input:

{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "prompt": "",
  "textReference": "and keeping eternity before the eyes, though much.",
  "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
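
The field rules above can be sketched as a small payload builder. The function name and its checks are illustrative assumptions: the schema only states that text is a required string, so rejecting empty text and non-http(s) speaker URIs is an extra precaution rather than documented behavior.

```python
from urllib.parse import urlparse


def build_payload(text, prompt="", text_reference=None, speaker_reference=None):
    """Assemble the action input, omitting optional fields that were not given."""
    # 'text' is the only required field and must be a string.
    # Rejecting blank text is an assumption, not a documented rule.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' is required and must be a non-empty string")

    # 'prompt' defaults to an empty string, as described in the field list.
    payload = {"text": text, "prompt": prompt}

    if text_reference is not None:
        payload["textReference"] = text_reference

    if speaker_reference is not None:
        # Assumed check: the speaker reference should be an http(s) URI.
        parsed = urlparse(speaker_reference)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("'speakerReference' must be an http(s) URI")
        payload["speakerReference"] = speaker_reference

    return payload
```

Calling build_payload("Hello") yields only the text and an empty prompt, while passing a speaker_reference adds the speakerReference key after the URI check.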

Output

When you invoke this action, it will return a URL linking to the generated speech audio file. The output typically looks like this:

Example Output:

https://assets.cognitiveactions.com/invocations/14b0b1c3-6378-4759-bc8c-2ae5773faaae/b442a562-9ef4-4523-96e0-2f258abfa51f.wav
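
Because the action returns a URL rather than the audio itself, a typical next step is to save the file locally. The sketch below does that with the Python standard library only; the function names and the default file name are hypothetical, and it assumes the URL can be fetched without additional authentication.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(audio_url: str, default: str = "speech.wav") -> str:
    """Use the last path segment of the URL as the local file name."""
    name = os.path.basename(urlparse(audio_url).path)
    return name or default


def download_audio(audio_url: str, dest_dir: str = ".") -> str:
    """Download the audio file into dest_dir and return the local path."""
    dest_path = os.path.join(dest_dir, filename_from_url(audio_url))
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urlopen(audio_url, timeout=30) as response, open(dest_path, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest_path
```

Copying the response stream straight to disk avoids holding the whole audio file in memory, which matters for longer passages of speech.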

Conceptual Usage Example (Python)

Here’s a conceptual example of how you might call the Process Text-to-Speech Using ParlerTTS Mini action in Python. The following snippet demonstrates structuring the input JSON payload correctly for the Cognitive Actions API.

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "843b410b-384e-4d82-81e5-f5192445007f"  # Action ID for Process Text-to-Speech Using ParlerTTS Mini

# Construct the input payload based on the action's requirements
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "prompt": "",
    "textReference": "and keeping eternity before the eyes, though much.",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the server does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON (covers all JSON decode errors)
            print(f"Response body: {e.response.text}")

In this code:

  • The action_id is set to the ID of the Process Text-to-Speech Using ParlerTTS Mini action.
  • The payload is constructed based on the input schema outlined above.
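  • The request is sent to the hypothetical endpoint. On success, the JSON response is printed; on failure, the error is printed along with the response status code and body when they are available.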

Conclusion

Integrating the Process Text-to-Speech Using ParlerTTS Mini Cognitive Action lets your application deliver written content as audio, which can make it more engaging, interactive, and accessible for your users.

Consider exploring other use cases where text-to-speech can improve functionality, such as enabling voice commands or providing audio summaries of written content. Happy coding!