Transform Text into Engaging Dialogue with e1100x/chattts Cognitive Actions

23 Apr 2025

In today's fast-paced digital landscape, integrating voice capabilities into applications can significantly enhance user interaction and engagement. The e1100x/chattts Cognitive Actions offer a powerful solution for developers looking to convert text into high-quality speech tailored for dialogue scenarios. By leveraging the ChatTTS model, these pre-built actions allow for seamless integration of text-to-speech functionality, improving the speed, quality, and naturalness of spoken dialogue in applications.

Prerequisites

Before diving into the implementation of Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform, which will be used for authentication.
  • Familiarity with JSON formatting for constructing input payloads.

Authentication is typically handled by including your API key in the headers of your requests. This allows you to securely access the Cognitive Actions functionality.
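As a minimal sketch, assuming a standard bearer-token scheme (the exact header names may differ; consult the platform documentation), the request headers might be constructed like this:

```python
# Minimal sketch of authenticated request headers (bearer-token scheme assumed).
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",  # API key for authentication
    "Content-Type": "application/json",                      # payloads are JSON
}
```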

Cognitive Actions Overview

Synthesize Dialogue Text to Speech

The Synthesize Dialogue Text to Speech action allows developers to convert text input into speech specifically designed for dialogue applications, such as interactions with large language model (LLM) assistants.

  • Category: text-to-speech
  • Purpose: This action utilizes the ChatTTS model to generate audio outputs from text, making it particularly useful for creating more natural and engaging dialogue.

Input

The input for this action is a structured JSON object that includes various configurable parameters:

  • topK (integer): Specifies the number of top candidates to consider during sampling (0 to 20).
  • topP (number): Cumulative probability threshold for top-p sampling (0 to 1).
  • speed (integer): Controls the speech speed (0 to 9).
  • prompt (string): Instructions for text refinement.
  • stream (boolean): Enables streaming mode for data processing.
  • speaker (string): Identifier for the speaker (optional).
  • addBreak (boolean): Inserts breaks into the text automatically.
  • language (string): Language code for the text.
  • manualSeed (integer): Sets a manual random seed for reproducibility.
  • useDecoder (boolean): Determines whether to use a decoder during processing.
  • temperature (number): Adjusts the randomness of sampling (0 to 1).
  • textContent (string): The text to be synthesized, with multiple texts separated by |.
  • refineTextOnly (boolean): Refines text input without generating audio.
  • skipRefineText (boolean): Bypasses text refining in processing.
  • doTextNormalization (boolean): Normalizes text format and structure.
  • doHomophoneReplacement (boolean): Replaces homophones in the text for clarity.
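Because textContent accepts multiple texts separated by |, a small helper can assemble a payload from a list of utterances. The build_payload function below is a hypothetical convenience wrapper, not part of the API; the parameter names and ranges follow the list above:

```python
# Hypothetical helper that assembles an input payload from a list of utterances.
def build_payload(texts, speed=5, temperature=0.3, manual_seed=None):
    payload = {
        "textContent": "|".join(texts),  # multiple texts are separated by |
        "speed": speed,                  # speech speed, 0 to 9
        "temperature": temperature,      # sampling randomness, 0 to 1
        "doTextNormalization": True,     # normalize text format and structure
    }
    if manual_seed is not None:
        payload["manualSeed"] = manual_seed  # fixed seed for reproducible output
    return payload

payload = build_payload(
    ["Hello there.", "How can I help you today?"],
    manual_seed=42,
)
```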

Example Input:

{
  "topK": 20,
  "topP": 0.7,
  "prompt": "",
  "stream": false,
  "useDecoder": true,
  "temperature": 0.3,
  "textContent": "四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。",
  "skipRefineText": false,
  "doTextNormalization": true,
  "doHomophoneReplacement": false
}

Output

Upon successful execution, the action returns a JSON array containing one or more URLs, each pointing to a generated audio file.

Example Output:

[
  "https://assets.cognitiveactions.com/invocations/3b86c823-ef51-4ac0-a58a-57ec02ec7e71/74f9c774-37b8-4fd2-be31-dc192af876c0.mp3"
]
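Since the result is a list of audio URLs, a short follow-up step is to fetch each clip and save it locally. The snippet below is a sketch that assumes the returned URLs are publicly fetchable; clip_filename and download_audio are hypothetical helpers:

```python
import os
import requests

def clip_filename(index):
    """Derive a local filename for the clip at the given position in the result list."""
    return f"dialogue_{index}.mp3"

def download_audio(urls, dest_dir="."):
    """Download each synthesized clip to dest_dir and return the saved paths."""
    paths = []
    for i, url in enumerate(urls):
        resp = requests.get(url, timeout=60)  # audio files may take a moment to fetch
        resp.raise_for_status()
        path = os.path.join(dest_dir, clip_filename(i))
        with open(path, "wb") as f:
            f.write(resp.content)  # save the synthesized speech locally
        paths.append(path)
    return paths
```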

Conceptual Usage Example (Python)

Here’s a Python snippet demonstrating how you might call the Cognitive Actions endpoint to synthesize dialogue text to speech:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "77e05651-94f7-443e-8b86-dd6f0e3c90f0" # Action ID for Synthesize Dialogue Text to Speech

# Construct the input payload based on the action's requirements
payload = {
    "topK": 20,
    "topP": 0.7,
    "prompt": "",
    "stream": False,
    "useDecoder": True,
    "temperature": 0.3,
    "textContent": "四川美食确实以辣闻名,但也有不辣的选择。比如甜水面、赖汤圆、蛋烘糕、叶儿粑等,这些小吃口味温和,甜而不腻,也很受欢迎。",
    "skipRefineText": False,
    "doTextNormalization": True,
    "doHomophoneReplacement": False
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=60 # Speech synthesis can take several seconds; avoid hanging indefinitely
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action ID and input payload are structured according to the requirements of the Synthesize Dialogue Text to Speech action.
  • The code handles errors gracefully, providing feedback if the action fails.

Conclusion

The e1100x/chattts Cognitive Actions provide developers with a powerful toolset for adding text-to-speech capabilities to their applications, enhancing user experience through engaging and interactive dialogue. With the ability to customize various parameters, you can refine how text is transformed into speech, making it suitable for a wide range of applications.

Consider exploring additional use cases, such as integrating this functionality into chatbots, virtual assistants, or accessibility tools, to create more inclusive and dynamic user experiences.