Transforming Text into Speech with datong-new/tts-zh Cognitive Actions

21 Apr 2025

Adding text-to-speech to your application can improve both user interaction and accessibility. The datong-new/tts-zh Cognitive Actions provide an API for converting text into spoken audio, with voice types covering Chinese (including Cantonese) and English. This post walks through the Generate Voice from Text action: its purpose, its input and output structures, and how you might call it from your own code.

Prerequisites

To start using the Cognitive Actions, you'll need an API key for the Cognitive Actions platform and a basic understanding of how to make HTTP requests. Authentication typically involves passing your API key in the request headers, allowing you to securely access the actions.

Cognitive Actions Overview

Generate Voice from Text

The Generate Voice from Text action is designed to convert text into spoken audio, offering multiple voice options tailored to specific languages and genders. This flexibility allows developers to create a more personalized experience for users based on their preferences.

Input

The required fields for this action are:

  • text: The content you want to convert into speech, written in the language you want spoken.
  • voiceType: This field specifies the type of voice for narration. You can select from the following options:
    • Chinese_man
    • Chinese_woman
    • Chinese_Cantonese_women
    • English_woman
    • English_man

Example Input:

{
  "text": "散文是一种抒发作者真情实感、写作方式灵活的记叙类文学体裁。",
  "voiceType": "Chinese_Cantonese_women"
}
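Because voiceType only accepts the values listed above, it can be worth checking the input before sending a request. The following is a minimal sketch of such a check; the helper name build_payload and the rule that text must be non-empty are assumptions made for illustration, not part of the documented API.

```python
# Voice types listed in this post for the Generate Voice from Text action.
ALLOWED_VOICE_TYPES = {
    "Chinese_man",
    "Chinese_woman",
    "Chinese_Cantonese_women",
    "English_woman",
    "English_man",
}


def build_payload(text: str, voice_type: str) -> dict:
    """Return an input payload, raising ValueError on obviously bad input.

    Hypothetical helper: the non-empty check on `text` is an assumption,
    since the post does not document length or content limits.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if voice_type not in ALLOWED_VOICE_TYPES:
        allowed = ", ".join(sorted(ALLOWED_VOICE_TYPES))
        raise ValueError(f"voiceType must be one of: {allowed}")
    return {"text": text, "voiceType": voice_type}
```

Validating locally like this catches a misspelled voice name before it costs a network round trip.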

Output

Upon successful execution, the action returns a URL pointing to the generated audio file. The output typically includes:

  • out: A link to the audio file containing the spoken text.

Example Output:

{
  "out": "https://assets.cognitiveactions.com/invocations/9ba34687-f090-4277-b5c7-842f63d4d973/85b60627-40b3-4edb-9ac7-316630c3cb0f.wav"
}
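Once you have a response shaped like the example above, you will usually want the URL itself and a local file name to save the audio under. Here is a small sketch using only the Python standard library; the function names and the fallback file name speech.wav are hypothetical, and it assumes the out field sits at the top level of the parsed JSON as in the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def audio_url_from_result(result: dict) -> str:
    """Extract the audio URL from a parsed response (assumes a top-level 'out')."""
    url = result.get("out")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("response does not contain a usable 'out' URL")
    return url


def filename_from_url(url: str, default: str = "speech.wav") -> str:
    """Use the last path segment of the URL as a file name, or a default."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default
```

If the platform wraps the action output in an outer envelope, unwrap it first and pass the inner object to audio_url_from_result.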

Conceptual Usage Example (Python)

Here's a conceptual Python code snippet demonstrating how you might call the Generate Voice from Text action using a hypothetical Cognitive Actions endpoint:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "d6a7d176-2279-4702-9be5-9606160588cb"  # Action ID for Generate Voice from Text

# Construct the input payload based on the action's requirements
payload = {
    "text": "散文是一种抒发作者真情实感、写作方式灵活的记叙类文学体裁。",
    "voiceType": "Chinese_Cantonese_women"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging forever if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    # ensure_ascii=False keeps any Chinese text in the response readable
    print(json.dumps(result, indent=2, ensure_ascii=False))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")

In this snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The payload variable holds the JSON structure required by the action.
  • The code sends the HTTP request and prints the full JSON response, which on success should contain the audio file URL in the out field. On failure it prints the status code and response body.
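The action returns a link rather than the audio itself, so a typical next step is to download the file. Below is a minimal sketch that uses only the standard library. The function name download_audio is hypothetical, and it assumes the asset URL can be fetched without extra authentication, which this post does not confirm.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Stream the file at `url` to `dest` and return the destination path.

    Assumption: the URL is directly fetchable with a plain GET request.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so a large audio file is never held fully in memory.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

For example, download_audio(result["out"], Path("audio/speech.wav")) would save the generated speech locally, assuming result is the parsed response and out is at its top level.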

Conclusion

The Generate Voice from Text action from the datong-new/tts-zh Cognitive Actions gives developers a direct way to add speech output to their applications: send a text string and a voice type, and receive a link to the generated audio file. Whether your goal is accessibility, engagement, or a new mode of interaction, the steps in this post should be enough to get a first integration working. Happy coding!