Clone Voices Across Languages with the ViX TTS Cognitive Actions

22 Apr 2025
Voice applications increasingly need to support multiple languages. The ViX TTS service gives developers voice cloning capabilities they can integrate into their own applications. With the Clone Voice Across Languages action, you can synthesize speech in several languages in a given speaker's voice, using just a short audio clip as input. This article walks through the action's inputs and output and shows how to call it.

Prerequisites

Before you start integrating the ViX TTS Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Knowledge of how to structure API requests, particularly in JSON format.
  • Basic familiarity with Python and HTTP requests to execute the Cognitive Actions.

Authentication typically involves passing your API key in the headers of your requests, allowing you to access the functionalities securely.
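
As a minimal sketch of that pattern, the helper below reads the key from an environment variable and builds the request headers. The helper name, the environment variable name COGNITIVE_ACTIONS_API_KEY, and the Bearer scheme are assumptions made for illustration; check the platform's documentation for the exact header format.

```python
import os


def build_auth_headers(api_key=None):
    """Return HTTP headers carrying the API key (Bearer scheme is an assumption)."""
    # Fall back to an environment variable so the key is not hard-coded in source.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY", "")
    if not key:
        raise ValueError("No API key provided; set COGNITIVE_ACTIONS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key in the environment rather than in the script makes it harder to commit a secret to version control by accident.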

Cognitive Actions Overview

Clone Voice Across Languages

The Clone Voice Across Languages action reproduces a speaker's voice characteristics in other languages from a short reference recording; the sample should be at least 6 seconds long. This is useful for applications that need a consistent, personalized voice across languages.

Input:

The action accepts the following fields. Fields not marked optional are required:

  • text (string): The text that needs to be synthesized into speech.
    • Example: "Hạnh phúc luôn là niềm khao khát lớn nhất của con người."
  • speaker (string): A URL pointing to the original speaker's audio file (formats: wav, mp3, m4a, ogg, or flv). The audio should be at least 6 seconds long.
    • Example: "https://replicate.delivery/pbxt/KibHoI1aA7kYweYgeSV2fFOY67QwEuZNe5l1tFX7Z6FkaEoi/samples_nu-luu-loat.wav"
  • outputLanguage (string, optional): The language in which the speech will be synthesized. Defaults to Vietnamese (vi). Other options include en (English), es (Spanish), fr (French), etc.
    • Example: "vi"
  • cleanupVoice (boolean, optional): Indicates whether to apply denoising to the speaker audio. Defaults to true.
    • Example: true
  • normalizeText (boolean, optional): Specifies whether the input text should be normalized before processing. Defaults to true.
    • Example: true
  • useDeepFilter (boolean, optional): Determines whether to apply deep filtering techniques to the audio. Defaults to true.
    • Example: true
  • awsAccessKeyId (string, optional): Your AWS Access Key ID for authentication.
  • awsSecretAccessKey (string, optional): Your AWS Secret Access Key used for authentication.
  • bucketName (string, optional): The name of the AWS S3 bucket where files are stored.
  • cdnDownloadUrl (string, optional): URL for downloading the synthesized speech from a CDN.
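
The field list above can be turned into a small payload builder that catches obvious mistakes before a request is sent. This is a sketch only: the function name and the local checks are illustrative assumptions, not part of the platform, and the service may apply its own validation. The defaults mirror the ones documented above.

```python
from urllib.parse import urlparse

# Audio formats listed for the speaker field.
ALLOWED_AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a", ".ogg", ".flv")


def build_clone_voice_payload(text, speaker, output_language="vi",
                              cleanup_voice=True, normalize_text=True,
                              use_deep_filter=True):
    """Validate the inputs locally and return the payload as a dict."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")

    parsed = urlparse(speaker)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("speaker must be an http(s) URL")

    # Only the file extension is checked here; the 6-second minimum
    # cannot be verified from the URL alone.
    if not parsed.path.lower().endswith(ALLOWED_AUDIO_EXTENSIONS):
        raise ValueError("speaker must point to a wav, mp3, m4a, ogg, or flv file")

    return {
        "text": text,
        "speaker": speaker,
        "outputLanguage": output_language,
        "cleanupVoice": cleanup_voice,
        "normalizeText": normalize_text,
        "useDeepFilter": use_deep_filter,
    }
```

The AWS and CDN fields are left out of this sketch; add them to the returned dict only if your setup needs them.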

Example Input:

Here is a practical example of the JSON payload needed to invoke this action:

{
  "text": "Hạnh phúc luôn là niềm khao khát lớn nhất của con người.",
  "speaker": "https://replicate.delivery/pbxt/KibHoI1aA7kYweYgeSV2fFOY67QwEuZNe5l1tFX7Z6FkaEoi/samples_nu-luu-loat.wav",
  "cleanupVoice": true,
  "normalizeText": true,
  "useDeepFilter": true,
  "outputLanguage": "vi"
}

Output:

When the action is executed successfully, it typically returns a URL to the synthesized audio file:

{
  "path": "https://assets.cognitiveactions.com/invocations/ed9a4da6-92a8-4203-bb27-2c0307224ff8/22174b27-9bcb-44f1-8fae-e6ef01f77b77.wav"
}
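
Once you have a response of this shape, you will usually want to save the audio locally. The sketch below pulls the URL out of the result and downloads the file using only the Python standard library. The helper names are hypothetical, and the code assumes the result contains a "path" key exactly as in the example output above.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def extract_audio_url(result):
    """Return the audio URL from a result dict shaped like the example output."""
    url = result.get("path") if isinstance(result, dict) else None
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError(f"Unexpected result shape: {result!r}")
    return url


def filename_from_url(url, default="output.wav"):
    """Derive a local filename from the last segment of the URL path."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_audio(url, dest_dir="."):
    """Download the audio file into dest_dir and return the local path."""
    local_path = os.path.join(dest_dir, filename_from_url(url))
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urlopen(url, timeout=60) as resp, open(local_path, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return local_path
```

A typical call would be download_audio(extract_audio_url(result)), where result is the parsed JSON response.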

Conceptual Usage Example (Python):

The following Python code snippet demonstrates how you might call the hypothetical Cognitive Actions execution endpoint:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "98aabd37-2675-47a2-a029-8cdbce5ba0d4"  # Action ID for Clone Voice Across Languages

# Construct the input payload based on the action's requirements
payload = {
    "text": "Hạnh phúc luôn là niềm khao khát lớn nhất của con người.",
    "speaker": "https://replicate.delivery/pbxt/KibHoI1aA7kYweYgeSV2fFOY67QwEuZNe5l1tFX7Z6FkaEoi/samples_nu-luu-loat.wav",
    "cleanupVoice": True,
    "normalizeText": True,
    "useDeepFilter": True,
    "outputLanguage": "vi"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Avoid hanging indefinitely; adjust to your expected synthesis time
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON (covers both json and simplejson decode errors)
            print(f"Response body: {e.response.text}")

In this snippet, replace the value of COGNITIVE_ACTIONS_API_KEY with your actual API key. The action ID identifies Clone Voice Across Languages, and the payload follows the input fields described above. The endpoint URL and the request structure are illustrative, so confirm both against the platform's documentation before relying on them.
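
To produce the same voice in several languages, you can reuse one speaker sample and vary only text and outputLanguage. The sketch below builds one payload per language. The function name is hypothetical, and the English and Spanish sentences are illustrative translations added here, not samples from the service.

```python
# Settings shared by every request: the same speaker sample and processing flags.
BASE_INPUTS = {
    "speaker": "https://replicate.delivery/pbxt/KibHoI1aA7kYweYgeSV2fFOY67QwEuZNe5l1tFX7Z6FkaEoi/samples_nu-luu-loat.wav",
    "cleanupVoice": True,
    "normalizeText": True,
    "useDeepFilter": True,
}

# Text to synthesize, keyed by language code (translations are illustrative).
TEXTS_BY_LANGUAGE = {
    "vi": "Hạnh phúc luôn là niềm khao khát lớn nhất của con người.",
    "en": "Happiness is always the greatest desire of human beings.",
    "es": "La felicidad es siempre el mayor anhelo del ser humano.",
}


def payloads_by_language(texts_by_language, base_inputs):
    """Return one payload per language code, all sharing the base inputs."""
    payloads = {}
    for language, text in texts_by_language.items():
        payload = dict(base_inputs)  # Shallow copy so the base stays untouched
        payload["text"] = text
        payload["outputLanguage"] = language
        payloads[language] = payload
    return payloads


payloads = payloads_by_language(TEXTS_BY_LANGUAGE, BASE_INPUTS)
```

Each payload can then be sent with the same request logic as in the snippet above, one call per language.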

Conclusion

Integrating the Clone Voice Across Languages action from the ViX TTS Cognitive Actions can significantly enhance the multilingual capabilities of your application. By utilizing just a short audio clip, you can generate high-quality voice outputs in multiple languages, making your app more accessible and engaging for users around the globe.

Consider exploring additional use cases and experimenting with the various parameters to fully leverage the power of voice cloning. Happy coding!