Transform Vietnamese Text to Speech with tuannha/f5-tts-vi Cognitive Actions

Converting text into natural-sounding speech matters for accessibility and user engagement in applications. The tuannha/f5-tts-vi API offers a Cognitive Action that turns Vietnamese text into lifelike speech using voice cloning. This article walks through the Generate Vietnamese Voice Output action and shows how to call it from your own application.
Prerequisites
Before you start integrating the Cognitive Actions, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic knowledge of JSON and API calls.
- Familiarity with Python for testing the API integration.
For authentication, you will typically pass your API key in the request headers to authorize your access to the action.
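As a minimal sketch of the authentication step, the helper below reads the key from an environment variable and builds the request headers. The Bearer scheme matches the usage example later in this article; the environment variable name and the helper name build_headers are assumptions made for illustration, not part of the platform.

```python
import os


def build_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The variable name is a hypothetical choice; use whatever your
    deployment provides.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which is usually preferable to hard-coding it in a script.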
Cognitive Actions Overview
Generate Vietnamese Voice Output
Description: This action transforms Vietnamese text into natural-sounding speech using the EraX-Smile-Female-F5-V1.0 model. It employs online zero-shot voice cloning, enhanced by fine-tuning on a large dataset for improved accuracy and nuance.
Category: Text-to-Speech
Input
The input for this action is structured as follows:
- referenceAudio (string, required): A URI pointing to a WAV audio file that supplies the reference voice for cloning.
- referenceText (string, required): The transcript of the reference audio. It must match what is spoken in the file.
- inputText (string, optional): The text that you want to convert to speech. Default is "Đây là đài tiếng nói Việt Nam."
- speed (number, optional): Defines the speed of the speech output, with acceptable values ranging from 0.5 to 1.5. The default speed is 1.
Example Input:
{
  "speed": 1,
  "inputText": "Đây là đài tiếng nói Việt Nam. Phát thanh từ kênh trung ương Hà Nội.",
  "referenceText": "Cuộc sống giống như một trang sách. Người lười biếng sẽ dở qua chúng thật nhanh, người khôn ngoan sẽ vừa đọc vừa suy ngẫm.",
  "referenceAudio": "https://replicate.delivery/pbxt/MkYYlI5IlXvFWGVJG2ijHKSB8ZPpLGuSvqFEQtvbgQ1r7gon/coral_vn.wav"
}
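The field constraints listed above can be checked on the client before a request is sent. The following is a rough sketch, not part of the API: the function name build_inputs is hypothetical, and it assumes the speed bounds of 0.5 and 1.5 are inclusive. When inputText is omitted, the key is left out so the service default applies.

```python
from typing import Optional


def build_inputs(
    reference_audio: str,
    reference_text: str,
    input_text: Optional[str] = None,
    speed: float = 1.0,
) -> dict:
    """Validate the documented fields and return the input payload."""
    if not reference_audio:
        raise ValueError("referenceAudio is required")
    if not reference_text:
        raise ValueError("referenceText is required")
    # Assumption: both ends of the documented range are allowed.
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be between 0.5 and 1.5")

    payload = {
        "referenceAudio": reference_audio,
        "referenceText": reference_text,
        "speed": speed,
    }
    # Leave inputText out entirely so the service falls back to its default.
    if input_text is not None:
        payload["inputText"] = input_text
    return payload
```

Failing early on a bad speed value or a missing reference avoids a round trip that the service would presumably reject anyway.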
Output
The output from this action is a URI pointing to the generated audio file in WAV format, which contains the speech produced from the input text.
Example Output:
https://assets.cognitiveactions.com/invocations/be4c95ec-1463-48b4-8745-cb570ca9416a/932014f6-b7f2-4350-895a-2c911b0c7799.wav
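Since the action returns a URI rather than audio bytes, a client will usually want to fetch the file. The sketch below uses only the Python standard library and assumes the output URI can be downloaded without extra authentication, which the source does not confirm. The helper names filename_from_uri and download_audio are hypothetical.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_uri(uri: str, fallback: str = "output.wav") -> str:
    """Use the last path segment of the URI as the local file name."""
    name = os.path.basename(urlparse(uri).path)
    return name or fallback


def download_audio(uri: str, dest_dir: str = ".", timeout: float = 60.0) -> str:
    """Stream the generated WAV file to dest_dir and return its local path."""
    dest = os.path.join(dest_dir, filename_from_uri(uri))
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # so the destination file is only created after a successful open.
    with urlopen(uri, timeout=timeout) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return dest
```

Calling download_audio with the URI from the action's response would save the file under its original name in the current directory.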
Conceptual Usage Example (Python)
Here’s a conceptual Python code snippet demonstrating how to call the Generate Vietnamese Voice Output action:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "65d17c9b-6518-47f0-8fc4-93100a91120e"  # Action ID for Generate Vietnamese Voice Output

# Construct the input payload based on the action's requirements
payload = {
    "speed": 1,
    "inputText": "Đây là đài tiếng nói Việt Nam. Phát thanh từ kênh trung ương Hà Nội.",
    "referenceText": "Cuộc sống giống như một trang sách. Người lười biếng sẽ dở qua chúng thật nhanh, người khôn ngoan sẽ vừa đọc vừa suy ngẫm.",
    "referenceAudio": "https://replicate.delivery/pbxt/MkYYlI5IlXvFWGVJG2ijHKSB8ZPpLGuSvqFEQtvbgQ1r7gon/coral_vn.wav",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Assumed value; without a timeout the call can block indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2, ensure_ascii=False))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; covers every requests version
            print(f"Response body: {e.response.text}")
In this example, you need to replace the placeholder for COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id specifies which cognitive action to execute, and the input payload is formatted according to the expected schema. The endpoint URL and request structure are illustrative and should be adapted to your specific integration.
Conclusion
The Generate Vietnamese Voice Output action from the tuannha/f5-tts-vi API lets developers convert Vietnamese text into natural-sounding speech with a single request, making content more accessible and engaging for Vietnamese-speaking audiences. Consider exploring further use cases such as voiceovers for videos, accessibility features, or interactive applications.