Transform Written Vietnamese into Speech with F5-TTS Cognitive Actions

In this blog post, we'll explore the tuannha/f5-tts-vi API, which converts written Vietnamese text into expressive speech. It is built on the EraX-Smile-Female-F5-V1.0 model, so developers can add realistic voice synthesis to their applications. Whether you are building accessibility tools, creative applications, or personal projects, these pre-built Cognitive Actions let you give your text content a human-like voice.
Prerequisites
Before you can start using the Cognitive Actions, make sure you have the following:
- An API key for the Cognitive Actions platform. This key will be used for authentication when making API requests.
- Basic knowledge of JSON and familiarity with making HTTP requests in your programming language of choice.
To authenticate, pass your API key in a request header. The Python example below sends it as a Bearer token in the Authorization header.
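Rather than hardcoding the key, you can read it from an environment variable and build the headers in one place. This is a minimal sketch: the function name build_headers and the environment variable COGNITIVE_ACTIONS_API_KEY are hypothetical choices, and the Bearer scheme mirrors the headers used in this post's Python example.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for the Cognitive Actions API.

    Uses the explicit key if given, otherwise the (hypothetical)
    COGNITIVE_ACTIONS_API_KEY environment variable.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass api_key or set COGNITIVE_ACTIONS_API_KEY"
        )
    # Bearer scheme, matching the headers used in this post's Python example
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

After exporting the variable in your shell, build_headers() returns a dictionary you can pass straight to requests.post(headers=...).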
Cognitive Actions Overview
Enhance Vietnamese Speech with EraX F5-TTS
This action converts written Vietnamese text into expressive speech. It supports zero-shot voice cloning: the model imitates the voice in a reference recording on the fly, without any additional training, so you can change the speaker's voice simply by supplying a different reference clip.
Input
The action accepts the following input fields:
- referenceAudio (string): A URI pointing to the reference audio file in WAV format. The speech in this clip must match referenceText exactly so the model can align the audio with its transcript.
- referenceText (string): The exact transcript of the reference audio.
- inputText (string, optional): The text to convert to speech. Defaults to "Đây là đài tiếng nói Việt Nam." ("This is the Voice of Vietnam.")
- speed (number, optional): Playback speed of the generated speech, from 0.5 to 1.5. Defaults to 1.
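The constraints above can be checked on the client before any request is sent. The sketch below assembles a payload with the documented field names, leaves out optional fields you don't set so the service can apply its own defaults, and rejects speeds outside 0.5 to 1.5. The function name build_payload is hypothetical, and requiring an http(s) URL for referenceAudio is an assumption based on the example payload, not a documented rule.

```python
from typing import Any, Dict, Optional

MIN_SPEED = 0.5  # documented lower bound for speed
MAX_SPEED = 1.5  # documented upper bound for speed


def build_payload(
    reference_audio: str,
    reference_text: str,
    input_text: Optional[str] = None,
    speed: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble the action's input payload, validating it client-side."""
    # Assumption: the reference clip is hosted at an http(s) URL
    if not reference_audio.startswith(("http://", "https://")):
        raise ValueError("referenceAudio should be an http(s) URL to a WAV file")
    if not reference_text.strip():
        raise ValueError("referenceText must be the transcript of the reference audio")

    payload: Dict[str, Any] = {
        "referenceAudio": reference_audio,
        "referenceText": reference_text,
    }
    # Leave optional fields out when unset so the service applies its defaults
    if input_text is not None:
        payload["inputText"] = input_text
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        payload["speed"] = speed
    return payload
```

For example, build_payload(ref_url, ref_text, input_text="Xin chào", speed=1.2) returns a dictionary ready to be sent as the action's inputs, while speed=2.0 raises a ValueError before any network call is made.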
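Here’s an example input payload. The inputText reads "This is the Voice of Vietnam, broadcasting from the central station in Hanoi," and the referenceText, which transcribes the reference clip, reads "Life is like the pages of a book. The lazy flip through them quickly; the wise read and reflect as they go."
```json
{
  "speed": 1,
  "inputText": "Đây là đài tiếng nói Việt Nam. Phát thanh từ kênh trung ương Hà Nội.",
  "referenceText": "Cuộc sống giống như một trang sách. Người lười biếng sẽ dở qua chúng thật nhanh, người khôn ngoan sẽ vừa đọc vừa suy ngẫm",
  "referenceAudio": "https://replicate.delivery/pbxt/MkYYlI5IlXvFWGVJG2ijHKSB8ZPpLGuSvqFEQtvbgQ1r7gon/coral_vn.wav"
}
```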
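The referenceAudio in this payload points to a hosted WAV clip. If you record and host your own reference clip, it can help to confirm locally that it is a readable WAV file and to check its duration before uploading it. A minimal sketch using Python's standard wave module follows; describe_wav and make_silent_wav are hypothetical helper names. Note that wave only reads uncompressed integer PCM, so a valid 32-bit float WAV would fail this check even if the service accepts it.

```python
import io
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, float]:
    """Report basic format details of a WAV file given a path or binary file.

    Raises wave.Error if the data is not a WAV file the stdlib can read
    (it only supports uncompressed integer PCM).
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono, 16-bit silent WAV for trying describe_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(describe_wav(make_silent_wav(2.0)))
```

With a real recording you would call describe_wav("my_reference.wav") (a hypothetical file name) and inspect the reported duration and format before hosting the clip.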
Output
Upon successful execution, this action returns a URI pointing to the generated speech audio file in WAV format. Here’s an example of what the output might look like:
https://assets.cognitiveactions.com/invocations/7d5d397f-231c-4d95-84d3-dac4c48aa89f/776a5a3c-e254-4cbd-8442-1bd26e0a5f44.wav
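To keep the generated audio, you can download it from that URI. The sketch below uses only the standard library; download_wav and looks_like_wav are hypothetical helper names, and it assumes the output URI can be fetched directly without extra authentication, which you should confirm for your account. The header check verifies the RIFF/WAVE magic bytes before saving, so an HTML error page is not silently written to disk as a .wav file.

```python
import urllib.request


def looks_like_wav(data: bytes) -> bool:
    """True if data starts with a RIFF header whose form type is WAVE."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def download_wav(url: str, dest: str, timeout: float = 60.0) -> int:
    """Download the generated audio, check it is WAV, save it; return bytes written."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    if not looks_like_wav(data):
        raise ValueError(f"{url} did not return WAV data")
    with open(dest, "wb") as f:
        f.write(data)
    return len(data)
```

Usage: download_wav(output_uri, "speech.wav"), where output_uri is the link returned by the action.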
Conceptual Usage Example (Python)
Here’s how you might structure a call to the Cognitive Actions endpoint using Python:
```python
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "1828a277-7ff0-446a-81c3-02fdbe0ffdd1"  # Action ID for Enhance Vietnamese Speech with EraX F5-TTS

# Construct the input payload based on the action's requirements
payload = {
    "speed": 1,
    "inputText": "Đây là đài tiếng nói Việt Nam. Phát thanh từ kênh trung ương Hà Nội.",
    "referenceText": "Cuộc sống giống như một trang sách. Người lười biếng sẽ dở qua chúng thật nhanh, người khôn ngoan sẽ vừa đọc vừa suy ngẫm",
    "referenceAudio": "https://replicate.delivery/pbxt/MkYYlI5IlXvFWGVJG2ijHKSB8ZPpLGuSvqFEQtvbgQ1r7gon/coral_vn.wav",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Synthesis can take a while; don't wait forever
    )
    response.raise_for_status()  # Raise an exception for 4xx/5xx status codes
    result = response.json()
    print("Action executed successfully:")
    # ensure_ascii=False keeps Vietnamese diacritics readable in the printout
    print(json.dumps(result, indent=2, ensure_ascii=False))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not JSON (covers json and simplejson decode errors)
            print(f"Response body: {e.response.text}")
```
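In this snippet, action_id holds the ID of the "Enhance Vietnamese Speech with EraX F5-TTS" action, the payload follows the input fields described above, and the API key is sent as a Bearer token in the Authorization header. The endpoint URL and the request body structure are marked as hypothetical, so check them against the platform's documentation before relying on this code.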
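The example prints the whole JSON response. Because the exact shape of that response isn't documented here, a defensive way to pull out the audio link is to search the response for the first http(s) URL whose path ends in .wav. The helper names iter_wav_uris and first_wav_uri are hypothetical, and the approach assumes the output URI appears as a plain string somewhere in the response, as in the sample output shown earlier.

```python
from typing import Any, Iterator, Optional
from urllib.parse import urlparse


def iter_wav_uris(node: Any) -> Iterator[str]:
    """Yield every http(s) URL in a nested JSON value whose path ends in .wav."""
    if isinstance(node, str):
        # Check the URL path, so signed URLs with query strings still match
        if node.startswith(("http://", "https://")) and urlparse(node).path.lower().endswith(".wav"):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_wav_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_wav_uris(item)


def first_wav_uri(result: Any) -> Optional[str]:
    """Return the first .wav URL found in the response, or None."""
    return next(iter_wav_uris(result), None)
```

After a successful call, audio_url = first_wav_uri(result) gives you the link to download, or None if the response contains no WAV URL.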
Conclusion
The tuannha/f5-tts-vi API gives developers a straightforward way to add expressive Vietnamese speech synthesis to their applications. With the Enhance Vietnamese Speech with EraX F5-TTS action, you can turn text into natural-sounding speech in the voice of a chosen reference clip, which can improve accessibility and user engagement in your projects.
As a next step, consider experimenting with different input texts and speeds to see how they affect the speech output, or think about integrating this functionality into a larger application to provide a seamless user experience. Happy coding!
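For the speed experiment, one option is to generate a set of payloads that differ only in speed and send each one with the request code from the Python example. This is a sketch: the function name speed_sweep and the default step of 0.25 are hypothetical choices, and only the 0.5 to 1.5 range comes from the action's documentation.

```python
from typing import Any, Dict, List

MIN_SPEED, MAX_SPEED = 0.5, 1.5  # documented speed range


def speed_sweep(
    base_payload: Dict[str, Any],
    start: float = MIN_SPEED,
    stop: float = MAX_SPEED,
    step: float = 0.25,
) -> List[Dict[str, Any]]:
    """Return copies of base_payload, one per speed from start up to stop."""
    if not (MIN_SPEED <= start <= stop <= MAX_SPEED) or step <= 0:
        raise ValueError("need 0.5 <= start <= stop <= 1.5 and step > 0")
    # Count whole steps (with a tiny tolerance) so we never overshoot stop
    count = int((stop - start) / step + 1e-9) + 1
    speeds = [round(start + i * step, 4) for i in range(count)]
    # Build new dicts so the caller's base payload is never modified
    return [{**base_payload, "speed": s} for s in speeds]
```

Listening to the outputs side by side makes it easier to pick a speed that sounds natural for your content.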