Transform Text into Natural Speech with Voicecraft

Voicecraft converts text into speech using neural speech synthesis models. Through the Cognitive Actions API, developers can generate realistic voiceovers, add audio versions of text content for accessibility, and automate spoken content delivery with just a few API calls. Typical applications range from educational tools to customer service interfaces.
Prerequisites
To get started with Voicecraft, you'll need a Cognitive Actions API key and a basic understanding of how to make API calls.
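Rather than pasting the API key into source files, you can read it from an environment variable at startup. The sketch below is illustrative only: the variable name COGNITIVE_ACTIONS_API_KEY and the helper load_api_key are assumptions for this example, not names required by the Cognitive Actions API.

```python
import os


def load_api_key(var_name: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical helper and variable name)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Cognitive Actions API key."
        )
    return key
```

Failing at startup makes a missing key obvious, rather than surfacing later as an authorization error from the service.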
Generate Voice from Text
The "Generate Voice from Text" action converts written text into spoken audio in the voice of a reference speaker. This is useful for applications that need dynamic voice output, such as virtual assistants, audiobooks, or language learning tools.
Input Requirements
The action accepts the following inputs:
- text: The string that needs to be synthesized into speech. For example, "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good."
- version: The speech synthesis model version to use. Options are "giga330m", "giga830m", "giga_libritts_330m", "giga330m_tts_enhanced", and "giga830m_tts_enhanced"; the default is "giga330m".
- textReference: A reference transcript that guides the synthesis. It should match the words spoken in the speakerReference audio, for example "and keeping eternity before the eyes, though much."
- speakerReference: A URI pointing to an audio file of the voice to imitate; the model uses this recording's voice characteristics as the reference for synthesis.
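Before sending a request, it can help to assemble and sanity-check the inputs locally. The helper below is a hypothetical sketch: build_inputs is not part of any SDK, the version list is copied from the options above, and requiring an http(s) URI for speakerReference is an assumption about what the service accepts.

```python
# Version names copied from the documented options; "giga330m" is the default.
SUPPORTED_VERSIONS = {
    "giga330m",
    "giga830m",
    "giga_libritts_330m",
    "giga330m_tts_enhanced",
    "giga830m_tts_enhanced",
}


def build_inputs(text: str, text_reference: str, speaker_reference: str,
                 version: str = "giga330m") -> dict:
    """Assemble the inputs for Generate Voice from Text (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"unknown version {version!r}; expected one of {sorted(SUPPORTED_VERSIONS)}"
        )
    # Assumption: the service expects a publicly reachable http(s) URL for the reference audio.
    if not speaker_reference.startswith(("http://", "https://")):
        raise ValueError("speakerReference should be an http(s) URI to an audio file")
    return {
        "text": text,
        "version": version,
        "textReference": text_reference,
        "speakerReference": speaker_reference,
    }
```

Catching an unknown version name or a malformed URI locally saves a round trip and gives a clearer error than a generic HTTP 4xx.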
Expected Output
The action returns a URL to the synthesized audio file (a WAV file in this example), such as "https://assets.cognitiveactions.com/invocations/8a0ffc20-77c4-4968-b593-0207ec28dcc2/812e7b03-de71-4945-981c-a677a5857623.wav".
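The exact layout of the response JSON is not documented here, so the sketch below searches the parsed result for the first string that looks like a .wav URL and then downloads it with Python's standard library. find_audio_url and download_audio are hypothetical helper names, and matching on the .wav extension is an assumption based on the example URL above.

```python
import urllib.request
from typing import Any, Optional


def find_audio_url(result: Any) -> Optional[str]:
    """Return the first string in a nested JSON result that looks like a .wav URL (heuristic)."""
    if isinstance(result, str):
        lowered = result.lower()
        # Ignore any query string (e.g. signed-URL parameters) when checking the extension.
        path = lowered.split("?", 1)[0]
        if lowered.startswith(("http://", "https://")) and path.endswith(".wav"):
            return result
        return None
    if isinstance(result, dict):
        items = list(result.values())
    elif isinstance(result, list):
        items = result
    else:
        return None
    for item in items:
        url = find_audio_url(item)
        if url is not None:
            return url
    return None


def download_audio(url: str, path: str, timeout: float = 60.0) -> None:
    """Save the audio at `url` to the local file `path`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as f:
        f.write(resp.read())
```

After a successful call you could run `url = find_audio_url(result)` and, if it is not None, `download_audio(url, "speech.wav")`. If the service documents a fixed output field, reading that field directly is simpler than the recursive search.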
Use Cases for this Specific Action
- Content Creation: Automate the generation of voiceovers for videos, podcasts, or presentations, saving time and resources.
- Accessibility Enhancements: Improve your application's accessibility by providing audio versions of text content for users with visual impairments.
- Interactive Learning: Develop engaging language learning apps that convert text into speech, aiding pronunciation and comprehension.
- Customer Support: Create automated responses in customer service applications that sound natural and personable, enhancing user experience.
The following Python example sends the example inputs above to the action.

```python
import json

import requests

# Replace with your actual Cognitive Actions API key and endpoint.
# Ensure your environment handles the API key securely (for example, load it from an environment variable).
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical; use the one given in the Cognitive Actions documentation.
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "f02c6d73-514e-4a0b-9a13-329ba8594b58"  # Action ID for: Generate Voice from Text

# Input payload for the action, using the example values from the input list above.
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "version": "giga330m",
    "textReference": "and keeping eternity before the eyes, though much.",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other headers the Cognitive Actions API requires.
}

# Request body for the hypothetical execution endpoint.
request_body = {
    "action_id": action_id,
    "inputs": payload,
}

print("--- Calling Cognitive Action: Generate Voice from Text ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=120,  # Seconds; an assumed value, since synthesis can take a while.
    )
    response.raise_for_status()  # Raise an exception for 4xx or 5xx status codes
    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # The error body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
print("------------------------------------------------")
```
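Once the audio is saved locally, Python's built-in wave module can confirm that it is a readable WAV file and report its duration and sample rate. This sketch assumes the service returns uncompressed PCM WAV; wave.open raises wave.Error for formats it does not support (such as floating-point WAV), so treat that error as a cue to inspect the file with a fuller audio library. describe_wav is a hypothetical helper name.

```python
import wave
from typing import BinaryIO, Union


def describe_wav(source: Union[str, BinaryIO]) -> dict:
    """Return basic properties of a PCM WAV file, given a path or an open binary file."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration in seconds = number of frames / frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("speech.wav")` on a downloaded file returns its channel count, sample rate, sample width, and duration, which is a quick way to catch an empty or truncated result.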
Conclusion
Voicecraft turns text into natural-sounding speech through a single action, which makes it a practical way to add voice output to your applications. Whether you are building educational tools, improving accessibility, or automating content delivery, the Generate Voice from Text action lets you produce synthesized speech in a reference speaker's voice with a straightforward API call.