Create Natural-Sounding Speech from Text with Whisperspeech

Whisperspeech converts text into natural-sounding speech and gives developers a straightforward way to add speech synthesis to their applications. You choose a language and a speaker voice, and the service returns the generated audio. Typical uses include interactive voice assistants, accessibility features, and narration for e-learning content.
Prerequisites
To get started with Whisperspeech, you will need an API key for Cognitive Actions and a basic understanding of API calls to effectively integrate the speech synthesis functionality into your project.
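One way to keep the API key out of source code is to read it from an environment variable. The sketch below is a minimal example under that assumption: the variable name COGNITIVE_ACTIONS_API_KEY and the helper names load_api_key and auth_headers are illustrative and not part of any official client. The Bearer header format matches the example request later in this article.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "COGNITIVE_ACTIONS_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return key


def auth_headers(api_key):
    """Build the headers used by the example request in this article."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dictionary as env makes the helper easy to exercise in tests without touching the real environment.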
Perform Speech Synthesis
The "Perform Speech Synthesis" action generates synthesized speech from text, using a specified language and a speaker reference. It belongs to the text-to-speech category.
Input Requirements
To use this action, you need to provide the following input parameters:
- text: The text you want to convert into speech. It should be clear and concise for optimal results. For example, "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good."
- speakerReference: A URI reference to an audio file that serves as the voice model for the synthesis. This should follow the URI format, such as https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav.
- language: The language code corresponding to the text being synthesized. Supported codes include 'en', 'pl', 'de', 'fr', 'it', 'nl', 'es', and 'pt', with 'en' as the default.
- version: Specifies the version of the model to use for synthesis, with options including 'tiny', 'base', 'small', and 'medium'. The default version is 'small'.
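The parameter rules above can be checked on the client before a request is sent. The following is a sketch only: build_inputs is a hypothetical helper, and it assumes that speakerReference is required and is an http or https URL. The service may accept other URI schemes or apply further validation of its own.

```python
from urllib.parse import urlparse

# Values taken from the parameter list above.
SUPPORTED_LANGUAGES = {"en", "pl", "de", "fr", "it", "nl", "es", "pt"}
SUPPORTED_VERSIONS = {"tiny", "base", "small", "medium"}


def build_inputs(text, speaker_reference, language="en", version="small"):
    """Validate the inputs locally and return the payload dictionary."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported model version: {version!r}")
    # Assumption: the speaker reference is an http(s) URL.
    parsed = urlparse(speaker_reference)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("speakerReference must be an http(s) URL")
    return {
        "text": text,
        "speakerReference": speaker_reference,
        "language": language,
        "version": version,
    }
```

Called with only text and a speaker reference, the helper fills in the documented defaults ('en' and 'small').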
Expected Output
The output of this action will be a URI link to an audio file containing the synthesized speech. For instance, the output might look like this: https://assets.cognitiveactions.com/invocations/886a7712-04ab-49cd-8606-5f6c098cff47/bd3e5c65-a40a-4803-b2d8-f2223c768c4e.wav.
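Because the result is a link rather than the audio itself, a client usually needs one more step to store the file. The sketch below uses only the Python standard library. The helper names filename_from_uri and download_audio are hypothetical, and it assumes the output URI can be fetched with a plain GET request without extra headers, which the source does not confirm.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_uri(uri, default="speech.wav"):
    """Derive a local file name from the last path segment of the URI."""
    name = posixpath.basename(urlparse(uri).path)
    return name or default


def download_audio(uri, dest_dir="."):
    """Stream the audio file at `uri` into dest_dir and return the local path."""
    target = os.path.join(dest_dir, filename_from_uri(uri))
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(uri, timeout=60) as resp, open(target, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return target
```

For the example output shown above, filename_from_uri returns the final path segment, bd3e5c65-a40a-4803-b2d8-f2223c768c4e.wav.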
Use Cases for this Specific Action
- Voice Assistants: Integrate speech synthesis into virtual assistants to provide users with spoken responses, enhancing interactivity.
- Accessibility Tools: Create applications that convert written content into speech, improving accessibility for visually impaired users.
- E-Learning Platforms: Enhance educational content by providing audio narration for text-based materials, making learning more engaging.
- Audiobook Production: Generate audio versions of written works, allowing for a seamless transition from text to spoken word.
The following Python example calls this action through a hypothetical execution endpoint:
import requests
import json
# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"
action_id = "82d35669-7cd3-4140-9aa7-09d1a1d37d3f" # Action ID for: Perform Speech Synthesis
# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "version": "small",
    "language": "en",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}
# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}
print(f"--- Calling Cognitive Action: Perform Speech Synthesis ({action_id}) ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")
try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=60,  # Avoid waiting indefinitely; adjust as needed
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised when the body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
print("------------------------------------------------")
Conclusion
Whisperspeech gives developers a practical way to add speech synthesis to their applications. Converting text to natural-sounding speech in several languages, with a voice taken from a reference recording, can improve both user engagement and accessibility. To get started, integrate the "Perform Speech Synthesis" action into a project and test it with your own text and speaker reference.