Effortless Text-to-Speech Conversion with Cosyvoice

Cosyvoice is a service that converts text into high-quality speech, so developers can add speech synthesis to their applications. It uses large language models to provide scalable streaming speech synthesis. Typical uses include voiceovers for educational content, virtual assistants, and accessibility features for visually impaired users.
Prerequisites
To get started with Cosyvoice, you will need a valid Cognitive Actions API key and a basic understanding of making API calls.
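One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named `COGNITIVE_ACTIONS_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something the Cognitive Actions platform is known to prescribe.

```python
import os


def load_api_key(var_name: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```

Failing early with a clear message avoids sending a request that carries an empty `Authorization` header.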
Perform Scalable Streaming Speech Synthesis
The "Perform Scalable Streaming Speech Synthesis" action converts text into high-quality speech. It is aimed at applications that need natural-sounding audio generated from written content, such as those built around voice interaction.
Input Requirements: The input for this action is structured as a JSON object with the following properties:
- text: The main text content to be synthesized into speech (string).
- prompt: An accompanying text that serves as a contextual reference for the synthesis (string).
- promptAudio: An optional reference audio file in WAV format, which can enhance the synthesis by providing a sample audio input.
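The properties above map onto a small helper that assembles the JSON object and leaves out `promptAudio` when no reference audio is supplied. This is only a sketch: `build_payload` is a hypothetical name, rejecting empty text is a choice made for this example, and passing `promptAudio` as a URL string is an assumption, since the description above does not say how the WAV file is supplied.

```python
import json
from typing import Optional


def build_payload(text: str, prompt: str, prompt_audio: Optional[str] = None) -> dict:
    """Assemble the input object for the speech synthesis action."""
    # Choice made for this sketch: refuse to send empty text.
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text, "prompt": prompt}
    # Assumption: promptAudio is given as a URL pointing to a WAV file.
    if prompt_audio is not None:
        payload["promptAudio"] = prompt_audio
    return payload


example = build_payload(
    "Hello, how can I help you?",
    "I hope you will do even better than I did.",
)
print(json.dumps(example, indent=2))
```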
Example Input:
{
  "text": "你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?",
  "prompt": "希望你以后能够做的比我还好呦。"
}
Expected Output: The output will be a URL pointing to the generated audio file, which can be played or downloaded.
Example Output:
https://assets.cognitiveactions.com/invocations/0f485763-544f-4519-959f-31aecb1ae596/d57dd35a-6340-474b-a201-c945b239c9f9.wav
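Because the result is a plain URL, saving the audio locally needs only a download. The following sketch uses Python's standard library; `filename_from_url` and `download_audio` are hypothetical helper names, and it assumes the URL can be fetched without extra authentication headers, which the description above does not confirm.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of the URL, used as the local file name."""
    return posixpath.basename(urlparse(url).path)


def download_audio(url: str, dest_dir: str = ".") -> str:
    """Download the audio at `url` into `dest_dir` and return the local path."""
    local_path = os.path.join(dest_dir, filename_from_url(url))
    # Assumption: the file is readable without an Authorization header.
    with urllib.request.urlopen(url, timeout=30) as response, open(local_path, "wb") as out:
        out.write(response.read())
    return local_path
```

Calling `download_audio(result_url)` with the URL returned by the action would write the `.wav` file into the current directory.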
Use Cases for this specific action:
- Educational Tools: Enhance learning experiences by converting textbooks or articles into audio format, allowing students to listen while they study.
- Virtual Assistants: Create more engaging and responsive virtual assistants that can read content aloud, making interactions feel more natural.
- Accessibility: Support users with visual impairments by providing audio versions of web content, ensuring everyone can access information easily.
- Content Creation: Generate voiceovers for videos, podcasts, or audiobooks, streamlining the content production process.
```python
import json

import requests

# Replace with your actual Cognitive Actions API key and endpoint.
# Ensure your environment securely handles the API key.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users.
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "beef1d57-02e5-44cd-90d9-aacaeee768f9"
action_name = "Perform Scalable Streaming Speech Synthesis"

# Construct the input payload based on the action's requirements.
# This example reuses the example input for this action.
payload = {
    # English: "Hello, I am the Tongyi generative speech large model. How can I help you?"
    "text": "你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?",
    # English: "I hope you will do even better than me in the future."
    "prompt": "希望你以后能够做的比我还好呦。",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload,
}

print(f"--- Calling Cognitive Action: {action_name} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=60,  # Avoid hanging indefinitely if the server does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:
            # JSON decoding errors subclass ValueError across requests versions
            print(f"Response body (non-JSON): {e.response.text}")

print("------------------------------------------------")
```
### Conclusion
Cosyvoice's scalable speech synthesis lets developers add natural-sounding audio to their applications, improving user engagement and accessibility. The action takes a small JSON input and returns a URL to the generated audio, which keeps integration straightforward.