Create Engaging Speech with jichengdu/fish-speech Cognitive Actions

Text-to-speech technology has changed how users interact with applications. The jichengdu/fish-speech API provides Cognitive Actions that convert text into high-quality, personalized speech in both Chinese and English. These pre-built actions simplify integration, so developers can add voice capabilities to their applications with a single API call.
Prerequisites
Before diving into the Cognitive Actions, ensure you have the following:
- An API key for the jichengdu/fish-speech platform.
- Basic knowledge of JSON and API calls.
- Familiarity with Python for running example code.
Authentication typically involves including your API key in the request headers, allowing you to securely access the service.
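As a minimal sketch of that header-based authentication, the helper below builds the request headers from an API key. The Bearer scheme matches the usage example later in this article; the function name build_auth_headers and the environment variable COGNITIVE_ACTIONS_API_KEY are assumptions made for illustration, not part of the documented API.

```python
import os


def build_auth_headers(api_key=None):
    """Return request headers that carry the API key as a Bearer token.

    The environment variable name below is an assumption for this sketch.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key supplied and COGNITIVE_ACTIONS_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; passing it explicitly is still possible for quick experiments.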
Cognitive Actions Overview
Generate Personalized Speech
The Generate Personalized Speech action uses Fish Speech V1.5 to convert text into high-quality speech. It supports both zero-shot and few-shot voice cloning, so it can produce personalized audio from only a small amount of reference audio.
- Category: Text-to-Speech
Input
This action requires the following fields in the input schema:
- text (required): The text content to be converted into speech.
- useCompile (optional): A boolean specifying whether to use compilation optimization. Defaults to true.
- referenceText (optional): Text content corresponding to reference audio, if any.
- referenceAudio (optional): A URI to the reference audio file.
Example Input:
{
  "text": "我的猫猫就是全世界最好的猫",
  "useCompile": true
}
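The input fields listed above can also be assembled programmatically. The following is a small sketch, assuming only the four documented fields; the helper name build_speech_input and its validation rule (rejecting empty text) are illustrative choices, and the reference URL in the usage lines is a placeholder rather than a real asset.

```python
def build_speech_input(text, use_compile=True, reference_audio=None, reference_text=None):
    """Assemble an input payload for the Generate Personalized Speech action.

    Only `text` is required; the optional fields are included when provided.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' is required and must be a non-empty string")

    payload = {"text": text, "useCompile": bool(use_compile)}
    if reference_audio is not None:
        payload["referenceAudio"] = reference_audio  # URI to the reference audio file
    if reference_text is not None:
        payload["referenceText"] = reference_text  # Transcript of the reference audio
    return payload


# Text only, matching the example input above
basic = build_speech_input("我的猫猫就是全世界最好的猫")

# With a reference voice (placeholder URI and transcript)
cloned = build_speech_input(
    "Hello from a cloned voice.",
    reference_audio="https://example.com/reference.wav",
    reference_text="This is what the reference speaker says.",
)
```

Omitting the optional keys entirely, rather than sending null values, keeps the payload identical to the documented example when no reference audio is used.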
Output
Upon successful execution, the action returns a URI linking to the generated audio file. For example:
https://assets.cognitiveactions.com/invocations/d942e78e-0680-4061-89ac-f72e44daf7f0/92701fef-951c-4228-9223-b8e9587d155a.wav
This URI can be used to play or download the resulting speech audio.
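To illustrate the download step, here is a rough sketch that saves the audio behind such a URI to disk using only the Python standard library. The helper names audio_filename and download_audio are hypothetical, and the sketch assumes the URI is publicly readable without extra authentication, which the source does not state.

```python
import posixpath
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def audio_filename(uri, default="speech.wav"):
    """Derive a local file name from the last path segment of the URI."""
    name = posixpath.basename(urlparse(uri).path)
    return name or default


def download_audio(uri, dest_dir="."):
    """Stream the audio at `uri` into `dest_dir` and return the saved path."""
    target = Path(dest_dir) / audio_filename(uri)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urlopen(uri, timeout=60) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

Calling download_audio with the URI returned by the action would write a .wav file into the current directory; the function performs a network request, so it is defined here but not invoked.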
Conceptual Usage Example (Python)
Here's a conceptual Python snippet demonstrating how to call the Generate Personalized Speech action:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "575fb034-bc6a-4e47-81d0-18421a1fd78c"  # Action ID for Generate Personalized Speech

# Construct the input payload based on the action's requirements
payload = {
    "text": "我的猫猫就是全世界最好的猫",
    "useCompile": True,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
In this code snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The payload is structured to match the action's input schema. The endpoint URL and request structure are hypothetical and should be adjusted based on your actual integration.
Conclusion
The jichengdu/fish-speech Cognitive Actions provide a robust solution for integrating high-quality text-to-speech capabilities into your applications. By leveraging the Generate Personalized Speech action, developers can create engaging user experiences with minimal effort. Explore additional use cases or combine this action with other functionalities to enhance your projects even further!