Transforming Text to Speech with Fish Speech Cognitive Actions

The Fish Speech V1.2 SFT model gives developers a way to convert text into high-quality speech. This set of Cognitive Actions lets you integrate text-to-speech into an application through pre-built actions, which saves time and resources compared with building the audio features yourself.
Prerequisites
Before diving into the integration of Cognitive Actions, ensure you have the following in place:
- An API key for the Cognitive Actions platform.
- Basic familiarity with JSON and HTTP requests.
Authentication typically involves passing your API key in the request headers, which ensures that your application can securely interact with the Cognitive Actions service.
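As a minimal sketch of the header-based authentication described above, the helper below builds the request headers from an API key. The function name `build_headers` and the environment variable name `COGNITIVE_ACTIONS_API_KEY` are assumptions made for illustration, and the Bearer scheme simply mirrors the usage example later in this article; check your platform's documentation for the exact header it expects.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token."""
    # Fall back to an environment variable so the key stays out of source code.
    # The variable name is an assumption made for this sketch.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise ValueError("No API key provided; set COGNITIVE_ACTIONS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of version control, which is usually preferable to hardcoding it in a script.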
Cognitive Actions Overview
Generate Speech Using Fish Speech V1.2 SFT
The Generate Speech Using Fish Speech V1.2 SFT action synthesizes audio content from text input using the advanced capabilities of the Fish Speech V1.2 SFT model. This action is designed specifically for text-to-speech operations, offering improved performance and audio quality.
Input
The action requires the following fields:
- text: The main body of text to be processed. This is a mandatory field.
  - Example: "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good."
- textReference: A supplementary text reference providing additional context for the main text. This is also mandatory.
  - Example: "and keeping eternity before the eyes, though much"
- speakerReference: A URI pointing to a media resource representing the speaker. This field must contain a valid URL and is required.
  - Example: "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
Here is a complete example input in JSON format:
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "textReference": "and keeping eternity before the eyes, though much",
  "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
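Because all three fields are required and speakerReference must be a valid URL, it can help to check a payload locally before sending it. The following is a rough sketch of such a check, not part of the Cognitive Actions platform: the function name `validate_inputs` is hypothetical, and restricting speakerReference to http(s) URLs is an assumption of this example.

```python
from urllib.parse import urlparse

REQUIRED_FIELDS = ("text", "textReference", "speakerReference")


def validate_inputs(payload):
    """Return a list of problems found in the payload (empty if none)."""
    problems = []
    # Every required field must be present as a non-empty string.
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' is required and must be a non-empty string")
    # Assumption: the speaker reference is expected to be an http(s) URL.
    speaker = payload.get("speakerReference")
    if isinstance(speaker, str) and speaker.strip():
        parsed = urlparse(speaker)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("'speakerReference' must be an http(s) URL")
    return problems
```

Calling `validate_inputs(payload)` on the example input above should return an empty list, while a payload missing `text` would produce one problem message.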
Output
Upon successful execution, the action returns a URL where the synthesized audio can be accessed. The output typically looks like this:
- Example Output:
"https://assets.cognitiveactions.com/invocations/4703e3fd-3a77-45ec-a855-e9950714e9e5/fede87d6-6404-4b03-8491-8d9c270730f6.wav"
Conceptual Usage Example (Python)
Below is a conceptual Python code snippet demonstrating how to call the Generate Speech Using Fish Speech V1.2 SFT action. Replace the placeholders with your actual API key and endpoint.
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "adcedbcd-c35e-4a05-b281-4ccf729f12ba"  # Action ID for Generate Speech Using Fish Speech V1.2 SFT

# Construct the input payload based on the action's requirements
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "textReference": "and keeping eternity before the eyes, though much",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid waiting indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # The error body is not valid JSON
            print(f"Response body: {e.response.text}")
In this code snippet, the action_id corresponds to the Generate Speech Using Fish Speech V1.2 SFT action, and the payload contains the input JSON structured according to the action's requirements. The endpoint URL and request structure are illustrative, so adjust them to match your actual API specification.
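Once the call succeeds, the returned audio URL still has to be pulled out of the response and saved. Since the exact shape of the response body is not specified here, the sketch below makes no assumption about key names and simply searches the decoded JSON for the first http(s) string ending in `.wav`. Both `find_audio_url` and `download_audio` are hypothetical helpers written for this article, and they use the standard library rather than any platform SDK.

```python
import shutil
import urllib.request


def find_audio_url(result):
    """Return the first http(s) string ending in .wav found in a decoded JSON value."""
    if isinstance(result, str):
        if result.startswith(("http://", "https://")) and result.lower().endswith(".wav"):
            return result
        return None
    if isinstance(result, dict):
        children = result.values()
    elif isinstance(result, list):
        children = result
    else:
        return None
    # Walk nested containers depth-first until a matching string turns up.
    for child in children:
        found = find_audio_url(child)
        if found:
            return found
    return None


def download_audio(url, destination="speech.wav"):
    """Stream the audio at `url` into a local file and return its path."""
    with urllib.request.urlopen(url, timeout=60) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

With the `result` from the snippet above, `find_audio_url(result)` would return the audio URL if one is present, and passing that URL to `download_audio` would write the file locally. If your API documents a specific key for the output, reading that key directly is more robust than searching.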
Conclusion
The Fish Speech V1.2 SFT Cognitive Action presents developers with a robust solution for integrating text-to-speech functionalities into their applications. With its improved performance and audio quality, this action can enhance user interactions through dynamic audio content generation. As you explore the capabilities of this action, consider the potential applications in areas such as accessibility, education, and entertainment, paving the way for innovative audio experiences.