Transform Text into Speech with the Fish Speech Cognitive Actions

Text-to-speech is useful wherever an application needs to read content aloud, from accessibility features to content creation tools. The Fish Speech V1.1 model, exposed through the ttsds/fishspeech_1_1 API, provides a Cognitive Action that converts text into speech using a reference audio sample for the speaker. This article covers the action's inputs and outputs and shows how you might call it from Python.
Prerequisites
Before you begin, ensure you have the following:
- An API key for the Cognitive Actions platform.
- Basic understanding of JSON and API requests.
- Familiarity with Python for making API calls.
Authentication typically involves passing your API key in the request headers. This ensures that your application has the necessary permissions to access the Cognitive Actions services.
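As a minimal sketch of the header-based authentication described above, the helper below reads the key from an environment variable instead of hard-coding it. The function name build_auth_headers and the environment variable name are illustrative assumptions, and the Bearer scheme simply mirrors the usage example later in this article; check the platform's documentation for the scheme it actually expects.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    The variable name and the Bearer scheme are assumptions for illustration.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to leak through version control, and failing early with a clear message is usually easier to debug than a 401 response from the service.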
Cognitive Actions Overview
Convert Text to Fish Speech
The Convert Text to Fish Speech action is designed to transform text input into speech, leveraging the Fish Speech model for enhanced auditory quality and accuracy. It requires a primary text input, an ancillary text for context, and a URI for the speaker reference.
Input
The input for this action must follow the CompositeRequest schema:
- text (required): The main text content to be converted. It should be a coherent string.
- textReference (required): An ancillary text that provides context or additional reference for the main text.
- speakerReference (required): A URI pointing to an audio file that serves as the speaker reference.
Example Input:
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "textReference": "and keeping eternity before the eyes, though much",
  "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
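Because all three fields are required, it can help to check a payload locally before sending it. The following is a rough sketch of such a check, not part of the Cognitive Actions platform: validate_payload is a hypothetical helper, and restricting speakerReference to http(s) URIs is an assumption, since the schema only says "URI".

```python
from urllib.parse import urlparse

REQUIRED_FIELDS = ("text", "textReference", "speakerReference")


def validate_payload(payload: dict) -> list:
    """Return a list of problems found in the payload; an empty list means none were found."""
    problems = []
    # Every required field must be present as a non-empty string.
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' is required and must be a non-empty string")
    # Assumption: the speaker reference is fetched over HTTP(S).
    speaker = payload.get("speakerReference")
    if isinstance(speaker, str) and speaker.strip():
        parsed = urlparse(speaker)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("'speakerReference' must be an http(s) URI")
    return problems
```

Calling validate_payload on the example input above returns an empty list, while a payload missing a field or carrying a malformed URI yields one message per problem.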
Output
On success, the action returns a URI pointing to the generated audio file.
Example Output:
https://assets.cognitiveactions.com/invocations/58842dd4-afec-4bac-9db4-3519142c84cc/6fd31450-8244-4fec-8076-afb233e078e6.wav
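Since the output is a URI rather than the audio itself, an application will usually need to fetch the file. Here is one way that might look, using only the Python standard library to keep the sketch dependency-free. The helper names filename_from_uri and download_audio are hypothetical, and the sketch assumes the URI can be fetched without additional authentication.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_uri(audio_uri: str, default: str = "speech.wav") -> str:
    """Derive a local file name from the last path segment of the URI."""
    name = posixpath.basename(urlparse(audio_uri).path)
    return name or default


def download_audio(audio_uri: str, directory: str = ".") -> str:
    """Stream the audio at audio_uri into directory and return the local path."""
    local_path = os.path.join(directory, filename_from_uri(audio_uri))
    # Assumption: the URI is publicly readable; add headers here if it is not.
    with urllib.request.urlopen(audio_uri, timeout=30) as response:
        with open(local_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return local_path
```

Streaming with shutil.copyfileobj avoids loading the whole file into memory, which matters more as the synthesized text gets longer.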
Conceptual Usage Example (Python)
Here’s how you might implement the Convert Text to Fish Speech action using Python:
import json

import requests

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "b5c6d5e3-8b9d-4608-a234-ecc3a6cb7f99"  # Action ID for Convert Text to Fish Speech

# Construct the input payload based on the action's requirements
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "textReference": "and keeping eternity before the eyes, though much",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body is not valid JSON; fall back to raw text
            print(f"Response body: {e.response.text}")
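In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id corresponds to the Convert Text to Fish Speech action, and the payload includes the three required input fields from the schema. The endpoint URL and the structure of the request body are hypothetical, so verify both against the platform's documentation before relying on them.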
Conclusion
The Convert Text to Fish Speech action adds speech synthesis to an application through an API request: you supply the text, a reference text, and a speaker reference audio file, and receive a URI to the generated audio. It can fit educational apps, content creation tools, and accessibility features alike.