Transform Text into Engaging Audio with Fishspeech 1.4

In today's digital landscape, the ability to convert text into natural-sounding audio can significantly enhance user engagement and accessibility. Fishspeech 1.4 offers developers a robust API that leverages advanced text-to-speech technology to transform written content into lifelike audio. By using speaker-specific references, developers can ensure that the generated audio not only conveys the intended message but also embodies the unique characteristics of a specified voice. This capability provides a multitude of benefits, including improved user experiences in applications such as e-learning platforms, audiobooks, and interactive voice assistants.
Common use cases for Fishspeech 1.4 include creating audio content for educational materials, generating voiceovers for videos, and enhancing accessibility features for visually impaired users. Whether you're building a podcasting tool, an interactive story app, or a virtual assistant, Fishspeech 1.4 can help you deliver content in a more engaging and accessible format.
Generate Fish Speech
The "Generate Fish Speech" action is designed to process text into audio, utilizing speaker references to ensure compliance with model terms. This action effectively solves the challenge of creating personalized audio content that resonates with users by mimicking a specific speaker's voice.
Input Requirements
To use this action, you need to provide the following inputs:
- text: The main content that will be processed into audio. This should be a meaningful string that conveys the desired message.
- textReference: The transcript of the speaker reference audio. Supplying the text that matches the reference clip helps the model reproduce the reference voice more accurately.
- speakerReference: A URI pointing to an audio file that serves as a reference for the speaker's voice. This must be a valid URI format.
Example Input:
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "textReference": "and keeping eternity before the eyes, though much",
  "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
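Before sending a request, it can help to validate these fields client-side so that malformed inputs fail fast. The helper below is a hypothetical sketch (not part of the Fishspeech or Cognitive Actions API): it checks that `text` is non-empty, that `textReference` is a string, and that `speakerReference` parses as an http(s) URI.

```python
from urllib.parse import urlparse

def validate_inputs(text: str, text_reference: str, speaker_reference: str) -> None:
    """Raise ValueError if the inputs do not meet the action's requirements."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if not isinstance(text_reference, str):
        raise ValueError("textReference must be a string")
    parsed = urlparse(speaker_reference)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("speakerReference must be a valid http(s) URI")

# The example input from above passes validation:
validate_inputs(
    "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "and keeping eternity before the eyes, though much",
    "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
)
```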
Expected Output
The output will be a URI that points to the generated audio file containing the spoken version of the provided text, reflecting the characteristics of the specified speaker's voice.
Example Output:
https://assets.cognitiveactions.com/invocations/0f97d668-acc8-499b-814c-6660d4aaf063/e90c3886-a42a-491b-a525-896ccef7f356.wav
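The returned URI can be fetched like any other file over HTTPS. The sketch below (a hypothetical helper, assuming the URI is publicly reachable) derives a local filename from the URI's last path segment and downloads the audio with `requests`:

```python
import os
import requests
from urllib.parse import urlparse

def audio_filename(audio_uri: str) -> str:
    """Derive a local filename from the last path segment of the audio URI."""
    return os.path.basename(urlparse(audio_uri).path)

def download_audio(audio_uri: str, dest_dir: str = ".") -> str:
    """Download the generated audio to dest_dir and return the local path."""
    local_path = os.path.join(dest_dir, audio_filename(audio_uri))
    resp = requests.get(audio_uri, timeout=60)
    resp.raise_for_status()
    with open(local_path, "wb") as f:
        f.write(resp.content)
    return local_path
```

Calling `download_audio` on the example output above would save the generated `.wav` in the current directory, ready to be played back or embedded in your application.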
Use Cases for this Specific Action
- E-Learning Platforms: Create audio versions of course materials to cater to auditory learners and enhance engagement.
- Audiobooks: Generate voiceovers that sound like specific narrators to maintain consistency and appeal to listeners.
- Interactive Applications: Use personalized voiceovers to create immersive experiences in games or interactive stories.
Example Request (Python):

import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "3e7982d5-fcba-4e21-9732-82be35f40669"  # Action ID for: Generate Fish Speech

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "textReference": "and keeping eternity before the eyes, though much",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print("--- Calling Cognitive Action: Generate Fish Speech ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=120,  # audio generation can take a while
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # response body was not JSON
            print(f"Response body (non-JSON): {e.response.text}")
print("------------------------------------------------")
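The exact shape of `result` depends on the Cognitive Actions API's response schema, which is not specified here. As a sketch only, the helper below assumes a hypothetical response that nests the audio URI under a common key such as `"output"`, and should be adjusted to the real schema:

```python
def extract_output_uri(result: dict) -> str:
    """Pull the generated audio URI out of a hypothetical execution response.

    Assumes the URI sits under one of a few common keys ("output",
    "outputs", "result"), possibly nested; returns "" if none is found.
    """
    for key in ("output", "outputs", "result"):
        value = result.get(key)
        if isinstance(value, str) and value.startswith("http"):
            return value
        if isinstance(value, dict):
            nested = extract_output_uri(value)
            if nested:
                return nested
    return ""
```

With the example output above, `extract_output_uri(result)` would return the `.wav` URI, which can then be downloaded or streamed to the user.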
Conclusion
Fishspeech 1.4 empowers developers to create engaging audio content by transforming text into lifelike speech that reflects the unique characteristics of different speakers. This action not only enhances user experiences across various applications but also broadens accessibility options for diverse audiences. As you explore the potential of Fishspeech 1.4, consider integrating it into your projects to deliver personalized audio that captivates and informs users. Embrace the future of content delivery with Fishspeech 1.4 and elevate your application's capabilities today.