Transform Text into Natural Speech with Fishspeech 1.5

26 Apr 2025

Converting text into natural-sounding speech can improve both user engagement and accessibility. Fishspeech 1.5 is an unofficial implementation of the Fish Speech V1.5 text-to-speech model, made available to developers as a Cognitive Action. It aims for improved quality, speed, and accuracy, which makes it a good fit for applications ranging from virtual assistants to e-learning platforms.

Imagine integrating a voice into your applications that can articulate complex texts clearly and engagingly. Whether you're creating audiobooks, voiceovers for videos, or enhancing accessibility features for users with visual impairments, Fishspeech 1.5 simplifies the process of bringing textual content to life.

Prerequisites

To get started with Fishspeech 1.5, you'll need an API key for Cognitive Actions and a basic understanding of making API calls.
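
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named COGNITIVE_ACTIONS_API_KEY; that name and both helper functions are chosen here for illustration and are not mandated by the platform.

```python
import os

# Hypothetical variable name, chosen for this example only.
API_KEY_ENV_VAR = "COGNITIVE_ACTIONS_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return key


def auth_headers(api_key):
    """Build bearer-token headers for a JSON request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling load_api_key() with no arguments reads from the real process environment; passing a dictionary makes the function easy to exercise in tests.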

Generate Speech with Fish Speech V1.5

The "Generate Speech with Fish Speech V1.5" action allows you to convert textual content into spoken words using advanced speech synthesis technology. This action is designed to solve the common issue of monotonous or robotic text-to-speech outputs, providing a more human-like voice experience.

Input Requirements: To use this action, you need to provide the following:

  • text: The primary content you want to convert into speech (e.g., "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.").
  • textReference: Reference text that gives the model context for the speaker audio; in Fish Speech this is typically the transcript of the reference clip (e.g., "and keeping eternity before the eyes, though much").
  • speakerReference: A URI pointing to the speaker's audio reference, which should be a fully qualified URL (e.g., "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav").
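
The three inputs above can be assembled and sanity-checked before any request is sent. The following is a minimal sketch, not part of the Cognitive Actions API: build_payload is a hypothetical helper, and restricting speakerReference to http or https URLs is an assumption based on the "fully qualified URL" requirement.

```python
from urllib.parse import urlparse


def build_payload(text, text_reference, speaker_reference):
    """Assemble the action input, rejecting obviously invalid values."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    parsed = urlparse(speaker_reference)
    # A fully qualified URL has both a scheme and a host.
    # Accepting only http(s) is an assumption made for this sketch.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("speakerReference must be a fully qualified http(s) URL")
    return {
        "text": text,
        "textReference": text_reference,
        "speakerReference": speaker_reference,
    }
```

Catching a relative path or an empty text field locally gives a clearer error than waiting for the remote service to reject the request.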

Expected Output: The output will be a link to an audio file containing the generated speech. For example, you may receive an output link like "https://assets.cognitiveactions.com/invocations/b6c1c420-6d24-448c-81e3-b7c697c418c3/2fbb35cf-d1ab-48e6-91fb-331e18a755a8.wav".
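
Once you have the output link, saving the audio locally is a short step. The sketch below uses only the Python standard library and assumes the link can be fetched without authentication headers, which the source does not confirm; output_filename and download_audio are hypothetical helper names.

```python
import shutil
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(url):
    """Derive a local file name from the last path segment of the output URL."""
    name = PurePosixPath(urlparse(url).path).name
    return name or "speech.wav"  # fallback name is an arbitrary choice


def download_audio(url, dest=None, timeout=60):
    """Stream the generated audio to disk and return the path written."""
    dest = dest or output_filename(url)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return dest
```

For the example link above, output_filename returns the final path segment ending in .wav, so the downloaded file keeps the name the service assigned to it.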

Use Cases for this specific action:

  1. E-Learning Platforms: Enhance educational content by providing spoken versions of lessons, making learning more accessible.
  2. Audiobook Creation: Automatically generate audio versions of written content, allowing authors to reach wider audiences through audio formats.
  3. Virtual Assistants: Improve user interaction by providing a natural-sounding voice for chatbots and virtual assistants.
  4. Accessibility Features: Support users with visual impairments by converting written text into spoken format, ensuring everyone can access information.
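
The Python script below sends the example input to the action. The endpoint URL and the request body shape are hypothetical placeholders:

import requests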
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "2e73e658-daca-4a4e-b44a-f18f2f89cd38" # Action ID for: Generate Speech with Fish Speech V1.5

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "textReference": "and keeping eternity before the eyes, though much",
    "speakerReference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=60,  # avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # JSON decode errors raised by response.json() subclass ValueError
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Conclusion

Fishspeech 1.5 lets developers add high-quality text-to-speech to their applications through an API call. Its natural-sounding voice can improve user experience, broaden accessibility, and make content more engaging. As you consider implementing Fishspeech 1.5, think about where it fits best, whether in education, entertainment, or accessibility, and how it can change the way users interact with your content. Start exploring the possibilities today!