Transform Your Text into Speech with Fish Speech

26 Apr 2025

In today's digital landscape, converting text into natural-sounding speech can significantly improve the user experience of many applications. Fish Speech is a text-to-speech service that synthesizes speech with accurate pronunciation and natural intonation. It offers zero-shot and few-shot capabilities along with strong bilingual support for Chinese and English, so developers can build applications that reach users in both languages.

Whether you're developing an educational tool, a voice assistant, or an interactive game, Fish Speech can bring your text content to life. Because speech generation is exposed as a single API action, integrating it into your application takes only one HTTP request, which leaves you free to focus on the user experience.

Prerequisites

To get started with Fish Speech, you'll need a Cognitive Actions API key and a basic understanding of making API calls.
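Rather than hard-coding the API key in source files, you can read it from an environment variable. The sketch below assumes a variable named COGNITIVE_ACTIONS_API_KEY (our choice of name, not something the service mandates) and the Bearer-token header used in the full example later in this article; load_api_key and build_headers are hypothetical helpers.

```python
import os


def load_api_key(var_name: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    # The variable name is an assumption; use whatever your deployment defines.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key


def build_headers(api_key: str) -> dict:
    """Build request headers using the Bearer scheme from the full example."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With the variable exported in your shell, headers = build_headers(load_api_key()) gives you headers ready to send, and a missing key fails immediately instead of producing an authentication error later.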

Generate Speech Using Fish Speech V1.5

Purpose

The "Generate Speech Using Fish Speech V1.5" action converts text into high-quality speech audio. It is designed for scenarios where accurate pronunciation and natural intonation matter, which makes it a good fit for applications that need clear, expressive voice output.

Input Requirements

The action accepts a JSON object with the following properties. Only text is required; the others are optional:

  • text: The text content you want to convert into speech. (Example: "我的猫猫就是全世界最好的猫", meaning "My kitty is the best cat in the whole world.")
  • useCompile: A boolean indicating whether to use compilation optimization (default: true).
  • referenceText (optional): The transcript of the reference audio.
  • referenceAudio (optional): A URI pointing to a reference audio file.
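Because only text is needed for a basic request, a small helper can assemble the payload and attach the reference fields only when they are supplied. This is a hypothetical helper, and its pairing rule is an assumption of this sketch: since referenceText is the transcript of referenceAudio, it rejects one without the other. Check the action's documentation before relying on that rule.

```python
from typing import Optional


def build_fish_speech_payload(
    text: str,
    use_compile: bool = True,
    reference_text: Optional[str] = None,
    reference_audio: Optional[str] = None,
) -> dict:
    """Assemble the input object for the Fish Speech V1.5 action (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    payload = {"text": text, "useCompile": use_compile}
    # Assumption: a transcript without its audio (or vice versa) is treated as an error.
    if (reference_text is None) != (reference_audio is None):
        raise ValueError("referenceText and referenceAudio should be supplied together")
    if reference_audio is not None:
        payload["referenceAudio"] = reference_audio
        payload["referenceText"] = reference_text
    return payload
```

For example, build_fish_speech_payload("我的猫猫就是全世界最好的猫") produces the same payload as the example input used in the code below, while passing reference_text and reference_audio (for instance a placeholder URL such as https://example.com/voice.wav) adds the two optional fields.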

Expected Output

Upon successful execution, the action returns a URL linking to the generated audio file in WAV format, which can be played or integrated into your application. For example, the output might look like this: "https://assets.cognitiveactions.com/invocations/e383d268-3287-432f-b417-70de05b57247/3019038b-8024-4526-8da0-80726b75aac7.wav".
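This article does not specify the exact shape of the JSON returned by the execute endpoint, so the sketch below assumes only that the .wav URL appears somewhere in it, either as the whole result or nested inside an object or list. It searches for that URL, downloads the file with the standard library, and checks the "RIFF" and "WAVE" signature bytes that begin every WAV file. The function names are hypothetical.

```python
import urllib.request
from typing import Any, Optional


def find_wav_url(result: Any) -> Optional[str]:
    """Return the first string ending in .wav found anywhere in the result."""
    if isinstance(result, str):
        # Ignore any query string when checking the file extension.
        return result if result.split("?", 1)[0].lower().endswith(".wav") else None
    if isinstance(result, dict):
        result = list(result.values())
    if isinstance(result, (list, tuple)):
        for item in result:
            url = find_wav_url(item)
            if url:
                return url
    return None


def is_wav(data: bytes) -> bool:
    """Check the RIFF/WAVE signature at the start of a WAV file."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def download_wav(url: str, path: str, timeout: float = 60.0) -> None:
    """Download the generated audio and refuse to save anything that is not WAV."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    if not is_wav(data):
        raise ValueError("downloaded file does not look like a WAV file")
    with open(path, "wb") as f:
        f.write(data)
```

After a successful call, url = find_wav_url(result) followed by download_wav(url, "speech.wav") saves the audio locally; if find_wav_url returns None, inspect the raw result to see where the service actually places the URL.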

Use Cases for this specific action

  • Educational Applications: Enhance learning experiences by providing spoken feedback or readings of text materials.
  • Voice Assistants: Create more engaging interactions by enabling your assistant to read out responses in a natural voice.
  • Multilingual Platforms: Support users from different linguistic backgrounds by delivering content in their preferred language with accurate pronunciation.
  • Games and Interactive Media: Bring characters to life by adding realistic voiceovers to enhance storytelling and immersion.
The following Python example, which uses the requests library, shows how to call this action:

import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "506a5a36-9116-4fc1-a036-b63fa49d1f47" # Action ID for: Generate Speech Using Fish Speech V1.5

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
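payload = {
    "text": "我的猫猫就是全世界最好的猫",  # "My kitty is the best cat in the whole world."
    "useCompile": True  # Python booleans are capitalized; JSON true is produced on serialization
}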

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: Generate Speech Using Fish Speech V1.5 ({action_id}) ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=60,  # Fail instead of hanging indefinitely if the service stalls
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # requests raises a ValueError subclass when the body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Conclusion

Fish Speech offers a practical way for developers to add text-to-speech to their applications. With its zero-shot and few-shot capabilities and strong support for Chinese and English, it suits educational tools, voice assistants, bilingual platforms, and games alike. As you explore the service, consider where spoken output could make your projects more accessible and engaging for your users.