Create Natural-Sounding Speech with Amphion Naturalspeech2

26 Apr 2025

Amphion Naturalspeech2 generates lifelike speech and singing from text. It is built on a latent diffusion model, and its output is intended to be expressive as well as accurate, which suits applications where natural-sounding voice output matters. Whether you're developing voice assistants, creating audiobooks, or enhancing multimedia content, Naturalspeech2 handles the step of turning text into audio.

Use Cases

  • Voice Assistants: Integrate Naturalspeech2 to provide users with a more engaging and human-like interaction.
  • Audiobook Creation: Easily convert written stories into captivating spoken narratives.
  • Content Localization: Generate voiceovers in multiple languages while maintaining a natural tone.
  • Educational Tools: Create interactive learning experiences by providing spoken instructions and explanations.

Prerequisites

To get started with Amphion Naturalspeech2, you'll need a Cognitive Actions API key and a basic understanding of how to make API calls.
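
One way to keep the API key out of source code is to read it from an environment variable. The sketch below assumes a variable named COGNITIVE_ACTIONS_API_KEY; that name is chosen here for illustration and is not something the service is known to require.

```python
import os


def load_api_key(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

Calling load_api_key() once at startup and passing the result into the request headers keeps the key out of version control.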

Generate Speech with NaturalSpeech2

The "Generate Speech with NaturalSpeech2" action synthesizes natural-sounding speech and singing from text input. It is aimed at applications that need high-quality voice output.

Input Requirements: The action requires two inputs:

  1. text: The text content to be converted into speech. For example, "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good."
  2. speakerReference: A URI pointing to an audio sample of the speaker's voice. The output is matched to the vocal characteristics of this sample. For example, a valid URI might look like "https://replicate.delivery/pbxt/MN9oVKrayGMTWkp7zLYiFI4f2MxcvUNXdLPZKNm2XF6pfFCd/example.wav".
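
The two inputs can be checked locally before a request is sent. The following is a minimal sketch, not part of the service: build_inputs is a hypothetical helper, and the rule that the speaker reference must be an http(s) URI is an assumption based on the example above.

```python
from urllib.parse import urlparse


def build_inputs(text: str, speaker_reference: str) -> dict:
    """Validate the two inputs and return them keyed by the action's field names."""
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    parsed = urlparse(speaker_reference)
    # Assumption: the reference clip has to be reachable over HTTP(S)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("speakerReference must be an absolute http(s) URI")
    return {"text": text, "speakerReference": speaker_reference}
```

Catching a malformed URI or an empty string here avoids spending a network round trip on a request that is likely to be rejected.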

Expected Output: The output of this action is a URL to the generated audio file, which contains the synthesized speech. An example output could be "https://assets.cognitiveactions.com/invocations/1a0811be-81e3-4728-9fb3-70f66ec4717a/d7d6f73c-2620-4c2b-b047-89ed7324162a.wav".
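
Because the result is a URL rather than the audio itself, a client usually needs one more step to save the file. The sketch below uses only the Python standard library. Both helper names are hypothetical, and since the exact shape of the response body is not documented here, the functions take the URL string directly rather than guessing where it sits in the JSON.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_name_for(audio_url: str, default: str = "speech.wav") -> str:
    """Derive a local file name from the last path segment of the output URL."""
    name = PurePosixPath(urlparse(audio_url).path).name
    return name or default


def download_audio(audio_url: str, dest_dir: str = ".") -> Path:
    """Stream the generated audio into dest_dir and return the saved path."""
    target = Path(dest_dir) / local_name_for(audio_url)
    # The 60 s timeout is an arbitrary choice for this example
    with urllib.request.urlopen(audio_url, timeout=60) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

Streaming with shutil.copyfileobj avoids holding the whole audio file in memory, which matters for longer passages such as audiobook chapters.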

Use Cases for this specific action:

  • Interactive Voice Response Systems: Enhance customer service applications with realistic voice responses.
  • Entertainment Applications: Create character voices for games or animations that sound authentic and relatable.
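  • Marketing Content: Produce dynamic advertisements that engage listeners with a natural-sounding voiceover.

The following Python example shows how the action might be called. The endpoint URL is hypothetical, so check the Cognitive Actions documentation for the real one.

import requests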
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "4d4214b4-509e-474b-9183-8b777e160486" # Action ID for: Generate Speech with NaturalSpeech2

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
    "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "speakerReference": "https://replicate.delivery/pbxt/MN9oVKrayGMTWkp7zLYiFI4f2MxcvUNXdLPZKNm2XF6pfFCd/example.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

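try:
    # Without a timeout, requests can wait indefinitely; 120 s is an arbitrary value for this example
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=120,
    )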
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # response.json() raises a ValueError subclass on invalid JSON
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Conclusion

Amphion Naturalspeech2 lets developers add high-quality synthesized audio to their applications with a single API call. Its ability to produce natural-sounding speech and singing applies across domains from entertainment to customer service. As a next step, try different text inputs and speaker references to hear how the output changes with each.