Transform Your Text into Realistic Speech with Llasa 3b Long

27 Apr 2025

The Llasa 3b Long service provides voice cloning and text-to-speech synthesis, giving developers the tools to create lifelike speech from text. The underlying model was trained on 250,000 hours of speech data and generates high-quality audio in both English and Chinese. By simplifying speech synthesis, Llasa 3b Long lets you integrate voice technology into your applications and make interactions more engaging.

Common use cases for this service include creating personalized voice assistants, developing audiobooks with distinct voices, generating voiceovers for videos, or even enabling accessibility features for the visually impaired. Whether you're building an application that requires realistic voice interaction or simply looking to add a unique auditory component to your project, Llasa 3b Long provides the flexibility and quality you need.

Perform Voice Cloning and Text-to-Speech Synthesis

This action performs zero-shot voice cloning and text-to-speech synthesis, turning written text into natural-sounding speech. Zero-shot here means no fine-tuning is required: given a single voice sample, the system shapes the generated speech to closely match that voice, which makes it well suited to applications that need a personalized touch.

Input Requirements

To use this action, you need to provide the following inputs:

  • text: The text to convert into speech. Clear, well-punctuated text gives the best synthesis results.
  • voiceSample: A URI pointing to a 16 kHz audio file that serves as the voice sample used to shape the synthesized speech.
  • promptText (optional): Additional context for the speech synthesis. If not provided, context is derived from the voice sample.
  • chunkLength: The length of the text chunks used for processing, in characters. The default is 250. Adjusting this value can help keep sentences intact within a chunk.
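
To make the effect of chunkLength more concrete, here is a minimal sketch of sentence-aware chunking. It is an illustration only: the service's actual chunking logic is not documented here, and the function name split_into_chunks and its greedy packing strategy are assumptions made for this example.

```python
import re
from typing import List


def split_into_chunks(text: str, chunk_length: int = 250) -> List[str]:
    """Greedily pack whole sentences into chunks of at most chunk_length characters.

    Hypothetical helper: it shows one way a chunk limit can interact with
    sentence boundaries, not how the service itself splits text.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_length:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the limit is kept whole
            # rather than being cut in the middle of a word.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "I must not fear. Fear is the mind-killer. I will face my fear."
for chunk in split_into_chunks(sample, chunk_length=45):
    print(len(chunk), chunk)
```

With a limit of 45 characters, the first two sentences fit into one chunk and the third starts a new one. A smaller limit would place every sentence in its own chunk, which is the kind of trade-off to keep in mind when tuning chunkLength.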

Example Input:

{
  "text": "I must not fear. Fear is the mind-killer...",
  "promptText": "You open your eyes so that only a slender chink of light seeps in...",
  "chunkLength": 200,
  "voiceSample": "https://replicate.delivery/pbxt/MNaHFqDkZ0Y22hvppxotJazhRYe6TwhK78xAUTCoz3NB9bRV/voice_sample.wav"
}
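
If you assemble this input in code, a small helper can catch obvious mistakes before a request is sent. The sketch below is hypothetical: build_inputs and its checks are not part of the service, and the only facts taken from this article are the four field names and the default chunkLength of 250.

```python
from typing import Optional
from urllib.parse import urlparse

DEFAULT_CHUNK_LENGTH = 250  # default stated in the input requirements


def build_inputs(
    text: str,
    voice_sample: str,
    prompt_text: Optional[str] = None,
    chunk_length: int = DEFAULT_CHUNK_LENGTH,
) -> dict:
    """Return an input dictionary for the action (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: a usable voiceSample URI has at least a scheme, e.g. https.
    if not urlparse(voice_sample).scheme:
        raise ValueError("voiceSample must be a URI with a scheme")
    if chunk_length <= 0:
        raise ValueError("chunkLength must be a positive number of characters")

    inputs = {
        "text": text,
        "voiceSample": voice_sample,
        "chunkLength": chunk_length,
    }
    # promptText is optional, so it is only included when provided.
    if prompt_text:
        inputs["promptText"] = prompt_text
    return inputs


inputs = build_inputs(
    text="I must not fear. Fear is the mind-killer...",
    voice_sample="https://example.com/voice_sample.wav",  # placeholder URI
    chunk_length=200,
)
print(inputs)
```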

Expected Output

The output will be a URL linking to the generated audio file, which contains the synthesized speech based on the provided text and voice sample.

Example Output:

https://assets.cognitiveactions.com/invocations/668c9ede-e84f-4f92-bacb-145d62f309f3/340b3cad-fbcf-4e22-b8fa-c6df37261c8b.wav
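
Since the result is a plain URL, saving the audio locally takes only a few lines. The following sketch uses only the Python standard library and assumes the URL can be fetched without extra authentication, which this article does not confirm. The helper names local_filename and download_audio are made up for this example.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def local_filename(audio_url: str, default: str = "output.wav") -> str:
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(audio_url).path)
    return name or default


def download_audio(audio_url: str, dest_dir: str = ".") -> str:
    """Stream the audio file to dest_dir and return the local path.

    Assumes the URL is directly downloadable without additional headers.
    """
    dest = os.path.join(dest_dir, local_filename(audio_url))
    with urllib.request.urlopen(audio_url, timeout=60) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return dest


example_url = (
    "https://assets.cognitiveactions.com/invocations/"
    "668c9ede-e84f-4f92-bacb-145d62f309f3/"
    "340b3cad-fbcf-4e22-b8fa-c6df37261c8b.wav"
)
# Only the file name is derived here; call download_audio(example_url) to fetch it.
print(local_filename(example_url))
```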

Use Cases for this Action

  • Personalized Voice Assistants: Give chatbots or virtual assistants a unique voice that fits your brand.
  • Audiobook Production: Generate lifelike narrations for books, allowing authors to provide an audio version of their work without hiring voice actors.
  • Multimedia Content: Enhance videos and presentations with voiceovers that match the tone and style of the content.
  • Accessibility Solutions: Develop applications that read text aloud for visually impaired users, improving their interaction with digital content.
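
The following Python snippet sketches how the action might be called. The endpoint URL and request structure are hypothetical, so adapt them to the actual Cognitive Actions API.

import requests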
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "012af778-4476-43e6-aa05-73e0b53256a7" # Action ID for: Perform Voice Cloning and Text-to-Speech Synthesis

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
  "text": "I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain.",
  "promptText": "You open your eyes so that only a slender chink of light seeps in, and peer at the gingko trees in front of the Provincial Office. As though there, between those branches, the wind is about to take on visible form.",
  "chunkLength": 200,
  "voiceSample": "https://replicate.delivery/pbxt/MNaHFqDkZ0Y22hvppxotJazhRYe6TwhK78xAUTCoz3NB9bRV/voice_sample.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # raised when the error body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Conclusion

The Llasa 3b Long service gives developers zero-shot voice cloning and text-to-speech synthesis in English and Chinese through a single action: supply the text and a voice sample, and receive a URL to the generated audio. That makes it a practical fit for voice assistants, audiobook narration, video voiceovers, and accessibility features. Start exploring how Llasa 3b Long can transform your applications today!