Create Emotionally Engaging Audio from Text with High Fidelity Synthesis

6 May 2025

Converting written content into natural-sounding audio is a common requirement, from video production to accessibility features. The High Fidelity Text To Audio Synthesis API gives developers a way to convert text into emotionally expressive, high-fidelity audio using voice synthesis.

By integrating this API, developers can simplify the creation of voiceovers for video content, audiobooks, and real-time applications that need a natural-sounding voice. Its multilingual capabilities make it suitable for applications that serve a global audience, whether you are producing educational content or immersive storytelling experiences.

Prerequisites

To get started, you'll need an API key for the Cognitive Actions service and a basic understanding of how to make API calls.
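
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named `COGNITIVE_ACTIONS_API_KEY`; that name is a choice made for this example, not something the service prescribes.

```python
import os


def load_api_key(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before calling the API."
        )
    return api_key
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the service later on.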

Convert Text to Emotionally Expressive Audio

The primary action offered by the High Fidelity Text To Audio Synthesis API is the ability to convert text into audio that conveys emotion and nuance.

Purpose

This action transforms text into high-fidelity audio with voice synthesis that includes emotional expression and multilingual capabilities. It is ideal for applications such as voiceovers, audiobooks, and real-time interactions where emotional depth and language versatility are crucial.

Input Requirements

The input is structured as a JSON object and includes the following fields:

text: The text you want to convert to speech (maximum 5000 characters).
pitch: An integer to adjust the pitch of the speech, ranging from -12 to 12.
speed: A number to control the speech speed, from 0.5 (half-speed) to 2 (double speed).
volume: A number to set the volume level, from 0 (mute) to 10 (maximum).
bitRate: Specifies the bitrate for the output speech (options: 32000, 64000, 128000, or 256000).
emotion: Selects the emotional tone of the speech (options include happy, sad, angry, etc.).
audioChannel: Determines whether the audio is mono or stereo.
audioSampleRate: Sets the sample rate for the output speech (options: 8000, 16000, 22050, 24000, 32000, or 44100).
voiceIdentifier: Specifies the desired voice ID.
languageEnhancement: Names the language of the input text (for example, English) so that it is recognized and read more accurately.
englishNormalization: Activates English text normalization for better reading of numbers.
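
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not part of the API itself: the helper name `validate_input` is hypothetical, it treats `text` as required (an assumption, since the field list does not say which fields are optional), and it only checks the other fields when they are present.

```python
MAX_TEXT_LENGTH = 5000
ALLOWED_BIT_RATES = (32000, 64000, 128000, 256000)
ALLOWED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
ALLOWED_CHANNELS = ("mono", "stereo")


def _is_number(value) -> bool:
    """True for int or float, but not bool (which is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_input(payload: dict) -> list:
    """Return human-readable problems found in the payload (empty if none)."""
    problems = []

    # Assumption: `text` is required and must be non-empty.
    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text must be a non-empty string")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append(f"text exceeds {MAX_TEXT_LENGTH} characters")

    if "pitch" in payload:
        pitch = payload["pitch"]
        is_int = isinstance(pitch, int) and not isinstance(pitch, bool)
        if not is_int or not -12 <= pitch <= 12:
            problems.append("pitch must be an integer from -12 to 12")

    if "speed" in payload:
        speed = payload["speed"]
        if not _is_number(speed) or not 0.5 <= speed <= 2:
            problems.append("speed must be a number from 0.5 to 2")

    if "volume" in payload:
        volume = payload["volume"]
        if not _is_number(volume) or not 0 <= volume <= 10:
            problems.append("volume must be a number from 0 to 10")

    if "bitRate" in payload and payload["bitRate"] not in ALLOWED_BIT_RATES:
        problems.append(f"bitRate must be one of {ALLOWED_BIT_RATES}")

    if "audioChannel" in payload and payload["audioChannel"] not in ALLOWED_CHANNELS:
        problems.append(f"audioChannel must be one of {ALLOWED_CHANNELS}")

    if ("audioSampleRate" in payload
            and payload["audioSampleRate"] not in ALLOWED_SAMPLE_RATES):
        problems.append(f"audioSampleRate must be one of {ALLOWED_SAMPLE_RATES}")

    return problems
```

The `emotion` and `voiceIdentifier` fields are left unchecked here because the full set of accepted values is not listed above.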

Example Input

{
  "text": "Speech-02-series is a Text-to-Audio and voice cloning technology that offers voice synthesis, emotional expression, and multilingual capabilities.",
  "pitch": 0,
  "speed": 1,
  "volume": 1,
  "bitRate": 128000,
  "emotion": "happy",
  "audioChannel": "mono",
  "audioSampleRate": 32000,
  "voiceIdentifier": "Friendly_Person",
  "languageEnhancement": "English",
  "englishNormalization": true
}

Expected Output

The output is a URL pointing to the generated audio file, which you can play back directly or integrate into your application.

Example Output

https://assets.cognitiveactions.com/invocations/7a407a5e-35f6-4fae-a8e3-ca1ef9381cc0/5818cac2-16d7-43b1-bcaa-e1a25900af9c.mp3
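
If you want a local copy of the audio, the returned URL can be downloaded like any other file. The sketch below uses only the Python standard library and assumes the URL can be fetched without extra authentication headers, which this article does not confirm; the helper names are made up for this example.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(audio_url: str, default: str = "output.mp3") -> str:
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(audio_url).path)
    return name or default


def download_audio(audio_url: str, dest_dir: str = ".") -> str:
    """Download the generated audio and return the local file path.

    Assumes the URL is directly fetchable with a plain GET request.
    """
    dest_path = os.path.join(dest_dir, filename_from_url(audio_url))
    with urllib.request.urlopen(audio_url, timeout=30) as response:
        with open(dest_path, "wb") as out_file:
            out_file.write(response.read())
    return dest_path
```

Calling `download_audio(url)` with the URL from a real response would save the file into the current directory under the name taken from the URL.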

Use Cases for this Action

Voiceovers: Perfect for creating engaging voiceovers for videos and presentations.
Audiobooks: Transform written content into captivating audiobooks that retain the emotional depth of the narrative.
Real-time Applications: Utilize in chatbots or virtual assistants that require a natural and expressive voice to enhance user interaction.
Educational Tools: Enhance learning experiences by converting educational materials into audio, making them more accessible and engaging.
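
For long-form use cases such as audiobooks, the source text will usually exceed the 5000-character limit on the `text` field, so it has to be split into several requests. Here is one possible way to do that, as a sketch: the `split_text` helper is hypothetical, prefers sentence boundaries, and falls back to a hard cut only when a single sentence is itself too long.

```python
import re

MAX_TEXT_LENGTH = 5000  # limit stated for the `text` field


def split_text(text: str, max_chars: int = MAX_TEXT_LENGTH) -> list:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the `text` of its own request. Note that this sketch joins sentences with a single space, so paragraph breaks in the original text are not preserved.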


```python
import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "e8031e09-0c9e-4885-91aa-253d58f397b8" # Action ID for: Convert Text to Emotionally Expressive Audio

# Construct the exact input payload based on the action's requirements
# This example reuses the Example Input shown earlier for this action:
payload = {
  "text": "Speech-02-series is a Text-to-Audio and voice cloning technology that offers voice synthesis, emotional expression, and multilingual capabilities.",
  "pitch": 0,
  "speed": 1,
  "volume": 1,
  "bitRate": 128000,
  "emotion": "happy",
  "audioChannel": "mono",
  "audioSampleRate": 32000,
  "voiceIdentifier": "Friendly_Person",
  "languageEnhancement": "English",
  "englishNormalization": true
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # raised when the response body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")
```


## Conclusion

The High Fidelity Text To Audio Synthesis API lets developers create emotionally expressive audio from text. With control over pitch, speed, volume, emotion, and output format, it can serve a wide range of applications that need voice synthesis. Used well, it can improve user engagement and accessibility and bring your text to life.

Explore the possibilities of audio synthesis with this API and take your projects to the next level!