Seamlessly Translate Audio with Style Preservation Using Cognitive Actions

25 Apr 2025

Translated audio communicates best when it keeps the speaker's original style, pronunciation, and tone. The cuuupid/seamless_expressive API exposes a Cognitive Action for this purpose: Translate Audio with Style Preservation. With it, developers can translate audio into another language from within their applications while preserving the vocal character of the source recording.

Prerequisites

Before diving into the integration of Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform, which you will use for authentication.
  • A valid endpoint URL to execute the Cognitive Actions.

Authentication is typically done by passing your API key in a request header; the Python example later in this article uses an Authorization: Bearer header.

Cognitive Actions Overview

Translate Audio with Style Preservation

This action translates audio content while maintaining the original style, pronunciation, and tone of the source audio. It is listed under the speech-to-text category, although it returns a translated audio file as well as the translated text. It is particularly useful for applications that need translated speech to sound like the original speaker.

Input

The action accepts the following input fields. Only audioInput is required; the others fall back to their defaults:

  • audioInput (string, required): URI of the input audio file in its original language. Ensure the link is accessible and valid.
  • durationFactor (number, optional): Duration adjustment factor for output audio. Defaults to 1.0. Recommended values:
    • 1.0 for English, Mandarin, Spanish
    • 1.1 for German
    • 1.2 for French
  • sourceLanguage (string, optional): The language in which the input audio is originally recorded. Defaults to English.
  • targetLanguage (string, optional): The language for translation output audio. Defaults to French.
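The field list above can be turned into a small payload builder. This is a minimal sketch, not part of the platform's SDK: the function name build_translation_input and the RECOMMENDED_DURATION_FACTORS table are hypothetical, and the sketch assumes the recommended durationFactor values are keyed by the target language, which the schema description does not state explicitly.

```python
from typing import Optional

# Recommended values copied from the field list above.
# Assumption: they apply to the *target* language of the translation.
RECOMMENDED_DURATION_FACTORS = {
    "English": 1.0,
    "Mandarin": 1.0,
    "Spanish": 1.0,
    "German": 1.1,
    "French": 1.2,
}


def build_translation_input(
    audio_url: str,
    target_language: str = "French",   # documented default
    source_language: str = "English",  # documented default
    duration_factor: Optional[float] = None,
) -> dict:
    """Build the input object for the action (hypothetical helper)."""
    # audioInput must be a reachable URI; this only checks the scheme.
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audioInput must be an http(s) URI")

    # Fall back to the recommended value, or to the documented default of 1.0.
    if duration_factor is None:
        duration_factor = RECOMMENDED_DURATION_FACTORS.get(target_language, 1.0)

    return {
        "audioInput": audio_url,
        "durationFactor": duration_factor,
        "sourceLanguage": source_language,
        "targetLanguage": target_language,
    }
```

Calling build_translation_input("https://example.com/clip.mp3", target_language="German") would produce a payload with durationFactor set to 1.1, while an explicit duration_factor argument always takes precedence over the table.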

Example Input:

{
    "audioInput": "https://replicate.delivery/pbxt/K08cm1YdIpqwlMi6U8qE4nSgG9gGHs26x5VirD1GkVHUC7SB/sample.mp3",
    "durationFactor": 1,
    "sourceLanguage": "English",
    "targetLanguage": "Spanish"
}

Output

The output typically contains:

  • text_out (string): The translated text of the audio content.
  • audio_out (string): URI of the translated audio file.

Example Output:

{
    "text_out": "Por favor, mantén el volumen bajo. Acabamos de dormir al bebé.",
    "audio_out": "https://assets.cognitiveactions.com/invocations/f4099e93-6e8b-4d3b-b9ca-1ead5573dcc2/cdbee963-0d67-434f-b61b-3f0bba2c09bd.wav"
}
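Once a result arrives, the two fields can be pulled out and the translated audio saved locally. The sketch below assumes the dictionary passed in has the same shape as the Example Output above; the actual response may wrap this object in additional fields. The helper names (parse_translation_output, local_filename, download_audio) are hypothetical and only use the Python standard library.

```python
import os
import posixpath
import urllib.request
from typing import Tuple
from urllib.parse import urlparse


def parse_translation_output(result: dict) -> Tuple[str, str]:
    """Return (translated_text, audio_url) from an output object."""
    missing = [key for key in ("text_out", "audio_out") if key not in result]
    if missing:
        raise KeyError(f"output is missing expected field(s): {missing}")
    return result["text_out"], result["audio_out"]


def local_filename(audio_url: str, default: str = "translated_audio.wav") -> str:
    """Derive a file name from the last path segment of the audio URI."""
    name = posixpath.basename(urlparse(audio_url).path)
    return name or default


def download_audio(audio_url: str, dest_dir: str = ".") -> str:
    """Download the translated audio and return the local file path."""
    path = os.path.join(dest_dir, local_filename(audio_url))
    with urllib.request.urlopen(audio_url, timeout=60) as resp, open(path, "wb") as fh:
        fh.write(resp.read())
    return path
```

A typical flow would call parse_translation_output on the decoded response, display or store the text, and then pass the returned URI to download_audio.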

Conceptual Usage Example (Python)

Here's how you might structure the input and call the Cognitive Actions endpoint using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "c332a651-49ec-416f-afaa-958295a2b59a" # Action ID for Translate Audio with Style Preservation

# Construct the input payload based on the action's requirements
payload = {
    "audioInput": "https://replicate.delivery/pbxt/K08cm1YdIpqwlMi6U8qE4nSgG9gGHs26x5VirD1GkVHUC7SB/sample.mp3",
    "durationFactor": 1,
    "sourceLanguage": "English",
    "targetLanguage": "Spanish"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=60 # Avoid hanging indefinitely; adjust for long audio files
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Body is not valid JSON; covers every JSON decode error requests can raise
            print(f"Response body: {e.response.text}")

In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id variable identifies the action being executed, and the payload variable holds the input fields described above. The endpoint URL and the request body structure are illustrative, so confirm both against your platform's documentation before use.

Conclusion

The Translate Audio with Style Preservation action translates audio while keeping the original style, pronunciation, and tone intact, and it returns both the translated text and a translated audio file. Integrating it takes a single authenticated request with a short JSON payload. From here, consider exploring related use cases, such as real-time translation or translating user-generated content.