Create Realistic Sound Effects from Text with Tango

27 Apr 2025

Generating sound effects from text descriptions can change how developers build immersive experiences. Tango offers a Cognitive Action that turns textual prompts into realistic sound effects, including human voices, animal sounds, and environmental noises. Tango 2 is presented as delivering state-of-the-art text-to-audio performance, balancing speed, quality, and accuracy. The model achieves this while training on a comparatively small dataset, and the action makes it straightforward to integrate lifelike audio into applications.

Imagine building a game where each character's actions are accompanied by fitting sound effects, or a virtual environment that responds to user input with realistic sounds. With Tango, you can generate this audio from a short text prompt instead of recording or sourcing each effect by hand.

Prerequisites

To get started with Tango, you'll need a Cognitive Actions API key and a basic understanding of making API calls. This will allow you to effectively integrate the sound generation capabilities into your projects.
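One common way to keep the key out of source code is to read it from an environment variable. The following is a minimal sketch of that approach. The variable name `COGNITIVE_ACTIONS_API_KEY` and the helper names are assumptions made for illustration; the bearer-token header format mirrors the request example later in this article.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "COGNITIVE_ACTIONS_API_KEY"


def load_api_key(environ=None):
    """Return the API key from the environment, or raise if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return key


def build_headers(api_key):
    """Build bearer-token headers in the format used by the request example."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling `build_headers(load_api_key())` then yields headers you can pass to your HTTP client, and a missing key fails early with a clear message rather than as an authentication error from the server.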

Generate Sound Effects from Text Using Tango 2

This action is designed to convert written prompts into diverse sound effects, addressing the need for quality audio in various applications. Whether you're developing games, simulations, or multimedia content, this action can help you create engaging audio experiences that resonate with users.

Input Requirements: To use this action, provide a JSON object with the following properties:

model: Choose between "tango2" or "tango2-full" (default is "tango2").
inputPrompt: A string that describes the desired sound sequence (default is "Quiet speech and then an airplane flying away").
guidanceScale: A number that controls how strongly generation is steered toward the prompt; higher values give stronger guidance (default is 3).
inferenceSteps: An integer number of inference steps to perform; more steps can give more accurate results at the cost of longer generation time (default is 100).
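The properties above can be assembled and checked locally before sending a request. The sketch below is one way to do that: the function name `build_payload` and the specific validation rules are assumptions for illustration, while the property names, allowed models, and defaults come from the list above.

```python
ALLOWED_MODELS = ("tango2", "tango2-full")


def build_payload(
    prompt="Quiet speech and then an airplane flying away",
    model="tango2",
    guidance_scale=3,
    inference_steps=100,
):
    """Return the action's input payload, rejecting obviously invalid values."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(guidance_scale, bool) or not isinstance(guidance_scale, (int, float)):
        raise ValueError("guidance_scale must be a number")
    if (
        isinstance(inference_steps, bool)
        or not isinstance(inference_steps, int)
        or inference_steps < 1
    ):
        raise ValueError("inference_steps must be a positive integer")
    return {
        "model": model,
        "inputPrompt": prompt,
        "guidanceScale": guidance_scale,
        "inferenceSteps": inference_steps,
    }
```

For example, `build_payload("Rain on a tin roof", model="tango2-full")` keeps the default guidance scale and step count while switching the model and prompt.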

Expected Output: The action will return a URL linking to the generated sound effect, allowing you to easily access and use the audio in your projects. For example, you might receive a link like this: https://assets.cognitiveactions.com/invocations/9b2afd50-b9b7-42b2-8c28-a864e278cfe9/0cd511b8-4953-4c99-b8ed-906704a0ef8e.wav.
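Once you have the result, you will typically want to save the audio locally. The exact shape of the response JSON is not specified here, so the following sketch makes a loud assumption: it searches the decoded response for the first string that looks like an http(s) URL. The helper names are hypothetical, and the download uses only the Python standard library.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def find_audio_url(result):
    """Depth-first search of decoded JSON for the first http(s) URL string."""
    if isinstance(result, str):
        return result if result.startswith(("http://", "https://")) else None
    if isinstance(result, dict):
        children = result.values()
    elif isinstance(result, list):
        children = result
    else:
        return None
    for child in children:
        found = find_audio_url(child)
        if found:
            return found
    return None


def filename_from_url(url, default="sound.wav"):
    """Use the last path segment of the URL as the local file name."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_audio(url, directory="."):
    """Download the audio file at `url` into `directory` and return its path."""
    path = os.path.join(directory, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=60) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

If the real response places the URL under a known key, reading that key directly is simpler and safer than the generic search shown here.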

Use Cases for this specific action:

Game Development: Enhance gameplay by generating dynamic sound effects based on player actions or in-game events.
Virtual Reality Experiences: Create immersive environments that react with realistic audio, increasing user engagement and realism.
Storytelling Applications: Bring narratives to life by generating sound effects that match the storyline, creating a more captivating experience.


```python
import requests
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# Hypothetical endpoint URL; confirm the real one in the Cognitive Actions documentation
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "e39ca5bc-1267-470d-9f5b-2738ba246299" # Action ID for: Generate Sound Effects from Text Using Tango 2

# Build the input payload from the properties described above
# (these values are the documented defaults):
payload = {
  "model": "tango2",
  "inputPrompt": "Quiet speech and then an airplane flying away",
  "guidanceScale": 3,
  "inferenceSteps": 100
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print("--- Calling Cognitive Action: Generate Sound Effects from Text Using Tango 2 ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=120,  # Fail instead of hanging indefinitely if the server does not respond
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # JSON decode errors are ValueError subclasses
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")
```


## Conclusion
With Tango's ability to generate sound effects from text, developers can streamline audio creation while enriching their applications. Whether you're building games, simulations, or interactive media, this Cognitive Action is a practical tool for creating engaging audio experiences. Consider integrating Tango into your projects to explore what text-to-audio generation can add to them.