Transform Your Videos with Audio: Integrating MMAudio Cognitive Actions

24 Apr 2025

In today's digital landscape, enhancing video content with contextually relevant audio can significantly elevate user engagement and experience. The MMAudio Cognitive Actions provide developers with powerful tools to synthesize high-quality audio from video content. By using these pre-built actions, you can automate the audio generation process, ensuring real-time synchronization with video elements and producing appropriate environmental sounds.

Prerequisites

To get started with the MMAudio Cognitive Actions, you'll need an API key for the Cognitive Actions platform. This key allows you to authenticate your requests. Generally, authentication is done by including the API key in the request headers as a Bearer token. Make sure to set up your environment accordingly to facilitate seamless API calls.
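As a quick sketch of this setup (the API key value and helper name below are placeholders, not platform-provided code), the Bearer-token headers can be assembled once and reused across requests:

```python
# Placeholder; substitute your real Cognitive Actions API key.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

def build_auth_headers(api_key: str) -> dict:
    """Build standard Bearer-token headers for JSON API calls."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_auth_headers(COGNITIVE_ACTIONS_API_KEY)
```

Keeping the key in an environment variable rather than hard-coding it is advisable for anything beyond local experimentation.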

Cognitive Actions Overview

Add Sound to Video

The Add Sound to Video action leverages the MMAudio V2 model to synthesize high-quality audio from video content. This action excels in transforming visual content into contextually matching audio, ensuring that the generated sounds align perfectly with the video elements.

  • Category: Video-to-Audio Synthesis

Input

The input schema for this action consists of several properties that allow you to customize the audio generation process:

  • seed (integer, optional): Sets the random seed. Use -1 or omit for a randomized seed.
    Example: -1
  • image (string, optional): URL of an optional image file for experimental image-to-audio generation.
    Example: "https://example.com/image.png"
  • video (string, required): URL of the video file for audio generation.
    Example: "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4"
  • prompt (string, required): The text prompt used to generate audio content.
    Example: "galloping"
  • duration (number, required): Specifies the duration of the generated audio output in seconds. Minimum value is 1.
    Example: 8
  • numberOfSteps (integer, optional): Determines the number of inference steps. Default is 25.
    Example: 25
  • negativePrompt (string, optional): Text prompt outlining sounds to be avoided in the generation process.
    Example: "music"
  • configurationStrength (number, optional): Sets the classifier-free guidance (CFG) strength. Minimum value is 1. Default is 4.5.
    Example: 4.5
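The schema above can also be enforced client-side before any API call. The helper below is a hypothetical sketch (it is not part of any platform SDK); it simply checks the required fields and minimum values described above and leaves documented defaults to the service:

```python
def build_payload(video: str, prompt: str, duration: float, **optional) -> dict:
    """Assemble an input payload for the Add Sound to Video action,
    enforcing the required fields and minimums from the schema."""
    if not video or not prompt:
        raise ValueError("'video' and 'prompt' are required")
    if duration < 1:
        raise ValueError("'duration' must be at least 1 second")
    payload = {"video": video, "prompt": prompt, "duration": duration}
    # Optional fields per the schema; anything else is rejected early.
    allowed = {"seed", "image", "numberOfSteps", "negativePrompt",
               "configurationStrength"}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    payload.update(optional)
    return payload

payload = build_payload(
    video="https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
    prompt="galloping",
    duration=8,
    negativePrompt="music",
)
```

Validating locally like this surfaces schema mistakes immediately instead of after a round trip to the API.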

Example Input

{
  "seed": -1,
  "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
  "prompt": "galloping",
  "duration": 8,
  "numberOfSteps": 25,
  "negativePrompt": "music",
  "configurationStrength": 4.5
}

Output

Upon successful execution, this action returns a URL to the generated output. Note that the example URL below points to an MP4 file, i.e., the original video with the synthesized audio track merged in.

Example Output:
"https://assets.cognitiveactions.com/invocations/21e0bfbe-3e89-402f-8563-8ba1c4abaedf/daf74719-ee39-46c0-88b9-88a6e61c5aaa.mp4"
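A common follow-up step is saving the returned file locally. The sketch below assumes only that the output is a plain downloadable URL, as in the example above; the helper names are illustrative:

```python
import os
from urllib.parse import urlparse

import requests

def filename_from_url(url: str) -> str:
    """Derive a local filename from the last path segment of the result URL."""
    return os.path.basename(urlparse(url).path) or "output.mp4"

def download_result(result_url: str, out_dir: str = ".") -> str:
    """Stream the generated file to disk and return the local path."""
    out_path = os.path.join(out_dir, filename_from_url(result_url))
    with requests.get(result_url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return out_path
```

Streaming in chunks keeps memory usage flat even for long videos.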

Conceptual Usage Example (Python)

Here’s how you can call the Add Sound to Video action using Python. The following code snippet demonstrates how to structure the input JSON payload correctly for this action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "2cc46a62-e894-427e-8d60-1a8e11b83438" # Action ID for Add Sound to Video

# Construct the input payload based on the action's requirements
payload = {
    "seed": -1,
    "video": "https://huggingface.co/hkchengrex/MMAudio/resolve/main/examples/sora_galloping.mp4",
    "prompt": "galloping",
    "duration": 8,
    "numberOfSteps": 25,
    "negativePrompt": "music",
    "configurationStrength": 4.5
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300  # Audio generation can be slow; avoid hanging indefinitely
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Covers JSON decode errors across requests versions
            print(f"Response body: {e.response.text}")

In this code, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The payload variable is structured according to the required input schema for the Add Sound to Video action.
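Generation endpoints like this can occasionally fail with transient errors or time out. As a hedged sketch (the retry policy is an assumption on my part, not documented platform behavior), the POST above can be wrapped in a simple exponential-backoff retry:

```python
import time

def post_with_retries(send, retries: int = 3, backoff: float = 1.0):
    """Call `send()` (any zero-argument callable that makes the request),
    retrying on exceptions with exponential backoff. A real implementation
    would restrict retries to transient failures (e.g. HTTP 429/5xx)."""
    for attempt in range(retries):
        try:
            return send()
        except Exception:
            if attempt == retries - 1:
                raise  # Out of attempts; propagate the last error
            time.sleep(backoff * (2 ** attempt))
```

For example, `post_with_retries(lambda: requests.post(...))` would retry the action call up to three times before giving up.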

Conclusion

The MMAudio Cognitive Actions offer a seamless way to enhance your video content with contextually relevant audio, significantly improving viewer engagement. By leveraging the capabilities of the Add Sound to Video action, you can automate the audio generation process and create immersive experiences for your users. Consider exploring various use cases in your applications, from video games to educational content, to fully harness the potential of this powerful tool.