Create Stunning Waveform Videos from Audio with Cognitive Actions

21 Apr 2025

Transforming audio files into visual formats can enhance user engagement and enrich multimedia applications. The fofr/audio-to-waveform API offers a Cognitive Action that converts audio files into visually appealing waveform videos. Beyond simply visualizing sound, it exposes customization options so the output can match the aesthetic of your application.

Prerequisites

Before diving into the implementation, ensure you have the following:

  • An API key for the Cognitive Actions platform, used to authenticate your requests.
  • Basic familiarity with JSON, as you'll be sending and receiving JSON payloads.
  • A programming environment set up to make HTTP requests (e.g., Python with the requests library).

To authenticate your requests, you typically pass your API key in the request headers.
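As a minimal sketch, a Bearer-token header like the one below is a common pattern; the exact authentication scheme may differ for your Cognitive Actions account, and the key value here is a placeholder:

```python
# Placeholder key -- substitute your real Cognitive Actions API key.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# Headers sent with every request: Bearer auth plus a JSON content type.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```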

Cognitive Actions Overview

Create Waveform Video from Audio

Description: Convert audio files into visually appealing waveform videos using Gradio's make_waveform tool. Customize waveform appearance with options for bar width, color, background, and caption.

Category: Video Processing

Input

The input for this action requires a JSON payload structured as follows:

{
  "audio": "https://example.com/audio.wav",
  "barWidth": 0.4,
  "barsColor": "#ffffff",
  "captionText": "Your caption here",
  "numberOfBars": 100,
  "backgroundColor": "#000000",
  "foregroundOpacity": 0.75
}

  • audio (string, required): URI of the audio file from which to generate the waveform.
    Example: "https://replicate.delivery/pbxt/J03sz7ye60eaijccxUfU5wc1W9vwgKIsU47QozjClDmi1bgB/20230613T093211825Z_80s_trancecore%2C_driving_rhythm.wav"
  • barWidth (number, optional): Width of each bar in the waveform as a decimal fraction of the total width. Default is 0.4.
  • barsColor (string, optional): Hex color code for the waveform bars. Default is "#ffffff".
  • captionText (string, optional): Text overlay to display as a caption on the video. Default is an empty string.
  • numberOfBars (integer, optional): Total number of bars displayed in the waveform. Default is 100.
  • backgroundColor (string, optional): Hex color code for the waveform's background color. Default is "#000000".
  • foregroundOpacity (number, optional): Opacity level of the foreground waveform, where 1 is fully opaque and 0 is fully transparent. Default is 0.75.
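Since only audio is required and every other field has a documented default, a small helper can assemble a valid payload. This is a hypothetical convenience function (build_waveform_payload is not part of the API), shown purely to illustrate how the defaults above combine with your overrides:

```python
# Documented defaults for the optional parameters of this action.
DEFAULTS = {
    "barWidth": 0.4,
    "barsColor": "#ffffff",
    "captionText": "",
    "numberOfBars": 100,
    "backgroundColor": "#000000",
    "foregroundOpacity": 0.75,
}

def build_waveform_payload(audio_url, **overrides):
    """Build an input payload: required audio URL plus any overrides."""
    if not audio_url:
        raise ValueError("audio is required")
    payload = {"audio": audio_url, **DEFAULTS}
    payload.update(overrides)
    return payload

# Example: keep all defaults except the bar color.
payload = build_waveform_payload(
    "https://example.com/audio.wav",
    barsColor="#00ff00",
)
```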

Output

Upon successful execution, the action typically returns a URL to the generated waveform video. For example:

"https://assets.cognitiveactions.com/invocations/63b1268a-95cb-4222-9d5e-9aa4f2f8c77c/28856f8a-0c13-4576-b35c-d11d9c89aac7.mp4"

This URL points to the waveform video that has been created from the specified audio input.
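Since the action returns a URL rather than the video bytes themselves, you will often want to fetch the MP4 and save it locally. A simple sketch using the requests library (streaming the file in chunks so large videos are not held in memory):

```python
import requests

def download_video(url, dest_path):
    """Stream an MP4 from `url` to `dest_path` in 8 KiB chunks."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)

# Usage (with the URL returned by the action):
# download_video("https://assets.cognitiveactions.com/.../output.mp4",
#                "waveform.mp4")
```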

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet demonstrating how to invoke the "Create Waveform Video from Audio" action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "a17c33bf-79cb-460d-8897-b85d91701ec5"  # Action ID for Create Waveform Video from Audio

# Construct the input payload based on the action's requirements
payload = {
    "audio": "https://replicate.delivery/pbxt/J03sz7ye60eaijccxUfU5wc1W9vwgKIsU47QozjClDmi1bgB/20230613T093211825Z_80s_trancecore%2C_driving_rhythm.wav",
    "barWidth": 0.4,
    "barsColor": "#ffffff",
    "captionText": "80s trancecore, driving rhythm section, ambient textures, boomwhackers, persian scale mode, tribute recording",
    "numberOfBars": 100,
    "backgroundColor": "#000000",
    "foregroundOpacity": 0.75
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely on a stalled connection
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The payload variable is structured according to the action's input requirements.
  • The endpoint URL and request structure are illustrative and should be adjusted based on your actual setup.
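Once the request succeeds, you still need to pull the video URL out of the response. The response schema here is an assumption (the key names below are hypothetical); adjust the lookup to match what your deployment actually returns:

```python
def extract_video_url(result):
    """Pull the output video URL from a result dict.

    Handles two hypothetical shapes: {"output": "...mp4"} and
    {"outputs": ["...mp4"]}. Returns None if neither is present.
    """
    output = result.get("output") or result.get("outputs")
    if isinstance(output, list):
        output = output[0] if output else None
    return output

# Example with a sample response shape:
sample = {"output": "https://assets.cognitiveactions.com/invocations/example.mp4"}
video_url = extract_video_url(sample)
```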

Conclusion

By leveraging the Cognitive Actions available in the fofr/audio-to-waveform API, you can effortlessly transform audio files into dynamic waveform videos that can elevate the visual appeal of your applications. With customizable parameters, the generated videos can be tailored to fit various themes and styles, making them an excellent addition to multimedia content.

Consider exploring more use cases, such as integrating this feature into music applications, video editing tools, or educational platforms to enhance user engagement. Happy coding!