Unlocking Music Creativity with Demucs: Source Separation Made Simple

26 Apr 2025

In audio processing, separating a mixed track into its musical components is valuable to developers and music enthusiasts alike. Demucs is available as a Cognitive Action that performs state-of-the-art music source separation, isolating elements such as drums, bass, and vocals from a single recording. It is built on a Hybrid Transformer architecture, and it suits applications ranging from music production to content creation.

Imagine a scenario where you want to remix a song but only need the vocal track. Or perhaps you’re developing an application that analyzes music compositions by breaking down their components. Demucs simplifies these processes, enabling developers to integrate advanced audio manipulation capabilities into their projects swiftly and efficiently.

Prerequisites

To get started with Demucs, you'll need a Cognitive Actions API key and a basic understanding of making API calls. This will allow you to seamlessly interact with the source separation features.

Perform Music Source Separation

The Perform Music Source Separation action is designed to isolate specific audio stems from a mixed track. This action solves the problem of extracting individual components from complex audio files, allowing for enhanced manipulation or analysis of music.

Input Requirements

This action accepts the following inputs. The example input below sets only some of them, so the remaining ones appear to be optional:

Audio: A URI pointing to the input audio file (e.g., an MP3 file).
Stem: Specify which audio stem to separate (e.g., vocals, bass, drums).
Shifts: The number of random time shifts applied to the input, with the predictions averaged across shifts; a higher value can improve separation quality but increases processing time.
Float32: Indicates whether to save the output in float32 format.
Overlap: Specifies the overlap ratio between audio splits, affecting separation quality.
Clip Mode: Determines how to handle audio signal clipping (rescale or clamp).
Model Name: Selects the model for processing (e.g., htdemucs).
MP3 Bitrate: Defines the bitrate for the output MP3 file, impacting audio quality.
Output Format: The desired format for the output audio file (mp3, wav, or flac).
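
The parameters above can be assembled and checked before a request is sent. The following is a minimal sketch, not part of the Cognitive Actions API: build_separation_inputs is a hypothetical helper, its defaults simply mirror the example values used in this article rather than documented defaults, the "stem" key name is an assumption (only the other keys appear in the example input), and the overlap range check assumes the value is a ratio.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in this article; the service may accept others.
OUTPUT_FORMATS = {"mp3", "wav", "flac"}
CLIP_MODES = {"rescale", "clamp"}


def build_separation_inputs(
    audio: str,
    stem: Optional[str] = None,
    shifts: int = 1,
    overlap: float = 0.25,
    clip_mode: str = "rescale",
    model_name: str = "htdemucs",
    mp3_bitrate: int = 320,
    output_format: str = "mp3",
) -> Dict[str, Any]:
    """Return an input payload shaped like the article's example (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")
    if clip_mode not in CLIP_MODES:
        raise ValueError(f"clipMode must be one of {sorted(CLIP_MODES)}")
    # Assumption: overlap is a ratio between splits, so it should lie in [0, 1).
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in the range [0, 1)")

    inputs: Dict[str, Any] = {
        "audio": audio,
        "shifts": shifts,
        "overlap": overlap,
        "clipMode": clip_mode,
        "modelName": model_name,
        "mp3Bitrate": mp3_bitrate,
        "outputFormat": output_format,
    }
    if stem is not None:
        # Assumption: the key is named "stem"; it does not appear in the example input.
        inputs["stem"] = stem
    return inputs


inputs = build_separation_inputs("https://example.com/song.mp3", stem="vocals")
print(inputs)
```

Validating locally like this catches a mistyped format or clip mode before a request is spent on it; the service's own validation remains the authority.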

Example Input

{
  "audio": "https://replicate.delivery/pbxt/J6Quo9VPU210JJB9HS97ThWUxT7iax8PWiP7FD5f3bg2G6AY/test1.mp3",
  "shifts": 1,
  "overlap": 0.25,
  "clipMode": "rescale",
  "modelName": "htdemucs",
  "mp3Bitrate": 320,
  "outputFormat": "mp3"
}

Expected Output

The output will consist of separate audio files for each specified stem, depending on the input parameters. For example, you may receive:

Vocals: A link to the isolated vocal track.
Bass: A link to the isolated bass track.
Drums: A link to the isolated drum track.
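Other: A link to the remaining accompaniment that is not covered by the named stems.
Piano and Guitar: Links to these stems when they are produced, or null when they are not, as in the example below.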

Example Output

{
  "bass": "https://assets.cognitiveactions.com/invocations/c65d3679-a78f-47d7-80f7-4a6844ea4ca3/a6a4849a-3c57-4051-9daa-15524a057409.mp3",
  "drums": "https://assets.cognitiveactions.com/invocations/c65d3679-a78f-47d7-80f7-4a6844ea4ca3/0eb64501-80b8-457c-95be-133b14d090a4.mp3",
  "other": "https://assets.cognitiveactions.com/invocations/c65d3679-a78f-47d7-80f7-4a6844ea4ca3/dbaf59fc-e096-4bad-b566-2937ea9b9602.mp3",
  "piano": null,
  "guitar": null,
  "vocals": "https://assets.cognitiveactions.com/invocations/c65d3679-a78f-47d7-80f7-4a6844ea4ca3/cad8cb89-6f4e-4a8c-b513-830d1d8aa059.mp3"
}
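
Because some stems can come back as null, client code will usually want to skip them before downloading anything. The sketch below shows one way to do that, assuming the response has the flat stem-to-URL shape shown above; available_stems and local_filename are hypothetical helper names, and the URLs are placeholders rather than real assets.

```python
import os
from typing import Dict, Optional
from urllib.parse import urlparse


def available_stems(result: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only the stems whose value is a URL rather than null/None."""
    return {name: url for name, url in result.items() if url}


def local_filename(stem: str, url: str) -> str:
    """Name the local file after the stem, reusing the extension from the URL path."""
    extension = os.path.splitext(urlparse(url).path)[1]
    return f"{stem}{extension}"


# Stand-in for the example output; these URLs are placeholders.
result = {
    "bass": "https://example.com/stems/bass-id.mp3",
    "drums": "https://example.com/stems/drums-id.mp3",
    "other": "https://example.com/stems/other-id.mp3",
    "piano": None,
    "guitar": None,
    "vocals": "https://example.com/stems/vocals-id.mp3",
}

for stem, url in available_stems(result).items():
    print(f"{stem}: {url} -> {local_filename(stem, url)}")
```

Each remaining URL can then be fetched with any HTTP client and saved under the suggested filename.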

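The following Python script sketches how the action might be called. The endpoint URL and request structure are illustrative, so check them against the Cognitive Actions documentation before use.

import requests
import json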

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "dab844e3-47bd-4578-9ed7-a770bfa65e38" # Action ID for: Perform Music Source Separation

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
    "audio": "https://replicate.delivery/pbxt/J6Quo9VPU210JJB9HS97ThWUxT7iax8PWiP7FD5f3bg2G6AY/test1.mp3",
    "shifts": 1,
    "overlap": 0.25,
    "clipMode": "rescale",
    "modelName": "htdemucs",
    "mp3Bitrate": 320,
    "outputFormat": "mp3"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print("--- Calling Cognitive Action: Perform Music Source Separation ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=300  # Seconds; separation can be slow, so adjust as needed
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised when the error body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Use Cases for this Action

Music Remixing: Easily isolate and remix individual components of a song, allowing for creative reinterpretations.
Audio Analysis: Analyze specific elements of a track for music research or educational purposes.
Content Creation: Developers can create applications that require specific audio components for video production or interactive media.

Conclusion

The Demucs music source separation action simplifies the task of isolating audio components from mixed tracks. A single API call returns separate stems for vocals, bass, drums, and other instruments, which supports music production, analysis, and content creation. Its separation quality and straightforward integration make it a practical addition to audio processing applications.

Consider exploring additional features and capabilities of Demucs to further enrich your audio projects!