Create Unique Music Tracks Using Text Prompts with Musicgen

26 Apr 2025

Musicgen is a service that lets developers generate custom music tracks from text prompts. You describe the music you want and control parameters such as duration, sampling temperature, and output format, which makes it practical to produce varied audio for games, videos, or personal projects.

Prerequisites

To integrate Musicgen into your projects, you will need access to a Cognitive Actions API key and a basic understanding of making API calls.

Generate Music with Text Prompt

The "Generate Music with Text Prompt" action creates music from a descriptive text input. Users specify the style and characteristics of the music they want, and can optionally supply an audio file to influence the result or to be continued.

Input Requirements

  • Seed: An optional integer to set a random seed for the generation process.
  • Prompt: A string describing the desired music, e.g., "produce a song melody in jog raga indian style."
  • Weights: Optional specification of MusicGen weights for generation.
  • Duration: An integer indicating how long the generated audio should be in seconds (default is 8 seconds).
  • Max Tokens: An integer that limits sampling to the k most likely tokens, i.e. top-k sampling (default is 250).
  • Audio Input: A URI to an audio file that influences the generated music.
  • Continuation: A boolean indicating if the generated music should continue from the provided audio file.
  • Continuation Start / Continuation End: Integers that define the time range of the provided audio file that the continuation uses.
  • Audio Output Format: Specifies the format of the output audio file (options are 'wav' or 'mp3', default is 'wav').
  • Sampling Temperature: A number that controls the diversity of the output (default is 1).
  • Probability Threshold: A number that limits sampling to tokens with a cumulative probability up to 'p' (top-p sampling).
  • Enable MultiBand Diffusion: A boolean option to use MultiBand Diffusion for decoding.
  • Audio Normalization Strategy: Specifies the method for audio normalization (default is 'loudness').
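
The inputs above can be assembled with a small helper before sending a request. The following is a minimal sketch, not part of the Musicgen service: the function name build_inputs is hypothetical, the camelCase keys follow the example payload later in this article, and the "seed" key name is an assumption. The defaults are the ones listed above.

```python
from typing import Any, Dict, Optional

# Output formats listed in the input requirements
ALLOWED_FORMATS = {"wav", "mp3"}


def build_inputs(
    prompt: str,
    duration: int = 8,
    max_tokens: int = 250,
    audio_output_format: str = "wav",
    sampling_temperature: float = 1.0,
    audio_normalization_strategy: str = "loudness",
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an inputs dict using the documented defaults (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    if audio_output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"audio_output_format must be one of {sorted(ALLOWED_FORMATS)}"
        )

    inputs: Dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "maxTokens": max_tokens,
        "audioOutputFormat": audio_output_format,
        "samplingTemperature": sampling_temperature,
        "audioNormalizationStrategy": audio_normalization_strategy,
    }
    # Only include the seed when the caller sets one.
    # The key name "seed" is an assumption; check the action's schema.
    if seed is not None:
        inputs["seed"] = seed
    return inputs


example_inputs = build_inputs(
    "produce a song melody in jog raga indian style",
    duration=60,
    sampling_temperature=0.9,
)
```

Validating the format and duration locally catches simple mistakes before a request is made; the service may apply further checks of its own.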

Expected Output

The output is a generated audio file in the specified format, such as a WAV file, that reflects the input prompt and parameters.
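
Once the audio bytes are in hand, they can be written to disk with an extension that matches the requested format. This is a sketch under stated assumptions: the article does not document the response schema, so how the bytes are obtained (for example, by downloading a URL returned in the result) is left out, and save_audio_bytes is a hypothetical helper name.

```python
from pathlib import Path


def save_audio_bytes(audio: bytes, fmt: str, directory: str, stem: str = "track") -> Path:
    """Write audio bytes to <directory>/<stem>.<fmt> (hypothetical helper)."""
    if fmt not in ("wav", "mp3"):
        raise ValueError("fmt must be 'wav' or 'mp3'")
    # Light sanity check: WAV files start with a RIFF header tagged "WAVE".
    if fmt == "wav" and not (audio[:4] == b"RIFF" and audio[8:12] == b"WAVE"):
        raise ValueError("data does not look like a WAV file")

    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}.{fmt}"
    path.write_bytes(audio)
    return path
```

The header check only guards against saving an error page or empty body as a .wav file; it does not validate the audio itself.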

Use Cases for this Action

  • Content Creation: Ideal for video editors and content creators looking to produce background music or soundtracks that fit specific themes or moods.
  • Game Development: Useful for developers needing original music that enhances the gaming experience, tailored to different scenes or levels.
  • Music Experimentation: Musicians can explore new styles and compositions by inputting creative prompts that inspire fresh musical ideas.
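
The following Python example shows how the action might be called. The endpoint URL is hypothetical, as noted in the comments.

import requests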
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "bc453217-ea46-490f-8f74-f0b752994b6c" # Action ID for: Generate Music with Text Prompt

# Construct the exact input payload based on the action's requirements
# This example uses the predefined example_input for this action:
payload = {
  "prompt": "produce a song melody in jog raga indian style",
  "duration": 60,
  "maxTokens": 250,
  "continuation": False,  # Python booleans are capitalized (False, not false)
  "guidanceScale": 3,
  "continuationEnd": 0,
  "audioOutputFormat": "wav",
  "continuationStart": 0,
  "samplingTemperature": 0.9,
  "probabilityThreshold": 0,
  "enableMultiBandDiffusion": False,
  "audioNormalizationStrategy": "rms"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: {action_id} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # JSON decode errors raised by response.json() subclass ValueError
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

Conclusion

With Musicgen's ability to generate music from text prompts, developers can unlock new creative possibilities in audio production. Whether for enhancing multimedia projects, building immersive game environments, or simply exploring musical creativity, Musicgen offers a powerful tool to streamline the music creation process. To get started, integrate the "Generate Music with Text Prompt" action into your applications and begin crafting unique soundscapes today!