Unlocking Musical Creativity: Integrating Music Generation with Mustango Cognitive Actions

The declare-lab/mustango API provides developers with a toolset for generating music from textual prompts. Built on a latent diffusion model with a Flan-T5 text encoder, these Cognitive Actions let applications create distinct and diverse musical pieces from nothing more than descriptive text inputs. This guide walks you through the "Generate Music from Text Prompt" action and how to integrate it into your applications.
Prerequisites
To get started with the Mustango Cognitive Actions, you will need:
- An API key for the Cognitive Actions platform. This key will be used to authenticate your requests.
- Familiarity with JSON format, as the input and output payloads are structured in this way.
Authentication typically involves including your API key in the request headers, allowing you to securely access the action functionalities.
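As a minimal sketch, the header scheme might look like the following (the bearer-token format is an assumption based on common API conventions; consult the platform documentation for the exact scheme):

```python
# Sketch of authenticated request headers for the Cognitive Actions platform.
# The "Bearer <key>" format is an assumption, not confirmed by the platform docs.
def build_headers(api_key: str) -> dict:
    """Return headers that authenticate a Cognitive Actions request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

print(build_headers("YOUR_COGNITIVE_ACTIONS_API_KEY"))
```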
Cognitive Actions Overview
Generate Music from Text Prompt
The Generate Music from Text Prompt action allows you to create music pieces based on detailed textual descriptions. By defining characteristics such as musical style, instruments, and atmosphere through the prompt, you can generate unique musical pieces that fit various contexts, from soundtracks to advertisements.
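Since prompt quality drives output quality, it can help to assemble prompts programmatically from musical attributes. A small illustrative helper (the function name and sentence templates are my own; any free-form descriptive text works as a prompt):

```python
# Illustrative helper that composes a detailed Mustango-style prompt from
# musical attributes. The templates mirror the example prompts in this guide;
# they are a convenience, not a required format.
def build_prompt(style: str, lead: str, chords: list,
                 tempo_bpm: float, key: str) -> str:
    parts = [
        f"This {style} song features a {lead} playing the main melody.",
        f"The chord sequence is {', '.join(chords)}.",
        f"The tempo of this song is {tempo_bpm} beats per minute.",
        f"The key of this song is {key}.",
    ]
    return " ".join(parts)

prompt = build_prompt("techno", "synth lead",
                      ["Gm", "A7", "Eb", "Bb", "C", "F", "Gm"],
                      128.0, "G minor")
print(prompt)
```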
Input
The input for this action follows the CompositeRequest schema with the following properties:
- prompt (string): A detailed textual input that guides the content creation.
- Example:
"This techno song features a synth lead playing the main melody. This is accompanied by programmed percussion playing a simple kick focused beat. The hi-hat is accented in an open position on the 3-and count of every bar. The synth plays the bass part with a voicing that sounds like a cello. This techno song can be played in a club. The chord sequence is Gm, A7, Eb, Bb, C, F, Gm. The beat counts to 2. The tempo of this song is 128.0 beats per minute. The key of this song is G minor."
- guidanceScale (number): Controls how strongly the inference process is guided by the prompt. Higher values yield closer adherence to the prompt, typically at some cost to variety.
- Default: 3
- Example: 3
- inferenceSteps (integer): The number of diffusion steps taken during inference. More steps generally produce more detailed output at the cost of longer generation time.
- Default: 100
- Example: 100
Here’s an example JSON input payload:
{
  "prompt": "This techno song features a synth lead playing the main melody. This is accompanied by programmed percussion playing a simple kick focused beat. The hi-hat is accented in an open position on the 3-and count of every bar. The synth plays the bass part with a voicing that sounds like a cello. This techno song can be played in a club. The chord sequence is Gm, A7, Eb, Bb, C, F, Gm. The beat counts to 2. The tempo of this song is 128.0 beats per minute. The key of this song is G minor.",
  "guidanceScale": 3,
  "inferenceSteps": 100
}
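In code, the documented defaults can be applied by a small builder; a minimal sketch (the function name `make_request` and the empty-prompt check are illustrative additions, not part of the API):

```python
# Sketch of constructing a CompositeRequest payload, applying the documented
# defaults for guidanceScale (3) and inferenceSteps (100).
def make_request(prompt: str, guidance_scale: float = 3,
                 inference_steps: int = 100) -> dict:
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    return {
        "prompt": prompt,
        "guidanceScale": guidance_scale,
        "inferenceSteps": inference_steps,
    }

payload = make_request("A calm piano piece in C major at 80 beats per minute.")
print(payload)
```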
Output
Upon successful execution, the action returns a URL linking to the generated music piece in WAV format. Here’s an example output:
https://assets.cognitiveactions.com/invocations/742605c6-9d82-4052-970f-3ecbe9e78aec/1db85cbb-f274-4a23-890f-3ca751b32d62.wav
This URL can be used to access the generated music directly.
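To persist the result, you can fetch the WAV from the returned URL and save it locally. A sketch, assuming the asset URL is publicly readable (the helper names are my own):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative helper: derive a local filename from the returned asset URL.
def filename_from_url(url: str) -> str:
    return PurePosixPath(urlparse(url).path).name

# Sketch of downloading the generated WAV. Streaming in chunks keeps memory
# use low for large audio files.
def download_wav(url: str, dest: str = "") -> str:
    import requests  # third-party; imported here so filename_from_url works without it
    dest = dest or filename_from_url(url)
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return dest

print(filename_from_url(
    "https://assets.cognitiveactions.com/invocations/"
    "742605c6-9d82-4052-970f-3ecbe9e78aec/"
    "1db85cbb-f274-4a23-890f-3ca751b32d62.wav"))
```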
Conceptual Usage Example (Python)
To integrate this action into your application, here’s a conceptual Python code snippet demonstrating how to call the Mustango Cognitive Actions API:
import requests
import json
# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint
action_id = "86eddc5f-a66f-4897-a2fe-516e5de459a5" # Action ID for Generate Music from Text Prompt
# Construct the input payload based on the action's requirements
payload = {
    "prompt": "This techno song features a synth lead playing the main melody...",
    "guidanceScale": 3,
    "inferenceSteps": 100
}
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}
try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # response body was not valid JSON
            print(f"Response body: {e.response.text}")
In this example, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The action_id for the "Generate Music from Text Prompt" is provided, and the input payload is structured according to the requirements. This code snippet illustrates how to authenticate and send a request to the hypothetical endpoint.
Conclusion
The Mustango Cognitive Actions provide a powerful way to generate music tailored to your application's needs. By leveraging the ability to craft musical pieces from textual descriptions, developers can enhance user experiences across various domains. Explore this action and consider how it can fit into your projects, whether for interactive applications, content creation, or entertainment solutions. Happy coding!