Transforming Text to Audio: Integrating the Sepal Audiogen Cognitive Actions

24 Apr 2025

Adding audio generation to an application can make it noticeably more engaging. The Sepal Audiogen Cognitive Actions let developers convert descriptive text prompts into audio files using AudioGen, a text-to-audio model from Meta's AudioCraft library. This article covers the Generate Sound From Text action: its input parameters, its output, and how to call it.

Prerequisites

To use the Sepal Audiogen Cognitive Actions, you need an API key for the Cognitive Actions platform. The key authenticates your application's requests and is typically passed in a request header; the Python example later in this article sends it as a Bearer token in the Authorization header.
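A minimal sketch of building the authentication headers, assuming the platform accepts a Bearer token (as in the usage example later in this article) and that you keep the key in an environment variable. The variable name `COGNITIVE_ACTIONS_API_KEY` and the helper `build_auth_headers` are hypothetical choices for illustration.

```python
import os
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers carrying the API key as a Bearer token.

    Assumptions: the platform expects 'Authorization: Bearer <key>', and the
    environment variable name used as a fallback is hypothetical.
    """
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError("Set COGNITIVE_ACTIONS_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source code and version control.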

Cognitive Actions Overview

Generate Sound From Text

The Generate Sound From Text action allows developers to create audio files from descriptive text prompts. This action is categorized under audio-processing and offers customizable parameters to tailor the audio output.

Input

The action takes a JSON object with the following fields (only prompt is required):

  • prompt (required): A descriptive text prompt for generating the desired sound.
  • topK (optional): Limits sampling to the top k most likely tokens. Default is 250.
  • topP (optional): Restricts sampling to the smallest set of tokens whose cumulative probability reaches p (nucleus sampling). The default, 0, disables top-p and uses topK sampling instead.
  • duration (optional): Specifies the maximum duration of the generated sound in seconds (1 to 10 seconds). Default is 3 seconds.
  • temperature (optional): Adjusts the randomness of the sampling process. Higher values lead to more diverse outputs. Default is 1.
  • outputFormat (optional): The format for the generated audio file. Options are 'wav' or 'mp3'. Default is 'wav'.
  • classifierFreeGuidance (optional): Controls the adherence of the output to the input prompt. Higher values produce outputs that are more closely aligned to the input. Default is 3.
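To make the sampling parameters concrete, here is an illustrative pure-Python sketch of how classifier-free guidance, temperature, top-k, and top-p are commonly applied to one step of token logits in AudioCraft-style language models. It is not the service's code: the function names and inputs are hypothetical, and the precedence rule (top-p when it is greater than 0, otherwise top-k) simply mirrors the parameter descriptions above.

```python
import math
import random
from typing import List, Optional


def next_token_distribution(
    cond_logits: List[float],
    uncond_logits: List[float],
    cfg_coef: float = 3.0,
    temperature: float = 1.0,
    top_k: int = 250,
    top_p: float = 0.0,
) -> List[float]:
    """Return filtered, renormalised probabilities for one sampling step."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")

    # Classifier-free guidance: move away from the unconditioned logits toward
    # the prompt-conditioned ones; a larger cfg_coef means stronger adherence.
    logits = [u + (c - u) * cfg_coef for c, u in zip(cond_logits, uncond_logits)]

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ranked from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Nucleus sampling: keep the smallest prefix whose mass reaches top_p.
        keep, cumulative = [], 0.0
        for i in order:
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    else:
        # Top-k sampling: keep only the k most likely tokens.
        keep = order[:top_k]

    kept = set(keep)
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


def sample_token(probs: List[float], rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Raising `cfg_coef` sharpens the distribution toward tokens the prompt favours, while raising `temperature` flattens it, which is why the two parameters pull in opposite directions on output diversity.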

Example Input:

{
  "topK": 250,
  "topP": 0,
  "prompt": "Formula f1 cars driving by",
  "duration": 5,
  "temperature": 1,
  "outputFormat": "mp3",
  "classifierFreeGuidance": 3
}
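Before sending a request, it can help to fill in the documented defaults and check the documented ranges locally. The sketch below encodes only the constraints listed in the input schema, plus one labelled assumption (topP lies between 0 and 1, since it is a cumulative probability). The helper name `build_audiogen_payload` is hypothetical, and the service may enforce rules not documented here.

```python
from typing import Any, Dict

# Defaults as documented for the Generate Sound From Text action.
DEFAULTS: Dict[str, Any] = {
    "topK": 250,
    "topP": 0,
    "duration": 3,
    "temperature": 1,
    "outputFormat": "wav",
    "classifierFreeGuidance": 3,
}


def build_audiogen_payload(prompt: str, **overrides: Any) -> Dict[str, Any]:
    """Merge documented defaults with overrides and validate the result."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides, "prompt": prompt}

    # Documented range: 1 to 10 seconds.
    if not 1 <= payload["duration"] <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    # Documented options: 'wav' or 'mp3'.
    if payload["outputFormat"] not in ("wav", "mp3"):
        raise ValueError("outputFormat must be 'wav' or 'mp3'")
    # Assumed range (not stated in the schema): a cumulative probability.
    if not 0 <= payload["topP"] <= 1:
        raise ValueError("topP must be between 0 and 1")
    return payload
```

For instance, `build_audiogen_payload("Formula f1 cars driving by", duration=5, outputFormat="mp3")` reproduces the example input above.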

Output

Upon successful execution, the action typically returns a URL pointing to the generated audio file.

Example Output:

https://assets.cognitiveactions.com/invocations/5787a982-b70a-4686-a28c-350369c18ca2/5727ad60-b762-45aa-b675-49de6dc7d2c7.mp3

This URL can be used to access and play the generated audio.
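A minimal sketch for saving the generated file locally with only the standard library, assuming the returned URL can be fetched with a plain HTTP GET and no extra authentication (the example URL looks like a public asset, but this is not documented). The helper names are hypothetical.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Use the last path segment of the asset URL as the local file name."""
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return name


def download_audio(url: str, dest_dir: str = ".", timeout: float = 30.0) -> str:
    """Stream the audio file into dest_dir and return the local path.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    path = os.path.join(dest_dir, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return path
```

With the example output, the file would be saved as `5727ad60-b762-45aa-b675-49de6dc7d2c7.mp3`, and its extension matches the requested outputFormat.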

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet illustrating how to invoke the Generate Sound From Text action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "272e1f22-16b1-4420-8a03-54bad606dec0" # Action ID for Generate Sound From Text

# Construct the input payload based on the action's requirements
payload = {
    "topK": 250,
    "topP": 0,
    "prompt": "Formula f1 cars driving by",
    "duration": 5,
    "temperature": 1,
    "outputFormat": "mp3",
    "classifierFreeGuidance": 3
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60,  # Avoid hanging indefinitely; generation can take a while
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # requests' JSONDecodeError is a ValueError subclass
            print(f"Response body: {e.response.text}")

In this snippet, the action ID for Generate Sound From Text is specified and the input payload follows the action's schema. The API key is sent in the Authorization header, and the JSON response, which should contain the generated audio URL, is printed. If the request fails, the error is reported together with the response status and body.
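The snippet above prints the raw JSON response. Because the exact response structure is not documented here (the request body itself is marked hypothetical), one defensive option is to search the decoded JSON for the first HTTP(S) URL whose path ends in `.wav` or `.mp3`. This is a sketch under that assumption, and `find_audio_url` is a hypothetical helper; if the platform documents a specific output field, reading that field directly is simpler.

```python
from typing import Any, Optional
from urllib.parse import urlparse

AUDIO_EXTENSIONS = (".wav", ".mp3")


def find_audio_url(result: Any) -> Optional[str]:
    """Depth-first search of a decoded JSON value for an audio file URL."""
    if isinstance(result, str):
        parsed = urlparse(result)
        is_http = parsed.scheme in ("http", "https")
        if is_http and parsed.path.lower().endswith(AUDIO_EXTENSIONS):
            return result
        return None
    if isinstance(result, dict):
        children = list(result.values())
    elif isinstance(result, list):
        children = result
    else:
        return None  # numbers, booleans, null
    for child in children:
        found = find_audio_url(child)
        if found:
            return found
    return None
```

In the usage example, `audio_url = find_audio_url(result)` could follow `result = response.json()`.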

Conclusion

The Sepal Audiogen Cognitive Actions make it straightforward to add audio generation to your applications. With the Generate Sound From Text action, you can turn short textual descriptions into sound effects and ambient audio, whether you're building games, educational tools, or multimedia applications. Explore these capabilities and consider how they can enhance your next project!