Transform Text into Realistic Audio with Suno-AI Bark Actions

22 Apr 2025

The Suno-AI Bark Cognitive Actions give developers access to a text-to-audio model that produces realistic multilingual speech and other audio elements from simple text prompts. This guide covers the Generate Audio From Text Prompt action, which synthesizes spoken words together with nonverbal communications such as laughter and sighs.

Prerequisites

Before you start integrating the Suno-AI Bark Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic familiarity with making HTTP requests and handling JSON data.

Authentication typically involves passing your API key in the request headers, as shown in the conceptual usage example later in this guide.
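
As a minimal sketch of that header-based authentication, the helper below reads the key from an environment variable instead of hard-coding it. The function name build_headers and the variable name COGNITIVE_ACTIONS_API_KEY are assumptions made for illustration; the Bearer scheme mirrors the conceptual example in this guide and may differ on the real platform.

```python
import os


def build_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    The variable name and the Bearer scheme are assumptions for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code makes it harder to commit it to version control by accident.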

Cognitive Actions Overview

Generate Audio From Text Prompt

The Generate Audio From Text Prompt action uses the Bark model to create realistic audio from textual input. The output can include not only spoken words but also nonverbal sounds and effects, which makes the action suitable for applications in gaming, virtual environments, and more.

  • Category: Text-to-Speech

Input

The input schema for this action requires the following fields:

  • prompt (string, required): The text to guide the audio generation. For example:
    {
      "prompt": "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
    }
    
  • outputFull (boolean, optional): If set to true, returns the full generation output as a .npz file.
    • Default: false
  • historyPrompt (string, optional): The preset speaker history to use for audio cloning, chosen from the available speaker options.
  • customHistoryPrompt (string, optional): A URI pointing to a custom .npz file for audio cloning, overriding the history prompt selection.
  • textTemperature (number, optional): Controls the randomness of text generation. Higher values give more diverse output, lower values more conservative output. Defaults to 0.7.
  • waveformTemperature (number, optional): Controls the randomness of waveform generation in the same way. Defaults to 0.7.
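
To show how the optional fields fit together, here is a small sketch of a payload builder. The function build_payload and its snake_case parameter names are hypothetical; only the JSON field names come from the schema above. It checks that the prompt is non-empty and omits any field left unset so the documented defaults apply. It does not validate temperature ranges, because the schema above does not state them.

```python
from typing import Optional


def build_payload(
    prompt: str,
    output_full: Optional[bool] = None,
    history_prompt: Optional[str] = None,
    custom_history_prompt: Optional[str] = None,
    text_temperature: Optional[float] = None,
    waveform_temperature: Optional[float] = None,
) -> dict:
    """Assemble an input payload, leaving out optional fields that are unset."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required and must not be empty")

    # Map the hypothetical Python argument names to the schema's field names.
    optional_fields = {
        "outputFull": output_full,
        "historyPrompt": history_prompt,
        "customHistoryPrompt": custom_history_prompt,
        "textTemperature": text_temperature,
        "waveformTemperature": waveform_temperature,
    }

    payload = {"prompt": prompt}
    # Only send values the caller set; the service applies its own defaults.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload
```

For example, build_payload("Hello! [laughs]", text_temperature=0.5) produces a dictionary containing only prompt and textTemperature.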

Here’s a minimal example input payload that sets only the required prompt field:

{
  "prompt": "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
}

Output

Upon successful execution, the action returns a link to the generated audio file in audio_out and, if applicable, a prompt history in prompt_npz. In the example below, prompt_npz is null:

{
  "audio_out": "https://assets.cognitiveactions.com/invocations/7c710276-2f3c-442d-93a6-8cb30eff7fa7/23c6a35a-4d63-4922-9425-295480be008d.wav",
  "prompt_npz": null
}
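
Once you have this response, you will usually want to save the audio locally. The sketch below uses only the Python standard library; the helper names audio_filename and download_audio are hypothetical, and it assumes the audio_out link can be fetched without extra authentication, which this guide does not confirm.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def audio_filename(result: dict) -> str:
    """Return the file name at the end of the audio_out URL."""
    url = result.get("audio_out")
    if not url:
        raise ValueError("response has no audio_out URL")
    return posixpath.basename(urlparse(url).path)


def download_audio(result: dict, directory: str = ".") -> str:
    """Download the generated audio into directory and return the local path.

    Assumes the link is publicly readable (not confirmed by this guide).
    """
    target = os.path.join(directory, audio_filename(result))
    with urllib.request.urlopen(result["audio_out"], timeout=60) as response:
        with open(target, "wb") as fh:
            fh.write(response.read())
    return target
```

Calling download_audio(result) on the parsed JSON response writes the .wav file into the current directory.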

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet to illustrate how you might call this action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "c4975add-012a-4243-82db-21624a67d536"  # Action ID for Generate Audio From Text Prompt

# Construct the input payload based on the action's requirements
payload = {
    "prompt": "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Avoid hanging indefinitely; adjust to the expected generation time
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised for a non-JSON body, whichever JSON library requests uses
            print(f"Response body: {e.response.text}")

In this code:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key, and the hypothetical endpoint URL with the real one for your account.
  • The action_id corresponds to the Generate Audio From Text Prompt action.
  • The payload is structured according to the input requirements.

Conclusion

The Suno-AI Bark Cognitive Actions let developers generate diverse audio output from text. With multilingual support and nonverbal sounds such as laughter, these actions can enhance user experiences in applications like interactive storytelling, gaming, and virtual assistants. Happy coding!