Integrate Audio Processing with Cognitive Actions for Enhanced Predictions

23 Apr 2025

Introduction

The Cognitive Actions API, under the spec titled datong-new/rvc, offers developers powerful audio processing capabilities for their applications. One of the standout features is the Run Audio Prediction action, which executes audio prediction operations supporting both training and inference. By leveraging these pre-built actions, developers can significantly speed up development, focusing on building features rather than getting bogged down in complex audio processing algorithms.

Prerequisites

Before using the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Knowledge of how to structure JSON payloads, as this is essential for interacting with the API.
  • Familiarity with making HTTP requests, since you'll be sending them to the API endpoint (conceptually, the API key is passed in the request headers for authentication).
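As a minimal sketch of that last point, the headers below assume Bearer-token authentication, which matches the conceptual example later in this post; confirm the exact scheme against your platform's documentation:

```python
# Authentication headers sent with every call to the Cognitive Actions API.
# The Bearer scheme here is an assumption based on the example in this post.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"  # replace with your real key

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```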

Cognitive Actions Overview

Run Audio Prediction

The Run Audio Prediction action is designed for executing audio prediction operations using audio files for both training and inference. It supports multiple operation modes, allowing developers to train models, make predictions, or do both in a single call.

Input

The input for this action is defined by the CompositeRequest schema, which includes the following required and optional fields:

  • operation (required): Specifies the operation mode. Options include:
    • train: Requires audioForTraining.
    • infer: Requires audioForInference and checkpoint.
    • train_infer: Requires both audioForTraining and audioForInference.
  • audioForTraining (required for train or train_infer): A URI to the audio file used for training.
  • audioForInference (required for infer or train_infer): A URI to the audio file used for inference.
  • checkpoint (optional): A URI pointing to the trained checkpoint file.
  • f0UpKey (optional): An integer representing the F0 up key shift amount. Default is 0.
  • accompaniment (optional): A boolean determining whether the output includes accompaniment. Default is true.
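Because the required fields depend on the chosen operation, it can help to validate a payload client-side before sending it. Here's a minimal sketch based on the field rules above; `validate_request` is a hypothetical helper, not part of the Cognitive Actions API:

```python
# Map each operation mode to the fields it requires, per the schema above.
REQUIRED_FIELDS = {
    "train": ["audioForTraining"],
    "infer": ["audioForInference", "checkpoint"],
    "train_infer": ["audioForTraining", "audioForInference"],
}

def validate_request(payload: dict) -> list:
    """Return a list of error messages; an empty list means the payload looks valid."""
    errors = []
    operation = payload.get("operation")
    if operation not in REQUIRED_FIELDS:
        errors.append("operation must be one of: train, infer, train_infer")
        return errors
    for field in REQUIRED_FIELDS[operation]:
        if not payload.get(field):
            errors.append(f"{field} is required when operation is '{operation}'")
    return errors
```

Running `validate_request({"operation": "infer", "audioForInference": "..."})`, for example, flags the missing checkpoint before the API call is ever made.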

Here’s an example of the input JSON payload:

{
  "f0UpKey": 8,
  "operation": "train_infer",
  "accompaniment": true,
  "audioForTraining": "https://replicate.delivery/pbxt/KSuE9M9iVBPPXjGWfkGpiwD9iZOlHSwAwVmX0vHaA2hJ41Ca/wobunanguo.flac",
  "audioForInference": "https://replicate.delivery/pbxt/KSuE9SkdHEsXPUlZmWj4kMwRNZwJ0CR0EFTzjJAwVqKY8brY/1.wav"
}

Output

Upon successful execution, the action typically returns a JSON object containing:

  • ckpt_path: A URI to the trained checkpoint file.
  • cloned_audio: A URI to the output audio file that has been processed.

For example, the output might look like this:

{
  "ckpt_path": "https://assets.cognitiveactions.com/invocations/1fa5e6b3-ce38-4277-a4b5-5724abd8dcfd/372d1221-a298-403b-9c35-3af3fb490721.pth",
  "cloned_audio": "https://assets.cognitiveactions.com/invocations/1fa5e6b3-ce38-4277-a4b5-5724abd8dcfd/c959918b-04d7-40d3-b1f6-1380e0fcab35.wav"
}
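Since both output fields are URIs, a typical follow-up step is saving the files locally. The sketch below assumes the URIs are publicly fetchable; `save_outputs` and `local_name` are hypothetical helpers, not part of the API:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def local_name(uri: str) -> str:
    """Derive a local filename from the last path segment of a result URI."""
    return os.path.basename(urlparse(uri).path)

def save_outputs(result: dict, dest_dir: str = ".") -> dict:
    """Download each output URI (ckpt_path, cloned_audio) into dest_dir.

    Returns a mapping from output key to the saved local path."""
    saved = {}
    for key in ("ckpt_path", "cloned_audio"):
        uri = result.get(key)
        if not uri:
            continue  # the key may be absent, e.g. for an infer-only run
        path = os.path.join(dest_dir, local_name(uri))
        urlretrieve(uri, path)  # fetch the file over HTTP(S)
        saved[key] = path
    return saved
```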

Conceptual Usage Example (Python)

Here’s a conceptual example of how to call the Run Audio Prediction action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "8933ebbe-55bc-4139-a60c-eefd58227f52"  # Action ID for Run Audio Prediction

# Construct the input payload based on the action's requirements
payload = {
    "f0UpKey": 8,
    "operation": "train_infer",
    "accompaniment": true,
    "audioForTraining": "https://replicate.delivery/pbxt/KSuE9M9iVBPPXjGWfkGpiwD9iZOlHSwAwVmX0vHaA2hJ41Ca/wobunanguo.flac",
    "audioForInference": "https://replicate.delivery/pbxt/KSuE9SkdHEsXPUlZmWj4kMwRNZwJ0CR0EFTzjJAwVqKY8brY/1.wav"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers json.JSONDecodeError and requests' own variant
            print(f"Response body: {e.response.text}")

In this snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key. The payload is constructed to match the action's input schema, ensuring the correct URIs and parameters are included.

Conclusion

The Run Audio Prediction action within the datong-new/rvc spec offers developers a robust solution for audio processing needs, whether for training models, making predictions, or both. By utilizing this action, you can streamline your workflow and enhance the audio capabilities of your applications. Consider exploring additional use cases such as music cloning or advanced audio processing workflows to fully leverage the power of Cognitive Actions in your projects.