Enhance Your Applications with Audio and Video Predictions Using GPT-Sovits Actions

22 Apr 2025

Adding media processing to an application can improve the user experience. The douwantech/gpt-sovits-train Cognitive Actions let developers run predictions on audio and video files using the GPT-Sovits model, without building the model pipeline themselves. This article walks through one of these actions, covering its purpose, input requirements, expected output, and a conceptual usage example in Python.

Prerequisites

Before you start using the Cognitive Actions, ensure you have the following:

  • An API key for the Cognitive Actions platform to authenticate your requests.
  • Basic knowledge of making HTTP requests in your programming language of choice.
  • Familiarity with JSON format for structuring your input data.

For authentication, you will typically need to pass your API key in the headers of your requests.
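
As a minimal sketch of that step, the helper below reads the key from an environment variable and builds the request headers. The variable name COGNITIVE_ACTIONS_API_KEY and the Bearer scheme are assumptions that mirror the conceptual example later in this article; check your platform's documentation for the exact header it expects.

```python
import os


def build_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The environment variable name and the Bearer scheme are assumptions,
    not confirmed platform behavior.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which is generally safer than hard-coding it.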

Cognitive Actions Overview

Predict Using Audio/Video URL

The Predict Using Audio/Video URL action runs a prediction on an audio or video file referenced by URL. It uses the GPT-Sovits model from douwantech and optionally accepts an Aliyun OSS configuration for direct upload.

Input

The action accepts a JSON input with the following structure:

{
  "audioOrVideoUrl": "https://general-api.oss-cn-hangzhou.aliyuncs.com/static/2.mp4",
  "aliyunOssConfiguration": {
    "access_key_id": "your_access_key",
    "access_key_secret": "your_secret_key",
    "bucket_name": "your_bucket",
    "endpoint": "your_endpoint",
    "domain": "your_domain"
  }
}

  • Required Field:
    • audioOrVideoUrl: The URL of the audio or video file to use for the prediction.
  • Optional Field:
    • aliyunOssConfiguration: Configuration settings for direct upload to Aliyun OSS. When supplied, it must be in JSON format.

Example Input:

{
  "audioOrVideoUrl": "https://general-api.oss-cn-hangzhou.aliyuncs.com/static/2.mp4"
}
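
The input rules above can be sketched as a small helper that builds the input object and catches obvious mistakes before a request is sent. The function name build_inputs is hypothetical, and two behaviors are assumptions rather than documented rules: that all five OSS keys from the sample are required when a configuration is supplied, and that the configuration is sent as a nested object as in the sample.

```python
from typing import Optional
from urllib.parse import urlparse

# Keys taken from the sample aliyunOssConfiguration shown in this article.
OSS_KEYS = ("access_key_id", "access_key_secret", "bucket_name", "endpoint", "domain")


def build_inputs(audio_or_video_url: str, oss_config: Optional[dict] = None) -> dict:
    """Build the action's input object (hypothetical helper, not part of any SDK)."""
    parsed = urlparse(audio_or_video_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"audioOrVideoUrl must be an http(s) URL, got {audio_or_video_url!r}"
        )

    inputs = {"audioOrVideoUrl": audio_or_video_url}
    if oss_config is not None:
        # Assumption: every key from the sample configuration is required.
        missing = [key for key in OSS_KEYS if not oss_config.get(key)]
        if missing:
            raise ValueError(
                f"aliyunOssConfiguration is missing: {', '.join(missing)}"
            )
        inputs["aliyunOssConfiguration"] = dict(oss_config)
    return inputs


inputs = build_inputs("https://general-api.oss-cn-hangzhou.aliyuncs.com/static/2.mp4")
print(inputs)
```

Omitting oss_config produces the minimal input shown above, with only audioOrVideoUrl set.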

Output

Upon successful execution, the action returns a JSON response structured as follows:

{
  "zip_url": "https://assets.cognitiveactions.com/invocations/971ef8f9-3bce-47a0-b3f8-2e757a6874e6/fa04550a-afe3-4543-9fcb-99ad12d74737.zip",
  "audio_url": "https://assets.cognitiveactions.com/invocations/971ef8f9-3bce-47a0-b3f8-2e757a6874e6/db9b728c-ecb7-44dd-8720-bfa39d032d17.mp3",
  "oss_zip_url": null
}

  • Output Fields:
    • zip_url: A URL to download the ZIP file containing the generated predictions.
    • audio_url: A URL to access the processed audio file.
    • oss_zip_url: The URL of the ZIP file on Aliyun OSS. It is null if no OSS upload was performed.
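
Because oss_zip_url can be null, client code needs to decide which archive link to use. The sketch below shows one way to do that with a hypothetical helper, pick_archive_url. Preferring the OSS copy when it exists is an assumption made for illustration, not documented behavior.

```python
from typing import Optional


def pick_archive_url(result: dict) -> Optional[str]:
    """Return the OSS archive URL when present, otherwise the default zip_url.

    Preferring oss_zip_url is an illustrative choice, not a platform rule.
    """
    return result.get("oss_zip_url") or result.get("zip_url")


# Sample response copied from the Output section of this article.
sample = {
    "zip_url": "https://assets.cognitiveactions.com/invocations/971ef8f9-3bce-47a0-b3f8-2e757a6874e6/fa04550a-afe3-4543-9fcb-99ad12d74737.zip",
    "audio_url": "https://assets.cognitiveactions.com/invocations/971ef8f9-3bce-47a0-b3f8-2e757a6874e6/db9b728c-ecb7-44dd-8720-bfa39d032d17.mp3",
    "oss_zip_url": None,
}

# No OSS upload happened here, so this prints the zip_url value.
print(pick_archive_url(sample))
```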

Conceptual Usage Example (Python)

Here’s a conceptual Python code snippet demonstrating how to call the Predict Using Audio/Video URL action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "41ac67ac-c7de-412b-ad18-934639bb1320"  # Action ID for Predict Using Audio/Video URL

# Construct the input payload based on the action's requirements
payload = {
    "audioOrVideoUrl": "https://general-api.oss-cn-hangzhou.aliyuncs.com/static/2.mp4"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=60  # Illustrative value; without a timeout the call can block indefinitely
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Raised for non-JSON bodies, whichever JSON library requests uses
            print(f"Response body: {e.response.text}")

In this code:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action ID for the Predict Using Audio/Video URL action is sent in the request body alongside the inputs. The endpoint URL and the body structure are hypothetical.
  • The input payload is structured according to the action's requirements.

Conclusion

The douwantech/gpt-sovits-train Cognitive Actions provide powerful capabilities for processing audio and video content. By leveraging the Predict Using Audio/Video URL action, you can seamlessly integrate media predictions into your applications, enhancing the overall functionality and user experience.

As a next step, consider experimenting with different audio and video files to explore the capabilities further, or integrate this action into a larger application workflow for media analysis. Happy coding!