Transform Your Music with the lucataco Singing Voice Conversion Actions

22 Apr 2025

Singing voice conversion re-renders a vocal performance in another singer's style while keeping the original melody and lyrics. The lucataco/singing_voice_conversion API exposes this as a Cognitive Action: it takes a source audio file and converts the singing voice to imitate a chosen target singer. The action is built on the DiffWaveNetSVC model and offers configurable key and pitch shifting.

Prerequisites

Before you dive into using the Cognitive Actions for singing voice conversion, ensure you have the following:

  • An API key for the Cognitive Actions platform. This key will be used for authentication when making requests.
  • Familiarity with JSON structures, as the input and output will be in this format.

Authentication typically involves passing your API key in the headers of your requests.
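
As a minimal sketch of that step, the helper below builds request headers from an API key. It assumes a Bearer-token Authorization header, and the environment variable name COGNITIVE_ACTIONS_API_KEY is a hypothetical choice for illustration; check the platform documentation for the exact scheme.

```python
import os


def build_headers(api_key=None):
    """Build request headers, assuming Bearer-token authentication."""
    # Hypothetical variable name; prefer the environment over hard-coded keys.
    key = api_key or os.environ.get("COGNITIVE_ACTIONS_API_KEY")
    if not key:
        raise RuntimeError("No API key supplied or found in the environment")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control.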

Cognitive Actions Overview

Convert Singing Voice

The Convert Singing Voice action is designed to transform the singing voice in a source audio file to mimic the style of a target singer. This action supports key and pitch shifts, offering flexibility in how the final audio is rendered.

  • Category: Voice Cloning

Input

The action accepts the following input fields:

  • Required Fields:
    • sourceAudio: The URI of the input source audio file (must be a valid link to an audio file).
  • Optional Fields:
    • keyShiftMode: An integer value that represents the shift in musical key, ranging from -6 to 6 (default is 0).
    • targetSinger: The artist whose style will be used for conversion (default is "Taylor Swift").
    • pitchShiftControl: Method to control pitch shifting ("Auto Shift" or "Key Shift", default is "Auto Shift").
    • diffusionInferenceSteps: The number of inference steps in the diffusion process, ranging from 0 to 1000 (default is 1000).
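
The constraints above can be checked on the client before a request is sent. The following is a sketch only: build_inputs is a hypothetical helper, not part of the API, and the http(s) check on sourceAudio is an assumption about what counts as a valid link. The defaults and ranges are the ones listed above.

```python
from urllib.parse import urlparse

# Defaults and allowed values taken from the field list above.
DEFAULTS = {
    "keyShiftMode": 0,
    "targetSinger": "Taylor Swift",
    "pitchShiftControl": "Auto Shift",
    "diffusionInferenceSteps": 1000,
}
PITCH_SHIFT_MODES = {"Auto Shift", "Key Shift"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def build_inputs(source_audio, **overrides):
    """Return a complete input dict; raise ValueError on invalid values."""
    parsed = urlparse(source_audio)
    # Assumption: the platform expects an http(s) link to the audio file.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("sourceAudio must be an http(s) URI")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")

    inputs = {"sourceAudio": source_audio, **DEFAULTS, **overrides}

    if not _is_int(inputs["keyShiftMode"]) or not -6 <= inputs["keyShiftMode"] <= 6:
        raise ValueError("keyShiftMode must be an integer from -6 to 6")
    if inputs["pitchShiftControl"] not in PITCH_SHIFT_MODES:
        raise ValueError("pitchShiftControl must be 'Auto Shift' or 'Key Shift'")
    steps = inputs["diffusionInferenceSteps"]
    if not _is_int(steps) or not 0 <= steps <= 1000:
        raise ValueError("diffusionInferenceSteps must be an integer from 0 to 1000")
    return inputs
```

Calling build_inputs("https://example.com/song.wav", keyShiftMode=-2) returns a full payload with the remaining fields set to their defaults, while keyShiftMode=7 raises ValueError before any network call is made.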

Example Input

Here is an example of a JSON payload you would send to the action:

{
  "sourceAudio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
  "keyShiftMode": 0,
  "targetSinger": "Taylor Swift",
  "pitchShiftControl": "Auto Shift",
  "diffusionInferenceSteps": 1000
}

Output

The action returns a URI to the converted audio file. An example output might look like this:

https://assets.cognitiveactions.com/invocations/7603ad76-72f1-4495-89f9-fdea2129ad3b/2a682905-19f7-45fa-9e54-376aa6d4a412.wav

This output can be used to download or stream the newly transformed audio.
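
One way to save the result locally is sketched below using only the Python standard library. It assumes the returned URI can be fetched with a plain GET and no extra authentication, which this article does not confirm; local_name_for and download_audio are hypothetical helper names.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def local_name_for(uri, fallback="converted.wav"):
    """Derive a local filename from the last path segment of the URI."""
    name = posixpath.basename(urlparse(uri).path)
    return name or fallback


def download_audio(uri, dest_dir="."):
    """Stream the converted audio to disk and return the saved path."""
    dest = os.path.join(dest_dir, local_name_for(uri))
    # Assumption: the URI is readable without authentication headers.
    with urllib.request.urlopen(uri, timeout=60) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # Copy in chunks instead of loading it all
    return dest
```

Copying in chunks keeps memory use flat regardless of how long the converted track is.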

Conceptual Usage Example (Python)

Below is a conceptual Python code snippet demonstrating how to call the Convert Singing Voice action:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "d4cef7b1-eade-46fd-a105-3fbbda2f25e6" # Action ID for Convert Singing Voice

# Construct the input payload based on the action's requirements
payload = {
    "sourceAudio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
    "keyShiftMode": 0,
    "targetSinger": "Taylor Swift",
    "pitchShiftControl": "Auto Shift",
    "diffusionInferenceSteps": 1000
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=300,  # Avoid hanging indefinitely; adjust as needed
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON
            print(f"Response body: {e.response.text}")

In this snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
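  • The input payload follows the fields described above. The endpoint URL and the request body structure (action_id plus inputs) are hypothetical, so confirm both against the platform documentation before use.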

Conclusion

The lucataco/singing_voice_conversion action gives developers a straightforward way to add singing voice conversion to music production, entertainment, or other creative projects. Its key shift, pitch shift, and diffusion step parameters let you adjust the conversion to suit the source material. Dive in and start experimenting with converting voices to match your favorite artists!