Transform Voices Effortlessly with Pseudoram RVC v2 Cognitive Actions

23 Apr 2025

The pseudoram/rvc-v2 API exposes Cognitive Actions for voice conversion and cloning. Because the actions are pre-built, you can integrate voice conversion into an application with less custom work. This post covers the "Transform Voice Using RVC v2" action: its parameters, its expected output, and how to call it from your own code.

Prerequisites

Before diving into the integration, ensure you have the following prerequisites:

  • An API key for accessing the Cognitive Actions platform.
  • Familiarity with JSON formats, as inputs and outputs are structured this way.
  • A basic understanding of making HTTP requests in your programming environment.

Requests are typically authenticated by passing your API key in the request headers.
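
As a minimal sketch of that header-based scheme, the helper below reads the key from an environment variable instead of hard-coding it. The variable name COGNITIVE_ACTIONS_API_KEY and the Bearer scheme are assumptions borrowed from the conceptual example in this post; check your platform's documentation for the exact header it expects.

```python
import os


def build_auth_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    The variable name and the Bearer scheme are assumptions for illustration.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code means it is less likely to end up in version control by accident.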

Cognitive Actions Overview

Transform Voice Using RVC v2

The Transform Voice Using RVC v2 action converts the voice in an input audio file into another voice using an RVC v2 model. It supports custom voice models and exposes fine-grained conversion settings.

Input

The input for this action is a structured JSON object, which includes several customizable parameters:

  • protect (number, default: 0.33): Controls the retention of original vocals' breath and voiceless consonants. Valid range is 0 to 0.5.
  • audioInput (string): URI for the input audio file. Must be a valid URL pointing to an audio file.
  • voiceModel (string): Specifies the RVC model (e.g., "Obama", "Trump", "CUSTOM"). Defaults to "Obama".
  • accentStrength (number, default: 0.5): Sets the intensity of the AI's accent, ranging from 0 to 1.
  • pitchAdjustment (number, default: 0): Adjusts the pitch of AI-generated vocals in semitones.
  • audioOutputFormat (string, default: "mp3"): Selects the format for the output audio file ("mp3" or "wav").
  • medianFilterRadius (integer, default: 3): Applies median filtering if the value is 3 or greater.
  • pitchCheckInterval (integer, default: 128): Defines the interval in milliseconds for checking pitch changes.
  • originalLoudnessMix (number, default: 0.25): Balances between the original loudness (0) and a fixed loudness (1).
  • pitchDetectionMethod (string, default: "rmvpe"): Specifies the pitch detection algorithm (options: "rmvpe", "mangio-crepe").
  • customModelDownloadUrl (string): URL to download a custom RVC model if desired.
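
Since several of these parameters have documented ranges or a fixed set of options, it can help to check a payload locally before sending it. The function below is a rough sketch, not part of the API: the name validate_inputs is hypothetical, treating audioInput as required is an assumption, and the 0 to 1 range for originalLoudnessMix is inferred from its description rather than stated outright.

```python
def validate_inputs(inputs: dict) -> list:
    """Return a list of problems found in an input payload (empty if none)."""
    problems = []

    # Assumption: audioInput is required and must be an http(s) URL.
    audio_input = str(inputs.get("audioInput", ""))
    if not audio_input.startswith(("http://", "https://")):
        problems.append("audioInput must be an http(s) URL")

    # Numeric ranges taken (or inferred) from the parameter list above.
    ranges = {
        "protect": (0.0, 0.5),
        "accentStrength": (0.0, 1.0),
        "originalLoudnessMix": (0.0, 1.0),
    }
    for name, (low, high) in ranges.items():
        value = inputs.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    # Parameters restricted to a fixed set of string options.
    choices = {
        "audioOutputFormat": {"mp3", "wav"},
        "pitchDetectionMethod": {"rmvpe", "mangio-crepe"},
    }
    for name, allowed in choices.items():
        value = inputs.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")

    return problems
```

Omitted optional parameters are skipped, on the assumption that the service falls back to the defaults listed above.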

Example Input:

{
  "protect": 0.5,
  "audioInput": "https://replicate.delivery/pbxt/LAxQbQLiKJZevqiV9Raodpdd6W0ihu3Wnb1K6xCpE6rcUIu5/ttsMP3.com_VoiceText_2024-6-29_0-22-2.mp3",
  "voiceModel": "CUSTOM",
  "accentStrength": 1,
  "pitchAdjustment": 8,
  "audioOutputFormat": "mp3",
  "medianFilterRadius": 1,
  "pitchCheckInterval": 128,
  "originalLoudnessMix": 1,
  "pitchDetectionMethod": "rmvpe",
  "customModelDownloadUrl": "https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip"
}
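
To give a feel for what "pitchAdjustment": 8 means: assuming the parameter follows the standard equal-tempered definition of a semitone (this post only says "semitones"), a shift of n semitones scales frequency by 2^(n/12). The helper name below is hypothetical and only illustrates the arithmetic, not how the model applies the shift internally.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


# The example input shifts pitch up by 8 semitones.
print(round(semitones_to_ratio(8), 3))  # 1.587
# Twelve semitones is one octave, which doubles the frequency.
print(semitones_to_ratio(12))  # 2.0
```

Under that assumption, the example raises the pitch by a factor of roughly 1.59, a little over half an octave.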

Output

The action typically returns a URL pointing to the generated audio file, which contains the voice transformed according to the specified parameters.

Example Output:

https://assets.cognitiveactions.com/invocations/61aa0768-4fcf-4eab-bd71-3d1071504f62/81e196c0-001f-49a4-ac66-30fa28b854cb.wav
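
Because the result is a URL rather than the audio itself, you will usually want to fetch the file. The following sketch uses only the Python standard library; the function names are hypothetical, and it assumes the URL can be downloaded without extra authentication, which this post does not confirm.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    name = os.path.basename(urlparse(url).path)
    return name or "output_audio"


def download_audio(url: str, dest_dir: str = ".") -> str:
    """Stream the audio at `url` into `dest_dir` and return the local path."""
    local_path = os.path.join(dest_dir, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=60) as response:
        with open(local_path, "wb") as out:
            # Copy in chunks so large files are not held in memory at once.
            shutil.copyfileobj(response, out)
    return local_path
```

Calling download_audio with the example URL above would save the file under its original name in the current directory, provided the link is still valid.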

Conceptual Usage Example (Python)

Here's a conceptual example of how you might call the Cognitive Actions API using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "63a919cd-e5b2-4d03-888d-f73fcacb8d60"  # Action ID for Transform Voice Using RVC v2

# Construct the input payload based on the action's requirements
payload = {
    "protect": 0.5,
    "audioInput": "https://replicate.delivery/pbxt/LAxQbQLiKJZevqiV9Raodpdd6W0ihu3Wnb1K6xCpE6rcUIu5/ttsMP3.com_VoiceText_2024-6-29_0-22-2.mp3",
    "voiceModel": "CUSTOM",
    "accentStrength": 1,
    "pitchAdjustment": 8,
    "audioOutputFormat": "mp3",
    "medianFilterRadius": 1,
    "pitchCheckInterval": 128,
    "originalLoudnessMix": 1,
    "pitchDetectionMethod": "rmvpe",
    "customModelDownloadUrl": "https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload},  # Hypothetical structure
        timeout=120,  # Avoid hanging indefinitely; adjust for long conversions
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Body was not valid JSON (covers every JSON decode error requests can raise)
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The action_id corresponds to the "Transform Voice Using RVC v2" action.
  • The payload mirrors the example input above. The endpoint URL and the {"action_id", "inputs"} request structure are hypothetical, so adjust them to match your platform's documentation.
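
The exact shape of the JSON response is not documented in this post, so the sketch below avoids guessing field names. It walks whatever structure comes back and collects strings that look like links to .mp3 or .wav files. The function name is hypothetical, and the approach is a defensive assumption rather than the platform's documented contract.

```python
from urllib.parse import urlparse

AUDIO_EXTENSIONS = (".mp3", ".wav")


def find_audio_urls(result) -> list:
    """Recursively collect http(s) URLs ending in .mp3 or .wav from parsed JSON."""
    if isinstance(result, str):
        parsed = urlparse(result)
        is_audio = (
            parsed.scheme in ("http", "https")
            and parsed.path.lower().endswith(AUDIO_EXTENSIONS)
        )
        return [result] if is_audio else []
    if isinstance(result, dict):
        # Only the values can hold output URLs; keys are field names.
        result = list(result.values())
    if isinstance(result, list):
        return [url for item in result for url in find_audio_urls(item)]
    return []  # Numbers, booleans, and None carry no URLs
```

Once you know the real response schema, reading the documented field directly is simpler and more robust than searching for it.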

Conclusion

The pseudoram/rvc-v2 Cognitive Actions give developers a ready-made way to transform voices. With the "Transform Voice Using RVC v2" action, you can add voice conversion to an application and tune it with the parameters described above. Possible use cases include voiceovers for videos, personalized voice assistants, and creative audio applications. Happy coding!