Transform Your Audio Experience: Integrating Voice Cloning with ahm3texe/test999 Actions

21 Apr 2025

The ahm3texe/test999 Cognitive Actions provide voice transformation for audio content: you can change the voice model, shift the pitch, and control how much of the original accent is retained. Because these actions are pre-built, you can add voice conversion to an application without building the audio processing pipeline yourself.

Prerequisites

Before diving into the integration, there are a few general requirements you should be aware of:

  • An API key for the Cognitive Actions platform is necessary to authenticate your requests.
  • A URI for each audio file you want to process. The examples in this post use publicly reachable HTTPS URLs.

Authentication will typically involve passing your API key in the request headers as shown in the conceptual usage example.
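
To keep the key out of source code, one option is to read it from an environment variable and build the headers from it. The sketch below is a minimal illustration: the function name build_headers and the variable name COGNITIVE_ACTIONS_API_KEY are hypothetical, and the Bearer scheme is taken from the conceptual example later in this post rather than from confirmed platform documentation.

```python
import os


def build_headers(env_var: str = "COGNITIVE_ACTIONS_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    Assumptions: the environment variable name is hypothetical, and the
    Bearer scheme mirrors the conceptual example in this post.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early when the key is missing gives a clearer error than an authentication failure returned by the server.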

Cognitive Actions Overview

Transform Audio Voice

Description: This action allows you to transform input audio by adjusting voice model, pitch, and accent. You can customize the transformation using predefined models or a custom RVC model.

Category: voice-cloning

Input

The action requires a structured input object. An example input is shown first, followed by a description of each field.

{
  "protect": 0.33,
  "audioInput": "https://replicate.delivery/pbxt/LAxQbQLiKJZevqiV9Raodpdd6W0ihu3Wnb1K6xCpE6rcUIu5/ttsMP3.com_VoiceText_2024-6-29_0-22-2.mp3",
  "voiceModel": "CUSTOM",
  "pitchAdjustment": 0,
  "accentControlRate": 0.5,
  "audioOutputFormat": "mp3",
  "medianFilterRadius": 3,
  "pitchCheckInterval": 128,
  "pitchDetectionMethod": "rmvpe",
  "originalLoudnessMixRate": 0.25,
  "customVoiceModelDownloadUrl": "https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip"
}
  • protect (number): Determines the amount of original breath and voiceless consonants to retain in AI vocals (default: 0.33, range: 0 to 0.5).
  • audioInput (string): A URI pointing to the audio file to process.
  • voiceModel (string): Select from predefined models or use 'CUSTOM' for a custom model.
  • pitchAdjustment (number): Adjusts pitch in semitones (default: 0).
  • accentControlRate (number): Defines the extent to which the accent is retained (default: 0.5, range: 0 to 1).
  • audioOutputFormat (string): Specifies the output audio format (default: 'mp3').
  • medianFilterRadius (integer): Applies median filtering if the value is 3 or more (default: 3, range: 0 to 7).
  • pitchCheckInterval (integer): Specifies the interval in milliseconds for pitch checks (default: 128).
  • pitchDetectionMethod (string): Algorithm for pitch detection (default: 'rmvpe').
  • originalLoudnessMixRate (number): Mixes original loudness with a constant level (default: 0.25).
  • customVoiceModelDownloadUrl (string): URL to download a custom RVC model.
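
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the platform: validate_inputs is a hypothetical helper, the ranges are copied from the field descriptions above, and two rules are assumptions (that audioInput must be an http or https URI, and that voiceModel "CUSTOM" needs customVoiceModelDownloadUrl).

```python
from urllib.parse import urlparse

# Ranges copied from the field descriptions above.
RANGES = {
    "protect": (0.0, 0.5),
    "accentControlRate": (0.0, 1.0),
    "medianFilterRadius": (0, 7),
}


def validate_inputs(inputs: dict) -> list:
    """Return a list of problems found in an input object (empty if none)."""
    problems = []

    # Assumption: the platform expects an http(s) URI for the audio file.
    audio = inputs.get("audioInput") or ""
    if urlparse(audio).scheme not in ("http", "https"):
        problems.append("audioInput must be an http(s) URI")

    # Only fields that are present are checked; omitted ones use defaults.
    for name, (low, high) in RANGES.items():
        if name in inputs and not low <= inputs[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")

    # Assumption: a custom model cannot be used without a download URL.
    if inputs.get("voiceModel") == "CUSTOM" and not inputs.get(
        "customVoiceModelDownloadUrl"
    ):
        problems.append("voiceModel 'CUSTOM' needs customVoiceModelDownloadUrl")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.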

Output

The action typically returns a URI pointing to the transformed audio file. Here is an example output:

https://assets.cognitiveactions.com/invocations/ccfe5da6-3d36-4451-bc64-590c7cc6edfd/6ffb734e-284e-4d39-952c-fdcc92ed965d.wav
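
Once you have the output URI, you will usually want to save the file locally. The sketch below uses only the Python standard library and assumes the URI can be fetched without authentication, which this post does not confirm. It takes the file name from the URI rather than from audioOutputFormat, because the example output above ends in .wav even though the example input requests mp3. The names output_filename and download_output are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse


def output_filename(url: str, fallback: str = "output.audio") -> str:
    """Derive a local file name from the last path segment of the output URI."""
    name = urlparse(url).path.rsplit("/", 1)[-1]
    return name or fallback


def download_output(url: str, directory: str = ".") -> str:
    """Download the transformed audio and return the local path.

    Assumption: the output URI is readable without extra credentials.
    """
    path = os.path.join(directory, output_filename(url))
    with urllib.request.urlopen(url, timeout=60) as response, open(path, "wb") as f:
        f.write(response.read())
    return path
```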

Conceptual Usage Example (Python)

Here’s how you might call the Transform Audio Voice action using a conceptual Python implementation:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "b66125ba-2cf9-4002-90f2-4c9538848b7e" # Action ID for Transform Audio Voice

# Construct the input payload based on the action's requirements
payload = {
    "protect": 0.33,
    "audioInput": "https://replicate.delivery/pbxt/LAxQbQLiKJZevqiV9Raodpdd6W0ihu3Wnb1K6xCpE6rcUIu5/ttsMP3.com_VoiceText_2024-6-29_0-22-2.mp3",
    "voiceModel": "CUSTOM",
    "pitchAdjustment": 0,
    "accentControlRate": 0.5,
    "audioOutputFormat": "mp3",
    "medianFilterRadius": 3,
    "pitchCheckInterval": 128,
    "pitchDetectionMethod": "rmvpe",
    "originalLoudnessMixRate": 0.25,
    "customVoiceModelDownloadUrl": "https://huggingface.co/Argax/doofenshmirtz-RUS/resolve/main/doofenshmirtz.zip"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=300 # Avoid waiting indefinitely; adjust to your needs
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError: # Body is not valid JSON (JSON decode errors subclass ValueError)
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key.
  • The action ID for Transform Audio Voice is used to specify the action being called.
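  • The input payload is structured according to the requirements outlined earlier.
  • The endpoint URL and the request body structure (action_id and inputs) are hypothetical, as marked in the code comments. Confirm the actual values against the platform documentation before use.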

Conclusion

The Transform Audio Voice action in ahm3texe/test999 lets you convert audio to a predefined or custom RVC voice model while controlling pitch, accent retention, and how much of the original breath sounds and loudness carry over. A good next step is to run the example input with your own audio and adjust one parameter at a time to hear its effect. Happy coding!