Enhance Your Applications with Speaker Verification Using Titanet-Large Cognitive Actions

In today's digital landscape, verifying speaker identity has become increasingly important in various applications, from security systems to virtual assistants. The Titanet-Large Cognitive Actions offer developers a powerful tool to integrate speaker verification capabilities into their applications. These pre-built actions streamline the process of analyzing audio files, allowing you to focus on building innovative features rather than worrying about the underlying complexities.
Prerequisites
To utilize the Titanet-Large Cognitive Actions, you will need:
- An API key for the Cognitive Actions platform.
- Basic understanding of JSON and RESTful API concepts.
Authentication typically involves passing your API key in the headers of your requests, allowing secure access to the actions.
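As a minimal sketch of that authentication pattern, the headers might be built like this in Python. The Bearer-token scheme shown here is an assumption based on common API conventions; check your platform's documentation for the exact header format.

```python
# Hypothetical sketch: send the API key as a Bearer token in the
# request headers. The exact scheme may differ on your platform.
API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"  # placeholder value

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```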
Cognitive Actions Overview
Verify Speaker Identity
Description: This operation uses the Titanet-Large En model to verify speaker identity by extracting and comparing the embeddings of two 16 kHz (16000 Hz) mono-channel sound files. It returns a cosine similarity score and a verification result based on a configurable similarity threshold, with the option to also return the embeddings of the sound files.
Category: Speaker Identification
Input: The input schema for this action requires the following fields:
- soundFileOne (required): URI for the first audio file (16000 Hz mono-channel).
- soundFileTwo (required): URI for the second audio file (16000 Hz mono-channel).
- threshold (optional): Cosine similarity threshold for verification (default is 0.7, must be between 0.1 and 0.95).
- returnEmbedding (optional): Boolean indicating whether to return the embedding(s) (default is false).
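Before sending a request, you may want to validate a payload locally against this schema. The following sketch checks the required fields and the documented threshold bounds; the function name and error messages are illustrative, not part of the platform's API.

```python
def validate_inputs(payload: dict) -> list:
    """Return a list of problems with a Verify Speaker Identity payload.

    Field names, the 0.7 default, and the 0.1-0.95 bounds come from the
    action's documented input schema; this helper itself is hypothetical.
    """
    errors = []
    for field in ("soundFileOne", "soundFileTwo"):
        if not payload.get(field):
            errors.append(f"{field} is required")
    threshold = payload.get("threshold", 0.7)  # documented default
    if not 0.1 <= threshold <= 0.95:
        errors.append("threshold must be between 0.1 and 0.95")
    if not isinstance(payload.get("returnEmbedding", False), bool):
        errors.append("returnEmbedding must be a boolean")
    return errors
```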
Example Input:
{
    "threshold": 0.7,
    "soundFileOne": "https://replicate.delivery/pbxt/JuQ5yJc5SdemzMOeLVtUDIah9ZENfcYkzbO60XdyBGpnEbVX/carrell1.mp3",
    "soundFileTwo": "https://replicate.delivery/pbxt/JuQ5ymR76W4KtopO9eTLn6aJaWhJT1IPP0WsPAQCiU9JgYXb/carrell2.mp3",
    "returnEmbedding": true
}
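Since the action expects 16000 Hz mono audio, it can help to pre-check local files before uploading them. This sketch uses Python's standard wave module, so it only works for uncompressed WAV files; compressed formats such as the MP3s in the example above would need an external tool (e.g. ffmpeg) for inspection and resampling.

```python
import wave

def is_16khz_mono_wav(path: str) -> bool:
    """Return True if a local WAV file is 16000 Hz mono-channel,
    matching the sample rate and channel count the action expects.
    Sketch only: WAV files only, no resampling performed."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```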
Output: The action typically returns a JSON response containing:
- verified: Boolean indicating whether the speakers were verified.
- similarity: A numerical value representing the cosine similarity score.
- embedding1: Array of floating-point numbers representing the embedding of the first audio file (if requested).
- embedding2: Array of floating-point numbers representing the embedding of the second audio file (if requested).
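To make the relationship between these fields concrete: the verified flag follows from comparing the cosine similarity of the two embeddings against the threshold. The sketch below shows how you could recompute that check yourself from returned embeddings; the helper functions are illustrative, not part of the platform.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_verified(similarity, threshold=0.7):
    # Mirrors the action's documented behavior: verified means the
    # similarity score meets or exceeds the threshold.
    return similarity >= threshold
```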
Example Output:
{
    "verified": true,
    "similarity": 0.9157281517982483,
    "embedding1": [...], // Array of floats
    "embedding2": [...]  // Array of floats
}
Conceptual Usage Example (Python): Here's how you might call the Verify Speaker Identity action using Python. This snippet demonstrates constructing the input JSON payload and making a POST request to a hypothetical Cognitive Actions execution endpoint.
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "cc073080-78e8-46a1-a3d2-efcf9dfb4c03"  # Action ID for Verify Speaker Identity

# Construct the input payload based on the action's requirements
payload = {
    "threshold": 0.7,
    "soundFileOne": "https://replicate.delivery/pbxt/JuQ5yJc5SdemzMOeLVtUDIah9ZENfcYkzbO60XdyBGpnEbVX/carrell1.mp3",
    "soundFileTwo": "https://replicate.delivery/pbxt/JuQ5ymR76W4KtopO9eTLn6aJaWhJT1IPP0WsPAQCiU9JgYXb/carrell2.mp3",
    "returnEmbedding": True
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # response body was not valid JSON
            print(f"Response body: {e.response.text}")
In this code snippet, replace the placeholder API key with your actual credentials and the hypothetical endpoint with the correct URL for your platform. The payload variable is constructed following the input schema defined for the "Verify Speaker Identity" action.
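Once the call succeeds, you might turn the parsed result into a human-readable summary. The helper below is a hypothetical sketch that assumes the response structure documented above (verified, similarity, and optional embedding fields).

```python
def summarize_result(result: dict) -> str:
    """Format the action's JSON response into a one-line summary.

    Hypothetical helper; field names follow the documented output schema.
    """
    status = "same speaker" if result.get("verified") else "different speakers"
    line = f"{status} (similarity={result.get('similarity', 0.0):.4f})"
    if "embedding1" in result:
        line += f", embedding length={len(result['embedding1'])}"
    return line
```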
Conclusion
The Titanet-Large Cognitive Actions provide a robust and efficient way to integrate speaker verification into your applications. By leveraging the power of audio analysis, you can enhance security and user experience across various use cases. Start integrating these actions today and explore the possibilities they unlock for your applications!