Effortless Speech Translation from French to English with Hibiki

Hibiki is a service that uses AI to provide speech-to-speech translation, with a focus on real-time processing and voice preservation. It converts spoken French into English while retaining the speaker's original tone and emotion, which helps in contexts ranging from business meetings to travel conversations. For developers, it offers a way to integrate translation capabilities into their applications.
Prerequisites
To get started with Hibiki, you'll need a Cognitive Actions API key and a basic understanding of making API calls. This will ensure that you can access and utilize the powerful features Hibiki offers effectively.
Execute Hibiki Speech-To-Speech Translation
The Execute Hibiki Speech-To-Speech Translation action empowers developers to utilize Hibiki's capabilities for real-time, high-fidelity translation of spoken language from French to English. This action is particularly useful for scenarios where immediate comprehension and response are critical.
Input Requirements
The action accepts the following input parameters:
- Audio Input: A URI pointing to the audio file that contains the French speech to be translated.
- Video Input (optional): A URI for a video file if visual context is desired alongside the audio.
- Max Duration: An integer defining the maximum length of the output audio in seconds. Setting this to 0 means no limit.
- Cut Start Seconds: A number indicating how many seconds to trim from the start of the translated audio, with a range between 0 and 4 seconds.
- Volume Reduction Db: An integer specifying the amount of volume reduction (in decibels) to apply to the original audio, ranging from 0 to 60 dB.
Example Input:
{
  "audioInput": "https://replicate.delivery/pbxt/MTZNHE3PVfheTWoSVFVA3vvKxUA0is9pgtRFYnfRnRPgQz9K/sample_fr_hibiki_monologue_otis.mp3",
  "maxDuration": 0,
  "cutStartSeconds": 2,
  "volumeReductionDb": 30
}
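The ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the helper name `build_hibiki_input` is hypothetical, the `videoInput` key is assumed by analogy with `audioInput`, and restricting URIs to http(s) is an assumption as well.

```python
from urllib.parse import urlparse


def build_hibiki_input(audio_input, max_duration, cut_start_seconds,
                       volume_reduction_db, video_input=None):
    """Check values against the documented ranges and return the input dict."""
    # Assumption: only http(s) URIs are accepted; the article just says "URI".
    if urlparse(audio_input).scheme not in ("http", "https"):
        raise ValueError("audioInput must be an http(s) URI")
    if not isinstance(max_duration, int) or max_duration < 0:
        raise ValueError("maxDuration must be a non-negative integer (0 = no limit)")
    if not 0 <= cut_start_seconds <= 4:
        raise ValueError("cutStartSeconds must be between 0 and 4")
    if not isinstance(volume_reduction_db, int) or not 0 <= volume_reduction_db <= 60:
        raise ValueError("volumeReductionDb must be an integer between 0 and 60")

    inputs = {
        "audioInput": audio_input,
        "maxDuration": max_duration,
        "cutStartSeconds": cut_start_seconds,
        "volumeReductionDb": volume_reduction_db,
    }
    if video_input is not None:
        # Assumption: the key name mirrors "audioInput"; it is not shown in the article.
        inputs["videoInput"] = video_input
    return inputs
```

Calling `build_hibiki_input("https://example.com/speech.mp3", 0, 2, 30)` returns a dictionary with the same keys as the example input, while an out-of-range value such as `cut_start_seconds=5` raises `ValueError` before any network call is made.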
Expected Output
The expected output is a URI to the translated audio file in English, which retains the speaker's voice and emotional tone.
Example Output:
https://assets.cognitiveactions.com/invocations/34aa7889-1852-46d7-978a-633cf049b14d/b73df371-bd71-4cb2-825b-d85481466733.wav
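Because the result is a URI rather than the audio itself, a client typically has to fetch the file in a second step. The sketch below assumes the URI can be retrieved with a plain GET and no extra authentication, which the article does not confirm; the function names `output_filename` and `download_translation` are hypothetical.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def output_filename(output_uri):
    """Return the last path segment of the output URI (the .wav file name)."""
    return posixpath.basename(urlparse(output_uri).path)


def download_translation(output_uri, directory="."):
    """Download the translated audio and return the local file path."""
    local_path = os.path.join(directory, output_filename(output_uri))
    # Assumption: the file is publicly readable and small enough to read at once.
    with urllib.request.urlopen(output_uri, timeout=60) as response:
        with open(local_path, "wb") as audio_file:
            audio_file.write(response.read())
    return local_path
```

For the example output above, `output_filename` yields `b73df371-bd71-4cb2-825b-d85481466733.wav`, so `download_translation` would save the audio under that name in the chosen directory.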
Use Cases for this Action
This action is particularly beneficial in several scenarios:
- Business Meetings: Facilitating communication between French-speaking clients and English-speaking stakeholders.
- Travel Applications: Enhancing the experience for travelers by providing real-time translations of conversations, allowing for smoother interactions with locals.
- Language Learning Tools: Supporting language learners by providing them with translated audio that preserves the speaker's voice and intonation.
The following Python example sketches how to call the action. The endpoint URL is hypothetical, so check the Cognitive Actions documentation for the actual one.

import json

import requests

# Replace with your actual Cognitive Actions API key and endpoint.
# Ensure your environment securely handles the API key.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users.
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "18e857f6-ba3f-4b08-861c-bd253186de90"
action_name = "Execute Hibiki Speech-To-Speech Translation"

# Construct the input payload based on the action's requirements.
# This example uses the example input shown earlier for this action.
payload = {
    "audioInput": "https://replicate.delivery/pbxt/MTZNHE3PVfheTWoSVFVA3vvKxUA0is9pgtRFYnfRnRPgQz9K/sample_fr_hibiki_monologue_otis.mp3",
    "maxDuration": 0,
    "cutStartSeconds": 2,
    "volumeReductionDb": 30,
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API.
}

# Prepare the request body for the hypothetical execution endpoint.
request_body = {
    "action_id": action_id,
    "inputs": payload,
}

print(f"--- Calling Cognitive Action: {action_name} ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=120,  # Avoid hanging indefinitely; adjust to the expected audio length.
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx).
    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:
            # The error body was not valid JSON, so print it as plain text.
            print(f"Response body (non-JSON): {e.response.text}")
print("------------------------------------------------")
Conclusion
Hibiki's Speech-To-Speech Translation action offers a powerful solution for breaking down language barriers through real-time, voice-preserving translations. With its easy integration and versatile use cases, developers can enhance their applications to support multilingual communication effectively. Start exploring Hibiki today to unlock new possibilities in language translation!