Create Human-Like Speech with lucataco/orpheus-3b-0.1-ft Cognitive Actions

In this post, we’ll explore the lucataco/orpheus-3b-0.1-ft specification, which provides powerful Cognitive Actions for generating high-quality, emotive text-to-speech. These actions harness the capabilities of the Orpheus 3B model, allowing you to create voice outputs that sound remarkably human-like, complete with intonation, emotion, and rhythm. With the ability to achieve zero-shot voice cloning and guided emotion control, these pre-built actions can seamlessly integrate into your applications, enhancing user experiences across various domains.
Prerequisites
Before you dive into integrating these Cognitive Actions, ensure you have the following:
- An API key for accessing the Cognitive Actions platform.
- Familiarity with making HTTP requests and handling JSON data in your programming environment.
- Basic understanding of text-to-speech applications and their requirements.
Conceptually, authentication typically involves passing your API key in the headers of your requests, ensuring that your application can securely interact with the Cognitive Actions services.
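As a minimal sketch, that kind of header-based authentication is usually expressed as a plain dictionary attached to each request. Note that the Bearer scheme and key name below are assumptions for illustration; check your platform's documentation for the exact format:

```python
# Hypothetical authentication headers; the Bearer scheme is an assumption
# and may differ on your Cognitive Actions deployment.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
```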
Cognitive Actions Overview
Generate Emotive Speech
The Generate Emotive Speech action allows you to convert text into speech using the Orpheus 3B model. This action is designed to produce high-quality audio that mimics human speech, including emotional nuances and a natural flow.
- Category: Text-to-Speech
Input
The input for this action is a JSON object that must contain the following fields:
- text (required): The input text to be converted into speech.
- voice (optional): The voice model for speech generation. Available options are "tara", "dan", "josh", and "emma". Defaults to "tara".
- temperature (optional): Controls the randomness of the speech (range: 0.1 to 1.5, default: 0.6).
- maxNewTokens (optional): Maximum number of tokens to generate (range: 100 to 2000, default: 1200).
- topProbability (optional): Top P value for nucleus sampling (range: 0.1 to 1.0, default: 0.95).
- repetitionPenalty (optional): Penalty for repeated tokens to reduce redundancy (range: 1 to 2, default: 1.1).
Example Input:
```json
{
  "text": "Hey there my name is Tara, <chuckle> and I'm a speech generation model that can sound like a person.",
  "voice": "tara",
  "temperature": 0.6,
  "maxNewTokens": 1200,
  "topProbability": 0.95,
  "repetitionPenalty": 1.1
}
```
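Since each parameter has a documented valid range, it can be useful to validate inputs client-side before making a request. The helper below is a sketch based solely on the ranges listed above; the function name and structure are illustrative, not part of the API:

```python
def build_payload(text, voice="tara", temperature=0.6,
                  max_new_tokens=1200, top_probability=0.95,
                  repetition_penalty=1.1):
    """Validate parameters against the documented ranges and build the input JSON."""
    if voice not in {"tara", "dan", "josh", "emma"}:
        raise ValueError(f"unknown voice: {voice!r}")
    if not 0.1 <= temperature <= 1.5:
        raise ValueError("temperature must be between 0.1 and 1.5")
    if not 100 <= max_new_tokens <= 2000:
        raise ValueError("maxNewTokens must be between 100 and 2000")
    if not 0.1 <= top_probability <= 1.0:
        raise ValueError("topProbability must be between 0.1 and 1.0")
    if not 1 <= repetition_penalty <= 2:
        raise ValueError("repetitionPenalty must be between 1 and 2")
    return {
        "text": text,
        "voice": voice,
        "temperature": temperature,
        "maxNewTokens": max_new_tokens,
        "topProbability": top_probability,
        "repetitionPenalty": repetition_penalty,
    }
```

Catching out-of-range values locally gives clearer error messages than waiting for the service to reject the request.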
Output
Upon successful execution, the action returns a URL pointing to the generated audio file. The output typically looks like this:
Example Output:
https://assets.cognitiveactions.com/invocations/96782923-5ab4-4288-95d3-db5225cc561f/29ee2f76-6661-4b5c-87c4-9680505ea93d.wav
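Because the action returns a URL rather than raw audio bytes, a typical follow-up step is to download the `.wav` file for playback or storage. The sketch below assumes the URL is publicly fetchable with a plain GET request; `save_audio` is a hypothetical helper, not part of the platform:

```python
import os
from urllib.parse import urlparse

import requests


def save_audio(url: str, out_dir: str = ".") -> str:
    """Download the generated .wav file and return its local path."""
    # Derive a filename from the last path segment of the URL.
    filename = os.path.basename(urlparse(url).path)
    local_path = os.path.join(out_dir, filename)
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(local_path, "wb") as f:
        f.write(response.content)
    return local_path
```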
Conceptual Usage Example (Python)
Here’s a conceptual example of how you might invoke the Generate Emotive Speech action using Python. This example illustrates the structure of the input JSON payload and how to make a request to the hypothetical Cognitive Actions endpoint.
```python
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint
action_id = "731f63c0-a57a-4897-b776-5f2cd156286b"  # Action ID for Generate Emotive Speech

# Construct the input payload based on the action's requirements
payload = {
    "text": "Hey there my name is Tara, <chuckle> and I'm a speech generation model that can sound like a person.",
    "voice": "tara",
    "temperature": 0.6,
    "maxNewTokens": 1200,
    "topProbability": 0.95,
    "repetitionPenalty": 1.1
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
```
In this code snippet, replace YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key, and ensure the endpoint URL corresponds to your service. The action ID and input payload are structured according to the specifications provided for the Generate Emotive Speech action.
Conclusion
The lucataco/orpheus-3b-0.1-ft Cognitive Actions offer developers the capability to create highly emotive and human-like speech outputs with minimal effort. By leveraging the power of the Orpheus 3B model, you can enhance user interactions in your applications, whether for virtual assistants, audiobooks, or any other text-to-speech use cases. Start integrating these actions today to bring your applications to life!