Generate Text Embeddings with BAAI's BGE v1.5 Model

In today's digital landscape, the ability to understand and process natural language is paramount. The Center for Curriculum Redesign has introduced the BGE 1.5 Query Embeddings API, which utilizes BAAI's state-of-the-art bge-large-en v1.5 model. This model excels at generating embeddings for queries, facilitating passage retrieval, and scoring sentence similarity. With roughly 335 million parameters and an output dimensionality of 1024, it normalizes its embeddings for cosine similarity, making it a powerful tool for developers looking to integrate advanced text processing capabilities into their applications.
Prerequisites
Before you begin using Cognitive Actions, ensure you have the following:
- An API key for accessing the Cognitive Actions platform.
- Familiarity with making HTTP requests and handling JSON data.
Authentication typically involves passing the API key as a Bearer token in the headers of your requests.
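As a minimal sketch, Bearer-token authentication looks like this in Python. The header names follow the standard HTTP Bearer scheme; the key value is a placeholder, and any additional required headers depend on your platform configuration:

```python
# Placeholder key, not a real credential.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# Standard headers for an authenticated JSON request.
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
```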
Cognitive Actions Overview
Generate Query Embeddings
The Generate Query Embeddings action allows developers to create embeddings for text queries, optimizing them for various downstream applications like document comparison and retrieval systems.
Purpose
This action leverages the BAAI bge-large-en v1.5 model to generate high-quality embeddings from input text queries.
Input
The input schema for this action requires the following fields:
- normalize (boolean, optional): Normalizes returned embedding vectors to a magnitude of 1. Defaults to true.
- precision (string, optional): Specifies the numerical precision for inference computations. Options are full or half. Defaults to full.
- queryTexts (string, required): A serialized JSON array of strings for which to generate retrieval embeddings. Each string returns a corresponding vector.
- maxBatchTokens (number, optional): Sets the maximum number of kibiTokens (1 kibiToken = 1024 tokens) for a batch. Default is 200.
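Note that queryTexts is a JSON-encoded string, not a raw array. A safe way to build it is to serialize a Python list with json.dumps, as this sketch shows (the query strings here are illustrative):

```python
import json

# The queries to embed, as an ordinary Python list.
queries = ["hello world", "how do embeddings work?"]

# Serialize the list into the string the queryTexts field expects.
payload = {
    "normalize": True,
    "precision": "full",
    "queryTexts": json.dumps(queries),
    "maxBatchTokens": 200,
}

# Round-tripping confirms the field decodes back to the original list.
assert json.loads(payload["queryTexts"]) == queries
```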
Example Input
{
"normalize": true,
"precision": "full",
"queryTexts": "[\"hello world\", \"You can get query embeddings for multiple strings at a time\", \"It's better to keep your query strings shorter than your passage strings\", \"this endpoint will automatically prepend BAAI's retrieval prefix to your strings\", \"If you want more control over this behavior, you might be interested in the general embedding endpoint\"]",
"maxBatchTokens": 200
}
Output
The output of this action typically includes:
- query_embeddings: An array of arrays, where each inner array represents the embedding vector for the corresponding input query.
- extra_metrics: Additional performance metrics such as device used, computation time, and inference time.
Example Output
{
"extra_metrics": {
"dtype": "torch.float32",
"device": "cuda:0",
"compute_milliseconds": 71,
"inference_milliseconds": 68
},
"query_embeddings": [
[0.04135016351938248, -0.00942438468337059, ...],
[-0.027015989646315575, 0.011873127892613411, ...],
...
]
}
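Because the vectors come back unit-normalized (when normalize is true), cosine similarity between two embeddings reduces to a plain dot product. A self-contained sketch, using toy 4-dimensional vectors in place of the real 1024-dimensional embeddings:

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length, mirroring normalize=true."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Toy stand-ins for returned query embeddings.
q1 = normalize([0.1, 0.3, -0.2, 0.4])
q2 = normalize([0.1, 0.3, -0.2, 0.4])   # identical to q1
q3 = normalize([-0.4, 0.1, 0.3, -0.2])  # quite different from q1

print(dot(q1, q2))  # ~1.0 for identical queries
print(dot(q1, q3))  # much lower for dissimilar queries
```

Higher dot products mean more similar queries, so ranking candidate passages by this score is the typical retrieval step.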
Conceptual Usage Example (Python)
Here's a conceptual example of how a developer might implement the Generate Query Embeddings action using Python:
import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "c3779bdb-b5a6-44d6-bd0f-2d491ce06836"  # Action ID for Generate Query Embeddings

# Construct the input payload based on the action's requirements
payload = {
    "normalize": True,
    "precision": "full",
    "queryTexts": "[\"hello world\", \"You can get query embeddings for multiple strings at a time\", \"It's better to keep your query strings shorter than your passage strings\", \"this endpoint will automatically prepend BAAI's retrieval prefix to your strings\", \"If you want more control over this behavior, you might be interested in the general embedding endpoint\"]",
    "maxBatchTokens": 200
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")
In this code snippet:
- Replace the placeholder YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
- The payload is constructed based on the input schema for the action.
- The request is sent to a hypothetical endpoint, and the response is processed to display the results.
Conclusion
Integrating the Generate Query Embeddings action into your application can significantly enhance text processing capabilities, enabling improved document retrieval and query similarity assessments. With the ease of use provided by the Cognitive Actions API, developers can quickly leverage this powerful tool to create more intelligent applications. Consider exploring additional use cases, such as combining embeddings with machine learning models for advanced insights and predictions.