Generate Text Embeddings with BAAI's BGE v1.5 Model

24 Apr 2025
In today's digital landscape, the ability to understand and process natural language is paramount. The Center for Curriculum Redesign has introduced the BGE 1.5 Query Embeddings API, which uses BAAI's state-of-the-art bge-large-en v1.5 model. The model excels at generating query embeddings for passage retrieval and sentence-similarity tasks. With roughly 335 million parameters and an output dimensionality of 1024, it normalizes embeddings for cosine similarity, making it a powerful tool for developers who want to add advanced text processing to their applications.

Prerequisites

Before you begin using Cognitive Actions, ensure you have the following:

  • An API key for accessing the Cognitive Actions platform.
  • Familiarity with making HTTP requests and handling JSON data.

Authentication typically involves passing the API key as a Bearer token in the headers of your requests.
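For instance, a request header carrying the key would follow the standard Bearer scheme. A minimal sketch (the placeholder key is hypothetical; substitute your own):

```python
# Hypothetical placeholder -- replace with your real Cognitive Actions API key.
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"

# Standard Bearer-token headers for a JSON request.
headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
}
```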

Cognitive Actions Overview

Generate Query Embeddings

The Generate Query Embeddings action allows developers to create embeddings for text queries, optimizing them for various downstream applications like document comparison and retrieval systems.

Purpose

This action leverages the BAAI bge-large-en v1.5 model to generate high-quality embeddings from input text queries.

Input

The input schema for this action requires the following fields:

  • queryTexts (string, required): A serialized JSON array of strings for which to generate retrieval embeddings. One embedding vector is returned per string.
  • normalize (boolean, optional): Normalizes the returned embedding vectors to a magnitude of 1. Defaults to true.
  • precision (string, optional): The numerical precision used for inference computations. Options are full or half. Defaults to full.
  • maxBatchTokens (number, optional): The maximum number of kibiTokens (1 kibiToken = 1024 tokens) per batch. Defaults to 200.
Example Input
{
  "normalize": true,
  "precision": "full",
  "queryTexts": "[\"hello world\", \"You can get query embeddings for multiple strings at a time\", \"It's better to keep your query strings shorter than your passage strings\", \"this endpoint will automatically prepend BAAI's retrieval prefix to your strings\", \"If you want more control over this behavior, you might be interested in the general embedding endpoint\"]",
  "maxBatchTokens": 200
}
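Because queryTexts is a JSON array serialized into a string, it is safer to build it with json.dumps than to hand-escape quotes, as the example above does. A small sketch:

```python
import json

queries = [
    "hello world",
    "You can get query embeddings for multiple strings at a time",
]

# Serialize the list into the string the queryTexts field expects.
payload = {
    "normalize": True,
    "precision": "full",
    "queryTexts": json.dumps(queries),
    "maxBatchTokens": 200,
}

# Round-tripping the field recovers the original list.
assert json.loads(payload["queryTexts"]) == queries
```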

Output

The output of this action typically includes:

  • query_embeddings: An array of arrays, where each inner array represents the embedding vector for the corresponding input query.
  • extra_metrics: Additional performance metrics such as device used, computation time, and inference time.
Example Output
{
  "extra_metrics": {
    "dtype": "torch.float32",
    "device": "cuda:0",
    "compute_milliseconds": 71,
    "inference_milliseconds": 68
  },
  "query_embeddings": [
    [0.04135016351938248, -0.00942438468337059, ...],
    [-0.027015989646315575, 0.011873127892613411, ...],
    ...
  ]
}
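Because the returned vectors are unit-normalized when normalize is true, cosine similarity between two embeddings reduces to a plain dot product. A minimal sketch, using made-up toy vectors rather than real model output:

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Toy stand-ins for two rows of query_embeddings (real vectors are 1024-dim).
a = normalize([0.04, -0.01, 0.30])
b = normalize([0.05, -0.02, 0.28])

# For unit vectors, cosine similarity is just the dot product.
cosine_similarity = dot(a, b)
```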

Conceptual Usage Example (Python)

Here's a conceptual example of how a developer might implement the Generate Query Embeddings action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"  # Hypothetical endpoint

action_id = "c3779bdb-b5a6-44d6-bd0f-2d491ce06836"  # Action ID for Generate Query Embeddings

# Construct the input payload based on the action's requirements
payload = {
    "normalize": True,
    "precision": "full",
    "queryTexts": "[\"hello world\", \"You can get query embeddings for multiple strings at a time\", \"It's better to keep your query strings shorter than your passage strings\", \"this endpoint will automatically prepend BAAI's retrieval prefix to your strings\", \"If you want more control over this behavior, you might be interested in the general embedding endpoint\"]",
    "maxBatchTokens": 200
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}  # Hypothetical structure
    )
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this code snippet:

  • Replace the placeholder YOUR_COGNITIVE_ACTIONS_API_KEY with your actual API key.
  • The payload is constructed based on the input schema for the action.
  • The request is sent to a hypothetical endpoint, and the response is processed to display the results.
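Once the call succeeds, the embeddings can be pulled out of the parsed response and paired back with the original queries, since each inner array lines up positionally with its input string. A sketch using a stub response shaped like the example output above (the vectors here are truncated placeholders, not real model output):

```python
# Stub response mirroring the example output's field names.
result = {
    "extra_metrics": {"device": "cuda:0"},
    "query_embeddings": [
        [0.041, -0.009],
        [-0.027, 0.011],
    ],
}

queries = [
    "hello world",
    "You can get query embeddings for multiple strings at a time",
]

# Pair each input query with its embedding by position.
for query, embedding in zip(queries, result["query_embeddings"]):
    print(f"{query!r} -> {len(embedding)}-dim vector")
```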

Conclusion

Integrating the Generate Query Embeddings action into your application can significantly enhance text processing capabilities, enabling improved document retrieval and query similarity assessments. With the ease of use provided by the Cognitive Actions API, developers can quickly leverage this powerful tool to create more intelligent applications. Consider exploring additional use cases, such as combining embeddings with machine learning models for advanced insights and predictions.