Generate Multilingual Text Embeddings with Ease

26 Apr 2025
Multilingual E5 Base is a text embedding model that helps developers add multilingual capabilities to their applications. Its Cognitive Actions process text in many languages and support tasks such as content analysis, translation workflows, and cross-lingual comparison. Because the embeddings are language-independent, an application can analyze and compare text by meaning regardless of the language it is written in.

This service is particularly useful for applications that need a fast, simple way to handle multilingual text. Common use cases include chatbots, multilingual content management systems, and any application that needs semantic understanding of text inputs. With Multilingual E5 Base, developers can keep performance and accuracy consistent across diverse linguistic contexts.

To get started, you will need a Cognitive Actions API key and basic familiarity with making API calls.

Generate Multilingual Text Embeddings

This action uses the multilingual E5 base model to produce language-independent text embeddings from the provided text segments. When normalization is enabled, the embeddings are brought to a consistent scale, which makes results easier to compare and supports more reliable analysis of text data.

Input Requirements:

  • texts: A JSON-formatted list of strings representing the text segments to embed (e.g., ["In the water, fish are swimming.", "Fish swim in the water.", "A book lies open on the table."]).
  • batchSize: A non-negative integer giving the number of text items to process in one batch (default is 32).
  • normalizeEmbeddings: A boolean indicating whether the embeddings should be normalized (default is true).

Example Input:

{
  "texts": ["In the water, fish are swimming.", "Fish swim in the water.", "A book lies open on the table."],
  "batchSize": 32,
  "normalizeEmbeddings": true
}
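
The input rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Cognitive Actions SDK: `build_embedding_inputs` is a hypothetical helper, and it assumes `texts` is sent as a JSON-encoded string of the list, which is how the Python example in this article passes it.

```python
import json


def build_embedding_inputs(texts, batch_size=32, normalize_embeddings=True):
    """Validate arguments and build the action's input dictionary.

    Hypothetical helper: the field names come from the input list above,
    but the function itself is illustrative only.
    """
    if not isinstance(texts, (list, tuple)) or not texts:
        raise ValueError("texts must be a non-empty list of strings")
    if not all(isinstance(t, str) for t in texts):
        raise ValueError("every item in texts must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size < 0:
        raise ValueError("batchSize must be a non-negative integer")
    if not isinstance(normalize_embeddings, bool):
        raise ValueError("normalizeEmbeddings must be a boolean")
    return {
        # Assumption: the action expects the list serialized as a JSON string.
        "texts": json.dumps(list(texts), ensure_ascii=False),
        "batchSize": batch_size,
        "normalizeEmbeddings": normalize_embeddings,
    }


inputs = build_embedding_inputs(
    ["Fish swim in the water.", "Les poissons nagent dans l'eau."]
)
print(json.dumps(inputs, indent=2, ensure_ascii=False))
```

Validating early keeps malformed requests from reaching the service and produces clearer error messages than a generic HTTP 4xx response.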

Expected Output: The output is a list of arrays, one for each input text segment. Each array is a numeric vector that represents the semantic meaning of the corresponding text segment.

Example Output:

[
  [0.008765427395701408, 0.04697509855031967, ...],
  [0.011446335352957249, 0.0471852570772171, ...],
  ...
]
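
Embeddings like these are typically compared with cosine similarity, where a higher score indicates closer meaning. The sketch below uses only the standard library; the three-dimensional vectors are made-up stand-ins for real embeddings, which have many more dimensions, and `cosine_similarity` is an illustrative helper rather than part of the service.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for the action's output (values are invented for illustration).
embeddings = [
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

# Compare every pair of embeddings once.
for i in range(len(embeddings)):
    for j in range(i + 1, len(embeddings)):
        score = cosine_similarity(embeddings[i], embeddings[j])
        print(f"similarity({i}, {j}) = {score:.3f}")
```

If `normalizeEmbeddings` is true and normalization means scaling each vector to unit length (an assumption about the service), the two norms are 1 and the dot product alone gives the same score.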

Use Cases for this Specific Action:

  • Chatbots and Virtual Assistants: Enhance the understanding of user queries in multiple languages, allowing for more accurate responses.
  • Content Management Systems: Facilitate the analysis and organization of multilingual content by providing consistent text embeddings for search and categorization.
  • Semantic Search Engines: Improve search functionality across languages by comparing the semantic meaning of queries and documents rather than relying solely on keyword matches.
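
The following Python example shows one way to call this action. The endpoint URL and request structure are hypothetical, so check the Cognitive Actions documentation for the actual values.

import requests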
import json

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "ccaedbcd-5aee-410e-b43c-837cd76a1e51" # Action ID for: Generate Multilingual Text Embeddings

# Construct the input payload based on the action's requirements.
# "texts" is passed as a JSON-encoded string of the list of segments.
payload = {
    "texts": json.dumps([
        "In the water, fish are swimming.",
        "Fish swim in the water.",
        "A book lies open on the table.",
    ]),
    "batchSize": 32,
    "normalizeEmbeddings": True,  # Python boolean; serialized as JSON true
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print("--- Calling Cognitive Action: Generate Multilingual Text Embeddings ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=30,  # Avoid hanging indefinitely if the service does not respond
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # Covers JSON decode errors across requests versions
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")
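
Once embeddings are available, the semantic search use case comes down to ranking document vectors against a query vector. The sketch below is one simple way to do that, assuming the embeddings were returned with `normalizeEmbeddings` set to true so that a dot product can serve as the similarity score. `rank_by_similarity` is a hypothetical helper, and the vectors are invented toy values rather than real model output.

```python
def rank_by_similarity(query_embedding, document_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k highest dot-product scores."""
    scores = [
        (index, sum(q * d for q, d in zip(query_embedding, doc)))
        for index, doc in enumerate(document_embeddings)
    ]
    # Highest score first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


documents = ["Fish swim in the water.", "A book lies open on the table."]
# Made-up unit-length vectors standing in for the action's output.
document_embeddings = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_embedding = [0.8, 0.6, 0.0]

for index, score in rank_by_similarity(query_embedding, document_embeddings, top_k=2):
    print(f"{score:.3f}  {documents[index]}")
```

A linear scan like this is adequate for small collections. Larger collections usually call for a vector index, which is outside the scope of this article.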

Conclusion

In summary, Multilingual E5 Base gives developers a robust way to generate multilingual text embeddings and handle diverse linguistic data. With its focus on speed, normalization, and consistency, this action suits applications that rely on accurate semantic understanding of text across multiple languages.

As you integrate this capability into your projects, consider how it can improve user experiences and broaden the reach of your applications. Start experimenting with Multilingual E5 Base today to see what multilingual processing can add to your applications.