Generate Multilingual Text Embeddings Efficiently with AI

27 Apr 2025

Applications that serve a global audience need to process and understand text in multiple languages. The Multilingual E5 Large model is available through Cognitive Actions for embedding multilingual text, so your application can handle diverse linguistic data through a single interface. The resulting embeddings capture the meaning of text across languages and can support tasks such as search, recommendation, and sentiment analysis.

The key benefits of using the Multilingual E5 Large model include speed, accuracy, and the ability to handle large volumes of text through batch processing. Common use cases for these embeddings include enhancing the capabilities of chatbots, improving content recommendations based on user preferences, and facilitating cross-lingual information retrieval.

Embed Multilingual Texts

The "Embed Multilingual Texts" action generates multi-language text embeddings using the Multilingual E5 Large model. This action addresses the challenge of creating meaningful vector representations of text that can be used for various downstream tasks, such as similarity search and classification.

Input Requirements

To use this action, you need to provide the following inputs:

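  • texts: A JSON-formatted list of strings representing the text entries to be embedded. For example: ["In the water, fish are swimming.", "Fish swim in the water.", "A book lies open on the table."]. In the example request, this list is passed as a single JSON-encoded string rather than as a nested array.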
  • batchSize: An integer that specifies the number of text entries to process in a single batch. The default value is 32.
  • normalizeEmbeddings: A boolean indicating whether the embeddings should be normalized. The default is set to true.
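
As a minimal sketch of how these inputs might be assembled in Python, the helper below applies the documented defaults and serializes the list of texts to a JSON string. The name build_inputs and its validation rules are assumptions made for illustration; they are not part of the Cognitive Actions API.

```python
import json


def build_inputs(texts, batch_size=32, normalize_embeddings=True):
    """Assemble the input dictionary for the "Embed Multilingual Texts" action.

    Hypothetical helper: the keys mirror the documented inputs, but the
    validation rules here are assumptions, not API guarantees.
    """
    if not texts or not all(isinstance(t, str) for t in texts):
        raise ValueError("texts must be a non-empty list of strings")
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batchSize must be a positive integer")
    return {
        # ensure_ascii=False keeps non-Latin scripts readable in the payload
        "texts": json.dumps(list(texts), ensure_ascii=False),
        "batchSize": batch_size,
        "normalizeEmbeddings": bool(normalize_embeddings),
    }


inputs = build_inputs(["Fish swim in the water.", "Los peces nadan en el agua."])
print(inputs)
```

Building the string with json.dumps avoids hand-escaping quotes inside the texts value.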

Expected Output

The expected output is a list of numeric arrays, where each array represents the embedding for the corresponding input text. For example:

[
  [0.001451187883503735, -0.0232482198625803, ...],
  [0.03411397337913513, -0.019445784389972687, ...]
]
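
Before using a response, it can help to sanity-check its shape. The sketch below assumes the result is exactly the list of numeric arrays described above, and that normalization means scaling each vector to unit (L2) length. The helper name check_embeddings and the tolerance are hypothetical, and the sample vectors are made-up two-dimensional values, not real model output.

```python
import math


def check_embeddings(embeddings, expected_count, normalized=True, tol=1e-3):
    """Return the embedding dimension after basic consistency checks."""
    if len(embeddings) != expected_count:
        raise ValueError("expected one embedding per input text")
    dim = len(embeddings[0])
    for vec in embeddings:
        if len(vec) != dim:
            raise ValueError("embeddings have inconsistent dimensions")
        # Assumption: normalized embeddings have an L2 norm of about 1.
        norm = math.sqrt(sum(x * x for x in vec))
        if normalized and abs(norm - 1.0) > tol:
            raise ValueError(f"vector norm {norm:.4f} is not close to 1")
    return dim


# Made-up 2-dimensional vectors standing in for a real response.
sample = [[0.6, 0.8], [1.0, 0.0]]
print(check_embeddings(sample, expected_count=2))
```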

Use Cases for this Specific Action

This action is particularly useful in several scenarios:

  • Cross-language Search: Improve search functionalities by embedding documents in multiple languages, allowing users to search in their preferred language and find relevant results.
  • Multilingual Chatbots: Enhance the performance of chatbots that need to understand and respond to users in different languages, providing a better user experience.
  • Content Recommendation Systems: Use embeddings to analyze user preferences across languages, enabling personalized recommendations regardless of the user's language.
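
For cross-language search, a common pattern is to embed the documents once, embed each query at search time, and rank documents by vector similarity. The sketch below assumes the embeddings were requested with normalizeEmbeddings set to true, so that a plain dot product behaves like cosine similarity. The helper rank_documents is hypothetical, and the vectors are toy values rather than real model output.

```python
def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scores = [
        (i, sum(q * d for q, d in zip(query_vec, vec)))
        for i, vec in enumerate(doc_vecs)
    ]
    # Highest dot product first; for unit vectors this is cosine similarity.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy unit-length vectors standing in for document and query embeddings.
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vec = [0.8, 0.6]

for index, score in rank_documents(query_vec, doc_vecs, top_k=2):
    print(index, round(score, 2))
```

Because the query and documents share one embedding space, the same ranking step applies whether or not they are written in the same language.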
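
The following Python example shows how a call to this action might look. The endpoint URL is hypothetical, as noted in the code comments.

import requests
import json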

# Replace with your actual Cognitive Actions API key and endpoint
# Ensure your environment securely handles the API key
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
# This endpoint URL is hypothetical and should be documented for users
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute"

action_id = "31782e19-fbda-496b-a49e-697d975d3dd4" # Action ID for: Embed Multilingual Texts

# Construct the input payload based on the action's requirements.
# "texts" is sent as a JSON-encoded string, matching the action's example input.
payload = {
    "texts": json.dumps([
        "In the water, fish are swimming.",
        "Fish swim in the water.",
        "A book lies open on the table.",
    ]),
    "batchSize": 32,
    "normalizeEmbeddings": True,  # Python boolean; serialized as JSON true
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json",
    # Add any other required headers for the Cognitive Actions API
}

# Prepare the request body for the hypothetical execution endpoint
request_body = {
    "action_id": action_id,
    "inputs": payload
}

print(f"--- Calling Cognitive Action: Embed Multilingual Texts ({action_id}) ---")
print(f"Endpoint: {COGNITIVE_ACTIONS_EXECUTE_URL}")
print(f"Action ID: {action_id}")
print("Payload being sent:")
print(json.dumps(request_body, indent=2))
print("------------------------------------------------")

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json=request_body,
        timeout=30,  # seconds; avoids hanging indefinitely on a stalled connection
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully. Result:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # raised by response.json() when the body is not valid JSON
            print(f"Response body (non-JSON): {e.response.text}")
    print("------------------------------------------------")

The Multilingual E5 Large model generates multilingual text embeddings that applications can use for search, recommendation, and classification across languages. Incorporating these embeddings can improve how your application understands and processes multilingual content. To get started, make sure you have a Cognitive Actions API key and are familiar with the basic structure of an API call.