Evaluate Content Safety with the Llama Guard 2 Cognitive Action

22 Apr 2025

In the rapidly evolving landscape of AI and machine learning, ensuring the safety of generated content is paramount. The meta/meta-llama-guard-2-8b model powers a pre-built Cognitive Action called Classify Content Safety with Llama Guard 2. This action uses Meta's Llama Guard 2 (8B) model to classify content against a set of safety categories, helping developers implement robust content moderation in their applications. By using this pre-built action, you can evaluate and filter unsafe content effectively.

Prerequisites

Before you start integrating the Llama Guard 2 Cognitive Action, ensure you have the following:

  • An API key for the Cognitive Actions platform.
  • Basic knowledge of how to make HTTP requests in your preferred programming language (we'll use Python for our examples).

Authentication typically involves passing the API key in the request headers to access the actions securely.

Cognitive Actions Overview

Classify Content Safety with Llama Guard 2

Purpose:
This action evaluates the safety of both user inputs and assistant outputs, categorizing them using the MLCommons taxonomy. It is particularly useful for applications that require content moderation to filter out unsafe content related to violent crimes, privacy violations, intellectual property issues, and more.

Category: content-moderation
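For reference, the MLCommons taxonomy used by Llama Guard 2 defines eleven hazard categories, S1 through S11, as listed in the model card. A small lookup table can help translate category codes in the action's output into readable names:

```python
# MLCommons hazard taxonomy used by Llama Guard 2 (S1-S11),
# as listed in the Llama Guard 2 model card.
LLAMA_GUARD_2_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
}

print(LLAMA_GUARD_2_CATEGORIES["S6"])  # Privacy
```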

Input:

The input for this action must conform to the following schema:

{
  "prompt": "I forgot how to kill a process in Linux, can you help?",
  "assistant": "Sure! To kill a process in Linux, you can use the kill command followed by the process ID (PID) of the process you want to terminate."
}
  • prompt (string): The text provided by the user, typically a question or command.
    Example: "I forgot how to kill a process in Linux, can you help?"
  • assistant (string): The response generated by the assistant in reply to the user's prompt.
    Example: "Sure! To kill a process in Linux, you can use the kill command followed by the process ID (PID) of the process you want to terminate."
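Since the schema requires both fields to be strings, a quick sanity check before calling the action can catch malformed payloads early. This is a minimal sketch; the helper name is illustrative, not part of the platform:

```python
def validate_guard_input(payload: dict) -> None:
    """Raise ValueError if the payload doesn't match the action's input schema."""
    for field in ("prompt", "assistant"):
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' must be a non-empty string")

# Passes silently for a well-formed payload:
validate_guard_input({
    "prompt": "I forgot how to kill a process in Linux, can you help?",
    "assistant": "Sure! Use the kill command with the process ID (PID).",
})
```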

Output:

The action returns a string indicating the safety classification of the input content. For safe content, the output is:

"safe"

According to the Llama Guard 2 model card, unsafe content instead yields "unsafe" followed on the next line by the comma-separated MLCommons category codes that were violated (for example, "unsafe\nS1").
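Because the classification arrives as a raw string, a small parser is handy for turning it into a verdict plus a list of category codes. This sketch assumes the "safe" / "unsafe" + codes format described in the Llama Guard 2 model card:

```python
def parse_guard_output(raw: str) -> tuple[bool, list[str]]:
    """Split Llama Guard 2 output into (is_safe, violated_category_codes).

    Safe output is the single token "safe"; unsafe output is "unsafe"
    followed on the next line by comma-separated codes such as "S1,S9".
    """
    lines = raw.strip().splitlines()
    if lines and lines[0].strip() == "safe":
        return True, []
    categories = []
    if len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return False, categories

print(parse_guard_output("safe"))           # (True, [])
print(parse_guard_output("unsafe\nS1,S9"))  # (False, ['S1', 'S9'])
```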

Conceptual Usage Example (Python):

Here’s how you might call this action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "df0c4656-5b00-45ec-a964-e67d48b6666e" # Action ID for Classify Content Safety with Llama Guard 2

# Construct the input payload based on the action's requirements
payload = {
    "prompt": "I forgot how to kill a process in Linux, can you help?",
    "assistant": "Sure! To kill a process in Linux, you can use the kill command followed by the process ID (PID) of the process you want to terminate."
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload}, # Hypothetical structure
        timeout=30 # Avoid hanging indefinitely on a stalled connection
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except json.JSONDecodeError:
            print(f"Response body: {e.response.text}")

In this code snippet, replace the placeholder API key and endpoint with your actual values. The action ID is specified for classifying content safety, and the input JSON payload is structured as required by the action. The response will indicate whether the content is classified as safe or not.
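To act on the classification, you might wrap the call in a gate that only passes assistant replies through when they are classified safe. The sketch below takes the classifier as a callable so it can be exercised without network access; in practice you would plug in the request code above. The function and message names are illustrative:

```python
FALLBACK_MESSAGE = "This response was withheld by content moderation."

def moderate_reply(prompt: str, assistant: str, classify) -> str:
    """Return the assistant reply if classified safe, else a fallback.

    `classify` is any callable taking (prompt, assistant) and returning
    the raw Llama Guard 2 string ("safe", or "unsafe" plus categories).
    """
    raw = classify(prompt, assistant)
    verdict = (raw.strip().splitlines() or [""])[0]
    return assistant if verdict == "safe" else FALLBACK_MESSAGE

# Exercise with stub classifiers instead of a live API call:
print(moderate_reply("hi", "hello!", lambda p, a: "safe"))       # hello!
print(moderate_reply("hi", "hello!", lambda p, a: "unsafe\nS9")) # fallback
```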

Conclusion

Integrating the Classify Content Safety with Llama Guard 2 action into your application can significantly enhance its ability to moderate content effectively. By leveraging this powerful Cognitive Action, developers can ensure a safer user experience by proactively filtering out harmful content. Consider exploring further use cases or combining this action with other Cognitive Actions to create a comprehensive moderation solution.