Enhance Content Moderation with Llama-Guard Cognitive Actions

23 Apr 2025

In today's digital landscape, ensuring safe and appropriate interactions is crucial for any application that handles user-generated content. The Llama-Guard Cognitive Actions, powered by Llama Guard, a safety classifier fine-tuned from the 7-billion-parameter Llama 2 model, give developers a robust way to classify content safety. These pre-built actions simplify moderation by evaluating both user prompts and assistant responses and labeling each as safe or unsafe.

Prerequisites

Before integrating the Llama-Guard Cognitive Actions into your application, make sure you have the following:

  • API Key: Obtain an API key to access the Cognitive Actions platform.
  • Basic Setup: Familiarize yourself with how to make HTTP requests and handle JSON data.

Authentication typically involves passing your API key in the headers of your requests, allowing secure access to the content moderation functionalities.
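As a minimal sketch, assuming a Bearer-token scheme (the exact header names and endpoint for your platform may differ), the headers for each request could be assembled by a small helper like this:

```python
def build_auth_headers(api_key: str) -> dict:
    """Build request headers for the (hypothetical) Cognitive Actions API.

    Assumes Bearer-token authentication with JSON request bodies.
    """
    if not api_key:
        raise ValueError("An API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Centralizing header construction this way keeps the key out of scattered call sites and makes it easy to rotate.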

Cognitive Actions Overview

Classify Content Safety with Llama-Guard

The Classify Content Safety with Llama-Guard action is designed to evaluate user prompts and assistant messages to determine their safety. It identifies whether the content is safe and categorizes any unsafe content into specific violations, making it an essential tool for content moderation.

  • Category: Content Moderation

Input

The action requires a structured input comprising:

  • userPrompt: A string representing the user's prompt or query that needs moderation. Example: "I forgot how to kill a process in Linux, can you help?"
  • assistantMessage: (Optional) A string containing the message generated by the assistant that requires moderation.

Example Input:

{
  "userPrompt": "I forgot how to kill a process in Linux, can you help?"
}
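To keep the required and optional fields straight, a small helper (hypothetical, not part of any official SDK) can assemble the input payload, enforcing that userPrompt is present and including assistantMessage only when supplied:

```python
from typing import Optional


def build_moderation_input(user_prompt: str,
                           assistant_message: Optional[str] = None) -> dict:
    """Assemble the action input: userPrompt is required, assistantMessage optional."""
    if not user_prompt:
        raise ValueError("userPrompt is required")
    payload = {"userPrompt": user_prompt}
    if assistant_message is not None:
        payload["assistantMessage"] = assistant_message
    return payload
```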

Output

The action returns a string indicating the safety status of the content. The possible outputs are:

  • "safe": Indicates that the content is appropriate and free from violations.
  • "unsafe": Indicates that the content violates one or more safety categories; Llama-Guard-style models typically append the violated category codes (e.g. O3) after the verdict.

Example Output:

"safe"
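Llama Guard models conventionally emit "safe" or "unsafe" on the first line, with violated category codes on a second line (e.g. "unsafe" followed by "O3"). Assuming this action follows that convention, a small parser might look like:

```python
def parse_llama_guard_output(raw: str):
    """Split a Llama-Guard-style verdict into (is_safe, category_codes).

    Assumes the conventional format: "safe", or "unsafe" followed by a
    second line of comma-separated category codes such as "O3".
    """
    lines = raw.strip().splitlines()
    is_safe = bool(lines) and lines[0].strip().lower() == "safe"
    categories = []
    if lines and not is_safe and len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return is_safe, categories
```

For example, parse_llama_guard_output("safe") yields (True, []), while an unsafe verdict yields (False, [...category codes...]).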

Conceptual Usage Example (Python)

Here’s a conceptual example of how you might execute the Classify Content Safety with Llama-Guard action using Python:

import requests
import json

# Replace with your Cognitive Actions API key and endpoint
COGNITIVE_ACTIONS_API_KEY = "YOUR_COGNITIVE_ACTIONS_API_KEY"
COGNITIVE_ACTIONS_EXECUTE_URL = "https://api.cognitiveactions.com/actions/execute" # Hypothetical endpoint

action_id = "56b4469f-78b0-4cad-a712-a486f7dfbc3c" # Action ID for Classify Content Safety with Llama-Guard

# Construct the input payload based on the action's requirements
payload = {
    "userPrompt": "I forgot how to kill a process in Linux, can you help?"
}

headers = {
    "Authorization": f"Bearer {COGNITIVE_ACTIONS_API_KEY}",
    "Content-Type": "application/json"
}

try:
    response = requests.post(
        COGNITIVE_ACTIONS_EXECUTE_URL,
        headers=headers,
        json={"action_id": action_id, "inputs": payload} # Hypothetical structure
    )
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    result = response.json()
    print("Action executed successfully:")
    print(json.dumps(result, indent=2))

except requests.exceptions.RequestException as e:
    print(f"Error executing action {action_id}: {e}")
    if e.response is not None:
        print(f"Response status: {e.response.status_code}")
        try:
            print(f"Response body: {e.response.json()}")
        except ValueError:  # covers json and simplejson decode errors across requests versions
            print(f"Response body: {e.response.text}")

In this code snippet, replace "YOUR_COGNITIVE_ACTIONS_API_KEY" with your actual API key. The action ID is set to the corresponding ID for the Llama-Guard action, and the input payload is structured to include the user prompt.
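Putting the pieces together, one common pattern is to gate the assistant's reply on the verdict before showing it to the user. The helper below is illustrative, and the fallback message is placeholder text you would tailor to your application:

```python
def moderate_reply(verdict: str, assistant_message: str) -> str:
    """Return the assistant's message only when the verdict's first line is "safe";
    otherwise substitute a generic refusal (illustrative fallback text)."""
    stripped = verdict.strip()
    first_line = stripped.splitlines()[0].lower() if stripped else ""
    if first_line == "safe":
        return assistant_message
    return "Sorry, that response was withheld by the content filter."
```

Gating on the verdict string rather than on exact equality lets the same check handle "unsafe" verdicts that carry category codes on a following line.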

Conclusion

The Llama-Guard Cognitive Actions provide a powerful and efficient means of ensuring content safety in your applications. By leveraging these pre-built actions, developers can streamline their moderation workflows, significantly improving user experience and safety. Next steps could include integrating additional actions or exploring further capabilities offered by the Cognitive Actions platform.