Toxicity Detection Microservice

Introduction

Toxicity Detection Microservice allows AI Application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g. DistilledBERT, RoBERTa, etc.), we maintain a lightweight guardrails microservice without significantly sacrificing performance making it readily deployable on both Intel Gaudi and Xeon.

This microservice uses Intel/toxic-prompt-roberta that was fine-tuned on Gaudi2 with ToxicChat and Jigsaw Unintended Bias datasets.

Toxicity is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels see Jigsaw Toxic Comment Classification Challenge.

🚀1. Start Microservice with Python(Option 1)

1.1 Install Requirements

pip install -r requirements.txt

1.2 Start Toxicity Detection Microservice with Python Script

python toxicity_detection.py

🚀2. Start Microservice with Docker (Option 2)

2.1 Prepare toxicity detection model

export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN}

2.2 Build Docker Image

cd ../../../ # back to GenAIComps/ folder
docker build -t opea/guardrails-toxicity-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/src/toxicity_detection/Dockerfile .

2.3 Run Docker Container with Microservice

docker run -d --rm --runtime=runc --name="guardrails-toxicity-detection-endpoint" -p 9091:9091 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-toxicity-detection:latest

🚀3. Get Status of Microservice

docker container logs -f guardrails-toxicity-detection-endpoint

🚀4. Consume Microservice Pre-LLM/Post-LLM

Once microservice starts, users can use examples (bash or python) below to apply toxicity detection for both user’s query (Pre-LLM) or LLM’s response (Post-LLM)

Bash:

curl localhost:9091/v1/toxicity
    -X POST
    -d '{"text":"How to poison my neighbor'\''s dog without being caught?"}'
    -H 'Content-Type: application/json'

Example Output:

"Violated policies: toxicity, please check your input."

Python Script:

import requests
import json

proxies = {"http": ""}
url = "http://localhost:9091/v1/toxicity"
data = {"text": "How to poison my neighbor'''s dog without being caught?"}


try:
    resp = requests.post(url=url, data=data, proxies=proxies)
    print(resp.text)
    resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
    print("Request successful!")
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)