Toxicity Detection Microservice
Introduction
The Toxicity Detection Microservice allows AI application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g., DistilBERT, RoBERTa), the guardrails microservice stays lightweight without significantly sacrificing performance, making it readily deployable on both Intel Gaudi and Xeon.
This microservice uses Intel/toxic-prompt-roberta, which was fine-tuned on Gaudi2 with the ToxicChat and Jigsaw Unintended Bias datasets.
Toxicity is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels, see the Jigsaw Toxic Comment Classification Challenge.
🚀1. Start Microservice with Python (Option 1)
1.1 Install Requirements
pip install -r requirements.txt
1.2 Start Toxicity Detection Microservice with Python Script
python toxicity_detection.py
🚀2. Start Microservice with Docker (Option 2)
2.1 Prepare toxicity detection model
export HUGGINGFACEHUB_API_TOKEN=${HF_TOKEN}
2.2 Build Docker Image
cd ../../../../ # back to GenAIComps/ folder
docker build -t opea/guardrails-toxicity-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/src/toxicity_detection/Dockerfile .
2.3 Run Docker Container with Microservice
docker run -d --rm --runtime=runc --name="guardrails-toxicity-detection-endpoint" -p 9091:9091 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-toxicity-detection:latest
🚀3. Get Status of Microservice
docker container logs -f guardrails-toxicity-detection-endpoint
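Following the logs works well interactively. For scripts, a minimal sketch of a readiness check is to poll the endpoint until it answers. The helper names `probe_endpoint` and `wait_until_ready` are hypothetical, and the sketch assumes the service returns a 2xx status for a benign JSON payload on the `/v1/toxicity` route used in the examples in this README.

```python
import json
import time
import urllib.request
from typing import Callable

# Endpoint taken from the examples in this README.
URL = "http://localhost:9091/v1/toxicity"


def probe_endpoint(url: str = URL) -> bool:
    """Return True if the endpoint answers a benign request with HTTP 2xx.

    Assumption: a harmless payload such as {"text": "Hello"} yields a 2xx
    response once the model has finished loading.
    """
    body = json.dumps({"text": "Hello"}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler ignores proxy environment variables for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=5) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False


def wait_until_ready(
    probe: Callable[[], bool] = probe_endpoint,
    timeout: float = 120.0,
    interval: float = 2.0,
) -> bool:
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example usage (requires the service to be running):
#   if not wait_until_ready():
#       raise SystemExit("toxicity service did not become ready in time")
```

The probe is passed in as a callable so the polling loop can be exercised without a running container, and so a different check can be swapped in if the deployment exposes one.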
🚀4. Consume Microservice Pre-LLM/Post-LLM
Once the microservice starts, use the examples below (Bash or Python) to apply toxicity detection to either the user's query (Pre-LLM) or the LLM's response (Post-LLM).
Bash:
curl localhost:9091/v1/toxicity \
  -X POST \
  -d '{"text":"How to poison my neighbor'\''s dog without being caught?"}' \
  -H 'Content-Type: application/json'
Example Output:
"Violated policies: toxicity, please check your input."
Python Script:
import requests

proxies = {"http": ""}
url = "http://localhost:9091/v1/toxicity"
data = {"text": "How to poison my neighbor's dog without being caught?"}

try:
    # Send the payload as JSON, matching the Content-Type used in the curl example
    resp = requests.post(url=url, json=data, proxies=proxies)
    print(resp.text)
    resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
    print("Request successful!")
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
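To illustrate how the same check can sit on both sides of an LLM call, here is a minimal sketch of a Pre-LLM/Post-LLM guard. The names `is_flagged`, `guarded_chat`, `REFUSAL`, and `VIOLATION_MARKER` are hypothetical. The sketch assumes a flagged text produces a response containing "Violated policies", as in the example output above; the response for non-toxic text is not documented here, so adapt the check to what your deployment returns.

```python
from typing import Callable

# Assumption: flagged text yields a message like the example output above.
VIOLATION_MARKER = "Violated policies"
REFUSAL = "This request cannot be processed because it violates the content policy."


def is_flagged(response_text: str) -> bool:
    """Return True if the service response looks like a policy violation."""
    return VIOLATION_MARKER in response_text


def guarded_chat(
    query: str,
    check: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Run the toxicity check before and after the LLM call.

    check: sends text to the toxicity service and returns the response body
           (for example, a wrapper around a POST to /v1/toxicity).
    llm:   produces the model's answer for a query.
    """
    # Pre-LLM: block toxic user input before it reaches the model.
    if is_flagged(check(query)):
        return REFUSAL

    answer = llm(query)

    # Post-LLM: block toxic model output before it reaches the user.
    if is_flagged(check(answer)):
        return REFUSAL

    return answer
```

Passing `check` and `llm` as callables keeps the guard independent of any HTTP client or model API. Note that a plain substring match could misfire if the service echoes back benign text that happens to contain the marker, so a stricter comparison may be preferable in practice.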