PII Detection Microservice
Introduction
In today’s digital landscape, safeguarding personal information has become paramount, necessitating robust mechanisms to detect and protect personally identifiable information (PII). PII detection guardrails serve as essential tools in this endeavor, providing automated systems and protocols designed to identify, manage, and secure sensitive data. These guardrails leverage classical machine learning, LLMs, natural language processing (NLP) algorithms, and pattern recognition to accurately pinpoint PII, ensuring compliance with privacy regulations and minimizing the risk of data breaches. By implementing PII detection guardrails, organizations can enhance their data protection strategies, foster trust with stakeholders, and uphold the integrity of personal information.
This component currently supports two microservices: an OPEA native (free, local, open-source) microservice and a Prediction Guard (API Key required) microservice. Please choose one of the two microservices for PII detection based on your specific use case. If you wish to run both for experimental or comparison purposes, make sure to modify the port configuration of one service to avoid conflicts, as they are configured to use the same port by default.
OPEA Native PII Detection Microservice
This service uses a spaCy pipeline built with Microsoft Presidio's Transformers NlpEngine. The pipeline contains the standard NLP and regex-based recognizers that detect Presidio's default set of supported entities, as well as a BERT-based model (StanfordAIMI/stanford-deidentifier-base) that additionally detects the following PII entities:
Person
Location
Organization
Age
ID
Email
Date/time
Phone number
Nationality/religious/political group
The service takes text as input (TextDoc) and returns either the original text (TextDoc) if no PII is detected, or a list of detected entities (PIIResponseDoc) with detection details, including the detection score (probability), the detection method, and the start/end string indices of each detection.
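For intuition, the detection flow is essentially standard Presidio usage. The sketch below is illustrative only: it uses Presidio's default spaCy NLP engine for brevity, whereas the service builds a Transformers NLP engine around StanfordAIMI/stanford-deidentifier-base.

# Illustrative sketch of the detection flow using Presidio's public API.
# Requires `pip install presidio-analyzer` and a spaCy model (e.g. en_core_web_lg).
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()  # default spaCy engine; the service swaps in a Transformers model
results = analyzer.analyze(
    text="My name is John Doe and my phone number is (555) 555-5555.",
    language="en",
)
for r in results:
    # Each result carries the entity type, start/end indices, and a confidence
    # score, mirroring the fields returned in the service's PIIResponseDoc.
    print(r.entity_type, r.start, r.end, r.score)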
Stay tuned for the following future work:
Entity configurability
PII replacement
Prediction Guard PII Detection Microservice
Prediction Guard allows you to utilize hosted open access LLMs, LVMs, and embedding functionality with seamlessly integrated safeguards. In addition to providing scalable access to open models, Prediction Guard allows you to configure factual consistency checks, toxicity filters, PII filters, and prompt injection blocking. Join the Prediction Guard Discord channel and request an API key to get started.
Detecting Personally Identifiable Information (PII) is important in ensuring that users aren’t sending out private data to LLMs. This service allows you to configurably:
Detect PII
Replace PII (with “faked” information)
Mask PII (with placeholders)
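For reference, here is a minimal sketch of these three modes using the Prediction Guard Python client directly (this assumes the predictionguard package is installed and PREDICTIONGUARD_API_KEY is set in the environment; the microservice below wraps the same capability behind a REST endpoint):

# Sketch only: calling Prediction Guard's PII check directly from Python.
# Assumes `pip install predictionguard` and PREDICTIONGUARD_API_KEY in the environment.
from predictionguard import PredictionGuard

client = PredictionGuard()  # picks up PREDICTIONGUARD_API_KEY from the environment

# Detect PII without modifying the text
print(client.pii.check(prompt="My name is John Doe.", replace=False))

# Replace detected PII with "faked" values
print(client.pii.check(prompt="My name is John Doe.", replace=True, replace_method="fake"))

# Mask detected PII with placeholders
print(client.pii.check(prompt="My name is John Doe.", replace=True, replace_method="mask"))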
Environment Setup
Clone OPEA GenAIComps and Setup Environment
Clone this repository at your desired location and set an environment variable for easy setup and usage throughout the instructions.
git clone https://github.com/opea-project/GenAIComps.git
export OPEA_GENAICOMPS_ROOT=$(pwd)/GenAIComps
Set the port that this service will use and the component name:
export PII_DETECTION_PORT=9080
export PII_DETECTION_COMPONENT_NAME="OPEA_NATIVE_PII"
By default, this microservice uses OPEA_NATIVE_PII, which uses Microsoft Presidio to locally invoke StanfordAIMI/stanford-deidentifier-base within a Transformers-based AnalyzerEngine.
Alternatively, if you are using Prediction Guard, set the following component name and Prediction Guard API key:
export PII_DETECTION_COMPONENT_NAME="PREDICTIONGUARD_PII_DETECTION"
export PREDICTIONGUARD_API_KEY=${your_predictionguard_api_key}
🚀1. Start Microservice with Python (Option 1)
1.1 Install Requirements
cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/src/pii_detection
pip install -r requirements.txt
1.2 Start PII Detection Microservice with Python Script
python opea_pii_detection_microservice.py
🚀2. Start Microservice with Docker (Option 2)
For native OPEA Microservice
2.1 Build Docker Image
cd $OPEA_GENAICOMPS_ROOT
docker build \
--build-arg https_proxy=$https_proxy \
--build-arg http_proxy=$http_proxy \
-t opea/guardrails-pii-detection:latest \
-f comps/guardrails/src/pii_detection/Dockerfile .
2.2.a Run Docker with Compose (Option A)
cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/deployment/docker_compose
docker compose up -d guardrails-pii-detection-server
2.2.b Run Docker with CLI (Option B)
docker run -d --rm \
--name="guardrails-pii-detection-server" \
-p ${PII_DETECTION_PORT}:9080 \
--ipc=host \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
-e no_proxy=${no_proxy} \
opea/guardrails-pii-detection:latest
For Prediction Guard Microservice
2.1 Build Docker Image
cd $OPEA_GENAICOMPS_ROOT
docker build \
--build-arg https_proxy=$https_proxy \
--build-arg http_proxy=$http_proxy \
-t opea/guardrails-pii-predictionguard:latest \
-f comps/guardrails/src/pii_detection/Dockerfile .
2.2.a Run Docker with Compose (Option A)
cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/deployment/docker_compose
docker compose up -d pii-predictionguard-server
2.2.b Run Docker with CLI (Option B)
docker run -d \
--name="pii-predictionguard-server" \
-p ${PII_DETECTION_PORT}:9080 \
-e PREDICTIONGUARD_API_KEY=$PREDICTIONGUARD_API_KEY \
-e PII_DETECTION_COMPONENT_NAME="PREDICTIONGUARD_PII_DETECTION" \
opea/guardrails-pii-predictionguard:latest
🚀3. Get Status of Microservice
For native OPEA Microservice
docker container logs -f guardrails-pii-detection-server
For Prediction Guard Microservice
docker container logs -f pii-predictionguard-server
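Beyond the logs, you can probe the running service directly. OPEA microservices conventionally expose a /v1/health_check endpoint; treat the exact path as an assumption and check the service logs if your build differs.

# Hedged readiness probe; /v1/health_check is the conventional OPEA route.
import os

import requests

port = os.getenv("PII_DETECTION_PORT", "9080")
resp = requests.get(f"http://localhost:{port}/v1/health_check", timeout=5)
print(resp.status_code, resp.text)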
🚀4. Consume Microservice
Once the microservice starts, use the examples below (bash or python) to apply PII detection.
For native OPEA Microservice
Bash Example:
curl localhost:${PII_DETECTION_PORT}/v1/pii \
-X POST \
-d '{"text":"My name is John Doe and my phone number is (555) 555-5555."}' \
-H 'Content-Type: application/json'
Python Example:
import requests
import json
import os
pii_detection_port = os.getenv("PII_DETECTION_PORT", "9080")
proxies = {"http": ""}
url = f"http://localhost:{pii_detection_port}/v1/pii"
data = {"text": "My name is John Doe and my phone number is (555) 555-5555."}
try:
    resp = requests.post(url=url, json=data, proxies=proxies)
    resp.raise_for_status()  # raise an exception for unsuccessful HTTP status codes
    print(json.dumps(resp.json(), indent=4))
    print("Request successful!")
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
Example Output:
{
    "id": "4631406f5f91728e45ad27eba062bb4b",
    "detected_pii": [
        {
            "entity_type": "PHONE_NUMBER",
            "start": 44,
            "end": 58,
            "score": 0.9992861151695251,
            "analysis_explanation": null,
            "recognition_metadata": {
                "recognizer_name": "TransformersRecognizer",
                "recognizer_identifier": "TransformersRecognizer_140427422846672"
            }
        },
        {
            "entity_type": "PERSON",
            "start": 12,
            "end": 20,
            "score": 0.8511614799499512,
            "analysis_explanation": null,
            "recognition_metadata": {
                "recognizer_name": "TransformersRecognizer",
                "recognizer_identifier": "TransformersRecognizer_140427422846672"
            }
        }
    ],
    "new_prompt": null
}
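Because the native service currently reports spans rather than rewriting the text (PII replacement is listed as future work above), a client can apply its own redaction using the returned start/end indices. Below is a small illustrative helper, with hypothetical spans hardcoded in place of a live response:

# Client-side redaction sketch using the start/end indices from detected_pii.
def redact(text: str, detections: list) -> str:
    # Replace spans from the end of the string first so earlier indices stay valid.
    for d in sorted(detections, key=lambda d: d["start"], reverse=True):
        text = text[: d["start"]] + f"<{d['entity_type']}>" + text[d["end"] :]
    return text

text = "My name is John Doe and my phone number is (555) 555-5555."
# Illustrative spans; in practice, take these from resp.json()["detected_pii"].
detections = [
    {"entity_type": "PERSON", "start": 11, "end": 19},
    {"entity_type": "PHONE_NUMBER", "start": 43, "end": 57},
]
print(redact(text, detections))
# -> My name is <PERSON> and my phone number is <PHONE_NUMBER>.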
For Prediction Guard Microservice
curl -X POST http://localhost:${PII_DETECTION_PORT}/v1/pii \
-H 'Content-Type: application/json' \
-d '{
"prompt": "My name is John Doe and my phone number is (555) 555-5555.",
"replace": true,
"replace_method": "random"
}'
API parameters for the Prediction Guard microservice:
prompt (string, required): The text in which you want to detect PII (typically the prompt that you anticipate sending to an LLM)
replace (boolean, optional, default is false): Set to true if you want to replace the detected PII in the prompt
replace_method (string, optional, default is random): The method used to replace PII (set to either random, fake, category, or mask)
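A Python equivalent of the curl call above, exercising the parameters described in this list (using masking instead of random replacement):

# Python version of the Prediction Guard request above.
import os

import requests

port = os.getenv("PII_DETECTION_PORT", "9080")
payload = {
    "prompt": "My name is John Doe and my phone number is (555) 555-5555.",
    "replace": True,
    "replace_method": "mask",  # one of: random, fake, category, mask
}
resp = requests.post(f"http://localhost:{port}/v1/pii", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())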