# PII Detection Microservice

## Introduction

In today's digital landscape, safeguarding personal information has become paramount, necessitating robust mechanisms to detect and protect personally identifiable information (PII). PII detection guardrails serve as essential tools in this endeavor, providing automated systems and protocols designed to identify, manage, and secure sensitive data. These guardrails leverage classical machine learning, LLMs, natural language processing (NLP) algorithms, and pattern recognition to accurately pinpoint PII, ensuring compliance with privacy regulations and minimizing the risk of data breaches. By implementing PII detection guardrails, organizations can enhance their data protection strategies, foster trust with stakeholders, and uphold the integrity of personal information.

This component currently supports two microservices: an OPEA native (free, local, open-source) microservice and a Prediction Guard (API key required) microservice. Please choose one of the two microservices for PII detection based on your specific use case. If you wish to run both for experimental or comparison purposes, make sure to modify the port configuration of one service to avoid conflicts, as they are configured to use the same port by default.

### PII Detection Microservice

This service uses a [SpaCy](https://spacy.io/) pipeline built by a [Microsoft Presidio](https://microsoft.github.io/presidio/) Transformers NLP engine (see the sketch at the end of this subsection). The pipeline contains standard NLP and regex-based recognizers that detect the entities listed [here](https://microsoft.github.io/presidio/supported_entities/), as well as a BERT-based model ([`StanfordAIMI/stanford-deidentifier-base`](https://huggingface.co/StanfordAIMI/stanford-deidentifier-base)) that additionally detects the following PII entities:

- Person
- Location
- Organization
- Age
- ID
- Email
- Date/time
- Phone number
- Nationality/religious/political group

The service takes text as input (`TextDoc`) and returns either the original text (`TextDoc`) if no PII is detected, or a list of detected entities (`PIIResponseDoc`) with the detection details, including detection score (probability), detection method, and start/end string indices of the detection.

Stay tuned for the following future work:

- Entity configurability
- PII replacement
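For reference, the snippet below is a minimal sketch of the kind of Presidio analyzer described above: a `TransformersNlpEngine` pairing a spaCy pipeline with the Stanford de-identifier model. It illustrates the public Presidio API and is not this microservice's exact initialization code; in particular, the `en_core_web_sm` spaCy model is an assumption, not taken from this repository.

```python
# Illustrative sketch of a Presidio analyzer like the one described above.
# Requires: pip install presidio-analyzer (plus the spaCy model below).
# The spaCy model choice (en_core_web_sm) is an assumption.
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine

model_config = [
    {
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",  # tokenization, lemmas, etc.
            "transformers": "StanfordAIMI/stanford-deidentifier-base",  # NER
        },
    }
]

analyzer = AnalyzerEngine(nlp_engine=TransformersNlpEngine(models=model_config))

results = analyzer.analyze(
    text="My name is John Doe and my phone number is (555) 555-5555.",
    language="en",
)
for r in results:
    # Each result carries the entity type, character span, and confidence score
    print(r.entity_type, r.start, r.end, round(r.score, 3))
```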
### Prediction Guard PII Detection Microservice

[Prediction Guard](https://docs.predictionguard.com) allows you to utilize hosted open access LLMs, LVMs, and embedding functionality with seamlessly integrated safeguards. In addition to providing scalable access to open models, Prediction Guard allows you to configure factual consistency checks, toxicity filters, PII filters, and prompt injection blocking. Join the [Prediction Guard Discord channel](https://discord.gg/TFHgnhAFKd) and request an API key to get started.

Detecting personally identifiable information (PII) is important in ensuring that users aren't sending private data to LLMs. This service allows you to configurably:

1. Detect PII
2. Replace PII (with "faked" information)
3. Mask PII (with placeholders)

## Environment Setup

### Clone OPEA GenAIComps and Setup Environment

Clone this repository at your desired location and set an environment variable for easy setup and usage throughout the instructions.

```bash
git clone https://github.com/opea-project/GenAIComps.git

export OPEA_GENAICOMPS_ROOT=$(pwd)/GenAIComps
```

Set the port that this service will use and the component name:

```bash
export PII_DETECTION_PORT=9080
export PII_DETECTION_COMPONENT_NAME="OPEA_NATIVE_PII"
```

By default, this microservice uses `OPEA_NATIVE_PII`, which uses [`Microsoft Presidio`](https://microsoft.github.io/presidio/) to locally invoke [`StanfordAIMI/stanford-deidentifier-base`](https://huggingface.co/StanfordAIMI/stanford-deidentifier-base) within a Transformers-based AnalyzerEngine.

#### Alternatively, if you are using Prediction Guard, set the following component name and Prediction Guard API key:

```bash
export PII_DETECTION_COMPONENT_NAME="PREDICTIONGUARD_PII_DETECTION"
export PREDICTIONGUARD_API_KEY=${your_predictionguard_api_key}
```

## 🚀1. Start Microservice with Python (Option 1)

### 1.1 Install Requirements

```bash
cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/src/pii_detection
pip install -r requirements.txt
```

### 1.2 Start PII Detection Microservice with Python Script

```bash
python opea_pii_detection_microservice.py
```

## 🚀2. Start Microservice with Docker (Option 2)

### For native OPEA Microservice

#### 2.1 Build Docker Image

```bash
cd $OPEA_GENAICOMPS_ROOT
docker build \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  -t opea/guardrails-pii-detection:latest \
  -f comps/guardrails/src/pii_detection/Dockerfile .
```

#### 2.2.a Run Docker with Compose (Option A)

```bash
cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/deployment/docker_compose
docker compose up -d guardrails-pii-detection-server
```

#### 2.2.b Run Docker with CLI (Option B)

```bash
docker run -d --rm \
  --name="guardrails-pii-detection-server" \
  -p ${PII_DETECTION_PORT}:9080 \
  --ipc=host \
  -e http_proxy=$http_proxy \
  -e https_proxy=$https_proxy \
  -e no_proxy=${no_proxy} \
  opea/guardrails-pii-detection:latest
```

### For Prediction Guard Microservice

#### 2.1 Build Docker Image

```bash
cd $OPEA_GENAICOMPS_ROOT
docker build \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  -t opea/guardrails-pii-predictionguard:latest \
  -f comps/guardrails/src/pii_detection/Dockerfile .
```

#### 2.2.a Run Docker with Compose (Option A)

```bash
cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/deployment/docker_compose
docker compose up -d pii-predictionguard-server
```

#### 2.2.b Run Docker with CLI (Option B)

```bash
docker run -d \
  --name="pii-predictionguard-server" \
  -p ${PII_DETECTION_PORT}:9080 \
  -e PREDICTIONGUARD_API_KEY=$PREDICTIONGUARD_API_KEY \
  -e PII_DETECTION_COMPONENT_NAME=${PII_DETECTION_COMPONENT_NAME} \
  opea/guardrails-pii-predictionguard:latest
```

## 🚀3. Get Status of Microservice

### For native OPEA Microservice

```bash
docker container logs -f guardrails-pii-detection-server
```

### For Prediction Guard Microservice

```bash
docker container logs -f pii-predictionguard-server
```
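Beyond tailing the container logs, you can programmatically wait for the service to come up before sending requests. The sketch below assumes the common OPEA health endpoint at `/v1/health_check`; if your build exposes a different path, adjust accordingly.

```python
# Poll the service until it reports healthy. The /v1/health_check path is
# an assumption based on the common OPEA microservice convention.
import os
import time

import requests

port = os.getenv("PII_DETECTION_PORT", "9080")
url = f"http://localhost:{port}/v1/health_check"

for _ in range(30):
    try:
        if requests.get(url, timeout=2).status_code == 200:
            print("PII detection service is ready")
            break
    except requests.exceptions.ConnectionError:
        pass  # container still starting up
    time.sleep(2)
else:
    raise RuntimeError(f"Service did not become ready at {url}")
```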
## 🚀4. Consume Microservice

Once the microservice starts, users can use the examples (bash or Python) below to apply PII detection.

### For native OPEA Microservice

**Bash Example**:

```bash
curl localhost:${PII_DETECTION_PORT}/v1/pii \
  -X POST \
  -d '{"text":"My name is John Doe and my phone number is (555) 555-5555."}' \
  -H 'Content-Type: application/json'
```

**Python Example:**

```python
import json
import os

import requests

pii_detection_port = os.getenv("PII_DETECTION_PORT")
proxies = {"http": ""}
url = f"http://localhost:{pii_detection_port}/v1/pii"
data = {"text": "My name is John Doe and my phone number is (555) 555-5555."}

try:
    # Send the payload as JSON (json=), not form-encoded (data=)
    resp = requests.post(url=url, json=data, proxies=proxies)
    resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
    print(json.dumps(resp.json(), indent=4))
    print("Request successful!")
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
```

**Example Output**:

```json
{
  "id": "4631406f5f91728e45ad27eba062bb4b",
  "detected_pii": [
    {
      "entity_type": "PHONE_NUMBER",
      "start": 44,
      "end": 58,
      "score": 0.9992861151695251,
      "analysis_explanation": null,
      "recognition_metadata": {
        "recognizer_name": "TransformersRecognizer",
        "recognizer_identifier": "TransformersRecognizer_140427422846672"
      }
    },
    {
      "entity_type": "PERSON",
      "start": 12,
      "end": 20,
      "score": 0.8511614799499512,
      "analysis_explanation": null,
      "recognition_metadata": {
        "recognizer_name": "TransformersRecognizer",
        "recognizer_identifier": "TransformersRecognizer_140427422846672"
      }
    }
  ],
  "new_prompt": null
}
```

### For Prediction Guard Microservice

```bash
curl -X POST http://localhost:${PII_DETECTION_PORT}/v1/pii \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "My name is John Doe and my phone number is (555) 555-5555.",
    "replace": true,
    "replace_method": "random"
  }'
```

API parameters for the Prediction Guard microservice:

- `prompt` (string, required): The text in which you want to detect PII (typically the prompt that you anticipate sending to an LLM).
- `replace` (boolean, optional, default `false`): `true` if you want to replace the detected PII in the `prompt`.
- `replace_method` (string, optional, default `random`): The method you want to use to replace PII (one of `random`, `fake`, `category`, `mask`).
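As a sketch of the same call from Python, the snippet below mirrors the curl example above but switches `replace_method` to `mask`, so detected PII is replaced with placeholders rather than random values. The endpoint and parameters come from the documentation above; the response shape is whatever the Prediction Guard microservice returns.

```python
# Python equivalent of the curl example above, using replace_method="mask"
# so detected PII is masked with placeholders instead of random values.
import json
import os

import requests

port = os.getenv("PII_DETECTION_PORT", "9080")
url = f"http://localhost:{port}/v1/pii"

payload = {
    "prompt": "My name is John Doe and my phone number is (555) 555-5555.",
    "replace": True,
    "replace_method": "mask",
}

resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=4))
```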