# FaqGen Accuracy

## Dataset

We evaluate performance on the QA dataset [SQuAD_v2](https://huggingface.co/datasets/rajpurkar/squad_v2). FAQs are generated from the "context" column of the validation split, which contains 1204 unique records.

First download the dataset and place it at `./data`. Then extract the unique "context" values, which will be saved to `data/sqv2_context.json`:

```
python get_context.py
```

## Generate FAQs

### Launch the FaqGen microservice

Please refer to the [FaqGen microservice](/GenAIComps/comps/llms/src/faq-generation/README.md) to set up a microservice endpoint.

```
export FAQ_ENDPOINT="http://${your_ip}:9000/v1/faqgen"
```

### Generate FAQs with the microservice

Use the microservice endpoint to generate FAQs for the dataset:

```
python generate_FAQ.py
```

Post-process the output to extract the generated FAQs, which will be saved to `data/sqv2_faq.json`:

```
python post_process_FAQ.py
```

## Evaluate with Ragas

### Launch the TGI service

We use `mistralai/Mixtral-8x7B-Instruct-v0.1` as the judge LLM to evaluate the generated FAQs. First, launch an LLM endpoint on Gaudi:

```
export HUGGING_FACE_HUB_TOKEN="your_huggingface_token"
bash launch_tgi.sh
```

Set the endpoint:

```
export LLM_ENDPOINT="http://${ip_address}:8082"
```

Verify the service:

```bash
curl http://${ip_address}:8082/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":128}}' \
  -H 'Content-Type: application/json'
```

### Evaluate

Evaluate the performance with the judge LLM:

```
python evaluate.py
```

### Performance Result

Here are the tested results for your reference:

| answer_relevancy | faithfulness | context_utilization | reference_free_rubrics_score |
| ---------------- | ------------ | ------------------- | ---------------------------- |
| 0.7191           | 0.9681       | 0.8964              | 4.4125                       |
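For orientation, hedged sketches of the three scripted steps follow. First, the context-extraction step (`get_context.py`) can be approximated with the Hugging Face `datasets` API; the JSON layout written below is an assumption, not the repo script's exact format.

```python
# Sketch of the context-extraction step: download the SQuAD v2 validation
# split and keep only the unique "context" strings (1204 records).
# The JSON serialization below is assumed, not taken from get_context.py.
import json

from datasets import load_dataset

validation = load_dataset("rajpurkar/squad_v2", split="validation")
unique_contexts = sorted(set(validation["context"]))

with open("data/sqv2_context.json", "w") as f:
    json.dump(unique_contexts, f, indent=2)
```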
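The generation step amounts to one POST request per context record. The sketch below illustrates the loop; the `messages` and `max_tokens` payload fields are assumptions about the FaqGen request schema, so consult the microservice README for the exact format.

```python
# Sketch of generate_FAQ.py: send each extracted context to the FaqGen
# microservice and collect the raw responses for later post-processing.
# The request body ("messages", "max_tokens") is a hypothetical schema.
import json
import os

import requests

FAQ_ENDPOINT = os.environ.get("FAQ_ENDPOINT", "http://localhost:9000/v1/faqgen")

with open("data/sqv2_context.json") as f:
    contexts = json.load(f)

results = []
for context in contexts:
    resp = requests.post(
        FAQ_ENDPOINT,
        headers={"Content-Type": "application/json"},
        json={"messages": context, "max_tokens": 128},  # hypothetical fields
        timeout=300,
    )
    resp.raise_for_status()
    results.append({"context": context, "faq": resp.text})

# post_process_FAQ.py would clean this raw output into data/sqv2_faq.json.
with open("data/sqv2_faq_raw.json", "w") as f:
    json.dump(results, f, indent=2)
```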
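Finally, the scoring step can be reproduced with the upstream `ragas` package pointed at the TGI judge endpoint. This is a minimal sketch rather than this repo's `evaluate.py`: the record layout of `data/sqv2_faq.json`, the embedding model, and the metric subset shown are all assumptions.

```python
# Sketch of Ragas scoring with the Mixtral judge served by TGI.
# Assumes data/sqv2_faq.json holds records with "question", "answer",
# and "contexts" columns, which is the layout ragas metrics expect.
import json

from datasets import Dataset
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

judge = HuggingFaceEndpoint(
    endpoint_url="http://localhost:8082",  # $LLM_ENDPOINT from the TGI step
    max_new_tokens=512,
)

with open("data/sqv2_faq.json") as f:
    records = json.load(f)

scores = evaluate(
    Dataset.from_list(records),
    metrics=[answer_relevancy, faithfulness],
    llm=judge,
    embeddings=HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5"),  # assumed choice
)
print(scores)
```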