# AudioQnA Accuracy

AudioQnA is an example that demonstrates the integration of Generative AI (GenAI) models for performing question answering (QnA) on audio scenes, combining Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The following is the pipeline for evaluating ASR accuracy.

## Dataset

We evaluate ASR accuracy on the test set of the LibriSpeech [dataset](https://huggingface.co/datasets/andreagasparini/librispeech_test_only), which contains 2620 records of audio and their reference transcripts.

## Metrics

We measure the Word Error Rate (WER) of the ASR microservice. WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words in the hypothesis relative to a reference transcript of N words; lower is better.

## Evaluation

### Launch ASR microservice

Launch the ASR microservice with the following commands. For more details, please refer to the [doc](/GenAIComps/comps/asr/src/README.md).

```bash
git clone https://github.com/opea-project/GenAIComps
cd GenAIComps
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/Dockerfile .
# change the model under evaluation by editing the --model_name_or_path argument
docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper:latest --model_name_or_path "openai/whisper-tiny"
```

### Evaluate

Install dependencies:

```
pip install -r requirements.txt
```

Evaluate the ASR accuracy:

```py
# validate the offline model
# python offline_eval.py

# validate the online ASR microservice accuracy
python online_eval.py
```

### Performance Result

Here are the tested results for your reference.

| Model | WER |
| --- | --- |
| whisper-large-v2 | 2.87 |
| whisper-large | 2.7 |
| whisper-medium | 3.45 |
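
For illustration, the sketch below shows one way an online evaluation like `online_eval.py` can work end to end: pull a few records from the LibriSpeech test set, send each audio clip to the running microservice, and score the transcripts with WER via the `jiwer` library. This is a minimal sketch, not the project's actual script; the endpoint path (`/v1/asr`), the base64 request payload, the `asr_result` response key, and the `test.clean` split name are all assumptions here, so check the ASR microservice README and `online_eval.py` for the exact interface.

```py
# Minimal sketch of an online WER evaluation against the ASR microservice.
# Assumptions (verify against comps/asr/src/README.md and online_eval.py):
#   - the service accepts POST /v1/asr with {"audio": <base64-encoded WAV>}
#     and returns {"asr_result": "<transcript>"}
#   - the dataset exposes a "test.clean" split with "audio" and "text" columns
import base64
import io

import requests
import soundfile as sf
from datasets import load_dataset
from jiwer import wer

dataset = load_dataset("andreagasparini/librispeech_test_only", split="test.clean")

references, hypotheses = [], []
for record in dataset.select(range(10)):  # small sample; drop select() for the full set
    # Re-encode the decoded waveform as WAV bytes for transport.
    buffer = io.BytesIO()
    sf.write(buffer, record["audio"]["array"], record["audio"]["sampling_rate"], format="WAV")
    payload = {"audio": base64.b64encode(buffer.getvalue()).decode("utf-8")}

    response = requests.post("http://localhost:7066/v1/asr", json=payload, timeout=60)
    response.raise_for_status()

    references.append(record["text"].lower())
    hypotheses.append(response.json()["asr_result"].lower())

print(f"WER over {len(references)} samples: {wer(references, hypotheses):.4f}")
```

Note that both the references and hypotheses are lower-cased before scoring: WER is sensitive to casing and punctuation differences, so applying the same text normalization to both sides is important for a fair comparison.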