# Deploying MultimodalQnA on Intel® Xeon® Processors

This document outlines the deployment process for a MultimodalQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `multimodal_embedding` (which uses the [BridgeTower](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi) model for embeddings), `multimodal_retriever`, `lvm`, and `multimodal-data-prep`.

## Table of Contents

1. [MultimodalQnA Quick Start Deployment](#multimodalqna-quick-start-deployment)
2. [MultimodalQnA Docker Compose Files](#multimodalqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)

## MultimodalQnA Quick Start Deployment

This section describes how to quickly deploy and test the MultimodalQnA service manually on an Intel® Xeon® processor. The basic steps are:

1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)

### Access the Code

Clone the GenAIExamples repository and access the MultimodalQnA Docker Compose files and supporting scripts:

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/MultimodalQnA
```

Then check out a released version, such as v1.3:

```bash
git checkout v1.3
```

### Configure the Deployment Environment

Before configuring environment variables, ensure you have a suitable Intel Xeon server instance ready for deployment. For example, if you are deploying on AWS, create an AWS account and launch an EC2 instance with an Intel Xeon processor. Recommended instance types include M7i or M7i-flex, which are optimized for 4th Gen Intel Xeon Scalable processors. Refer to the [AWS M7i instance documentation](https://aws.amazon.com/ec2/instance-types/m7i/) for more details.
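Once the instance is running, it is worth confirming that the host meets the basic requirements before continuing. Below is a minimal sanity-check sketch; it assumes a Linux host with Docker and the Docker Compose plugin already installed:

```bash
# Confirm the host CPU is an Intel Xeon part
lscpu | grep "Model name"

# Confirm Docker and the Compose plugin are available
docker --version
docker compose version
```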
Make sure to open the following ports in your EC2 security group so the microservices can communicate properly:

| Service                          | Ports (open to 0.0.0.0/0) |
| -------------------------------- | ------------------------- |
| redis-vector-db                  | 6379, 8001                |
| embedding-multimodal-bridgetower | 6006                      |
| embedding                        | 6000                      |
| retriever-multimodal-redis       | 7000                      |
| lvm-llava                        | 8399                      |
| lvm                              | 9399                      |
| whisper                          | 7066                      |
| speecht5-service                 | 7055                      |
| dataprep-multimodal-redis        | 6007                      |
| multimodalqna                    | 8888                      |
| multimodalqna-ui                 | 5173                      |

After the server setup and network configuration, set the environment variables specific to the deployment environment and source the `set_env.sh` script in this directory:

```bash
export host_ip="External_Public_IP"           # ip address of the node
export HF_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy"           # http proxy if any
export https_proxy="Your_HTTPs_Proxy"         # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip  # additional no proxies if needed
cd docker_compose/intel
source set_env.sh

# For Xeon, update the model environment variable
export LVM_MODEL_ID="llava-hf/llava-1.5-7b-hf"
```

Consult the section on [MultimodalQnA Service configuration](#multimodalqna-docker-compose-files) for information on how service-specific configuration parameters affect deployments.

### Deploy the Services Using Docker Compose

To deploy the MultimodalQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, which uses the `compose.yaml` file, execute:

```bash
cd cpu/xeon
docker compose -f compose.yaml up -d
```

Alternatively, to use the Milvus vector database instead of Redis:

```bash
export MILVUS_HOST=${host_ip}
export MILVUS_PORT=19530
export MILVUS_RETRIEVER_PORT=7000
export COLLECTION_NAME=LangChainCollection
docker compose -f compose_milvus.yaml up -d
```
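On the first startup, several containers download model weights from Hugging Face before they begin serving requests, which can take several minutes. As a convenience, the logs of an individual service can be followed while it initializes; the sketch below assumes the service names used in `compose.yaml` (for example, `lvm-llava`):

```bash
# Follow the logs of one service until its model finishes loading (Ctrl+C to stop)
docker compose -f compose.yaml logs -f lvm-llava
```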
### Check the Deployment Status

After running docker compose, check if all the containers launched via docker compose have started:

```bash
docker ps -a
```

For the default deployment, the following 11 containers should have started:

```
| CONTAINER ID | IMAGE                                        | COMMAND                  | STATUS       | PORTS                                  | NAMES                            |
|--------------|----------------------------------------------|--------------------------|--------------|----------------------------------------|----------------------------------|
| c1d2e3f4g5h6 | opea/multimodalqna-ui:latest                 | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:5173->5173/tcp                 | multimodalqna-gradio-ui-server   |
| a1b2c3d4e5f6 | opea/multimodalqna:latest                    | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:8888->8888/tcp                 | multimodalqna-backend-server     |
| b2c3d4e5f6g7 | opea/lvm:latest                              | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:9399->9399/tcp                 | lvm                              |
| d3e4f5g6h7i8 | opea/lvm-llava:latest                        | "python llava_server.py" | Up 5 minutes | 0.0.0.0:8080->8080/tcp                 | lvm-llava                        |
| e4f5g6h7i8j9 | opea/retriever:latest                        | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7000->7000/tcp                 | retriever-redis                  |
| f5g6h7i8j9k0 | opea/embedding:latest                        | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7061->7061/tcp                 | embedding                        |
| g6h7i8j9k0l1 | opea/embedding-multimodal-bridgetower:latest | "python bridgetower..."  | Up 5 minutes | 0.0.0.0:7050->7050/tcp                 | embedding-multimodal-bridgetower |
| h7i8j9k0l1m2 | opea/dataprep:latest                         | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:6007->5000/tcp                 | dataprep-multimodal-redis        |
| i8j9k0l1m2n3 | redis/redis-stack:7.2.0-v9                   | "redis-stack-server"     | Up 5 minutes | 0.0.0.0:6379->6379/tcp, 8001->8001/tcp | redis-vector-db                  |
| j9k0l1m2n3o4 | opea/speecht5:latest                         | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7055->7055/tcp                 | speecht5-service                 |
| k0l1m2n3o4p5 | opea/whisper:latest                          | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7066->7066/tcp                 | whisper-service                  |
```

For the Milvus deployment, the following 12 containers should have started:

```
| CONTAINER ID | IMAGE                                        | COMMAND                        | STATUS       | PORTS                                            | NAMES                            |
|--------------|----------------------------------------------|--------------------------------|--------------|--------------------------------------------------|----------------------------------|
| 1a2b3c4d5e6f | opea/multimodalqna-ui:latest                 | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:5173->5173/tcp                           | multimodalqna-gradio-ui-server   |
| 2b3c4d5e6f7g | opea/multimodalqna:latest                    | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:8888->8888/tcp                           | multimodalqna-backend-server     |
| 3c4d5e6f7g8h | opea/lvm:latest                              | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:9399->9399/tcp                           | lvm                              |
| 4d5e6f7g8h9i | opea/lvm-llava:latest                        | "python llava_server.py"       | Up 6 minutes | 0.0.0.0:8080->8080/tcp                           | lvm-llava                        |
| 5e6f7g8h9i0j | opea/retriever:latest                        | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:7000->7000/tcp                           | retriever-milvus                 |
| 6f7g8h9i0j1k | opea/embedding:latest                        | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:7061->7061/tcp                           | embedding                        |
| 7g8h9i0j1k2l | opea/embedding-multimodal-bridgetower:latest | "python bridgetower_server.py" | Up 6 minutes | 0.0.0.0:7050->7050/tcp                           | embedding-multimodal-bridgetower |
| 8h9i0j1k2l3m | opea/dataprep:latest                         | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:6007->5000/tcp                           | dataprep-multimodal-milvus       |
| 9i0j1k2l3m4n | quay.io/coreos/etcd:v3.5.5                   | "etcd ..."                     | Up 6 minutes | 2379/tcp                                         | milvus-etcd                      |
| 0j1k2l3m4n5o | minio/minio:RELEASE.2023-03-20T20-16-18Z     | "minio server ..."             | Up 6 minutes | 0.0.0.0:5044->9001/tcp, 0.0.0.0:5043->9000/tcp   | milvus-minio                     |
| 1k2l3m4n5o6p | milvusdb/milvus:v2.4.6                       | "milvus run standalone"        | Up 6 minutes | 0.0.0.0:19530->19530/tcp, 0.0.0.0:9091->9091/tcp | milvus-standalone                |
| 2l3m4n5o6p7q | opea/whisper:latest                          | "docker-entrypoint.sh"         | Up 6 minutes | 0.0.0.0:7066->7066/tcp                           | whisper-service                  |
```

### Validate the Pipeline

Once the MultimodalQnA services are running, test the pipeline using the following command:

```bash
DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'

curl http://${host_ip}:8888/v1/multimodalqna \
  -H "Content-Type: application/json" \
  -d "$DATA"
```

### Cleanup the Deployment

To stop the containers associated with the deployment, execute the following command:

```bash
docker compose -f compose.yaml down
# if Milvus was used:
# docker compose -f compose_milvus.yaml down
```

## MultimodalQnA Docker Compose Files

| File                                         | Description                                               |
| -------------------------------------------- | --------------------------------------------------------- |
| [compose.yaml](./compose.yaml)               | Default pipeline using Redis as vector store.             |
| [compose_milvus.yaml](./compose_milvus.yaml) | Variant using Milvus as vector database instead of Redis. |

## Validate Microservices
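The commands below exercise each microservice individually and rely on variables exported by `set_env.sh`. As a quick precaution, the sketch below checks that the variables used in this section are set in the current shell (the variable names are taken from the validation commands that follow; adjust the list if your configuration differs):

```bash
# Warn about any unset variables needed by the validation commands
for var in host_ip EMM_BRIDGETOWER_PORT MM_EMBEDDING_PORT_MICROSERVICE \
           REDIS_RETRIEVER_PORT LVM_PORT MEGA_SERVICE_PORT; do
  if [ -z "${!var}" ]; then
    echo "WARNING: $var is not set; re-run 'source set_env.sh'"
  fi
done
```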
1. embedding-multimodal-bridgetower

   ```bash
   curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{"text":"This is example"}'
   ```

   ```bash
   curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}'
   ```

2. embedding

   ```bash
   curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{"text" : "This is some sample text."}'
   ```

   ```bash
   curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{"text": {"text" : "This is some sample text."}, "image" : {"url": "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true"}}'
   ```

3. retriever-multimodal-redis

   ```bash
   export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)")
   curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \
     -X POST \
     -H "Content-Type: application/json" \
     -d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
   ```

4. whisper

   ```bash
   curl ${WHISPER_SERVER_ENDPOINT} \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{"audio" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
   ```

5. lvm-llava

   ```bash
   curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"inputs":"![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)What is this a picture of?\n\n","parameters":{"max_new_tokens":16, "seed": 42}}'
   ```

6. tts

   ```bash
   curl ${TTS_ENDPOINT} \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"text": "Who are you?"}'
   ```

7. lvm

   ```bash
   curl http://${host_ip}:${LVM_PORT}/v1/lvm \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
   ```

   ```bash
   curl http://${host_ip}:${LVM_PORT}/v1/lvm \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}'
   ```

   Also, validate the LVM microservice with empty retrieval results:

   ```bash
   curl http://${host_ip}:${LVM_PORT}/v1/lvm \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
   ```
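   The responses from these validation commands are returned as JSON. As a convenience (this is not part of the service API), any of them can be piped through Python's built-in formatter for easier reading, for example:

   ```bash
   curl -s http://${host_ip}:${LVM_PORT}/v1/lvm \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}' \
     | python3 -m json.tool
   ```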
8. Multimodal Dataprep Microservice

   Download a sample video (`.mp4`), image (`.png`, `.gif`, or `.jpg`), PDF, and audio file (`.wav` or `.mp3`), and create a caption file:

   ```bash
   export video_fn="WeAreGoingOnBullrun.mp4"
   wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn}

   export image_fn="apple.png"
   wget https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true -O ${image_fn}

   export pdf_fn="nke-10k-2023.pdf"
   wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.3/comps/third_parties/pathway/src/data/nke-10k-2023.pdf -O ${pdf_fn}

   export caption_fn="apple.txt"
   echo "This is an apple." > ${caption_fn}

   export audio_fn="AudioSample.wav"
   wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn}
   ```

   Test the dataprep microservice by generating a transcript. This command updates the knowledge base by uploading the local `.mp4` video and the `.wav` or `.mp3` audio file:

   ```bash
   curl --silent --write-out "HTTPSTATUS:%{http_code}" \
     ${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \
     -H 'Content-Type: multipart/form-data' \
     -X POST \
     -F "files=@./${video_fn}" \
     -F "files=@./${audio_fn}"
   ```

   Also, test the dataprep microservice by generating an image caption using the LVM:

   ```bash
   curl --silent --write-out "HTTPSTATUS:%{http_code}" \
     ${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} \
     -H 'Content-Type: multipart/form-data' \
     -X POST -F "files=@./${image_fn}"
   ```

   Now, test the microservice by posting a custom caption along with an image, plus a PDF containing images and text. The image caption can be provided as text (`.txt`) or as spoken audio (`.wav` or `.mp3`):

   ```bash
   curl --silent --write-out "HTTPSTATUS:%{http_code}" \
     ${DATAPREP_INGEST_SERVICE_ENDPOINT} \
     -H 'Content-Type: multipart/form-data' \
     -X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}" \
     -F "files=@./${pdf_fn}"
   ```

   You can also get the list of all uploaded files:

   ```bash
   curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"file_path": "all"}' \
     ${DATAPREP_GET_FILE_ENDPOINT}
   ```

   The response is a Python-style list like the one below. Note that the name of each uploaded file, e.g. `videoname.mp4`, becomes `videoname_uuid.mp4`, where `uuid` is a unique ID assigned to each upload; the same file uploaded twice receives different `uuid` values.

   ```bash
   [
       "WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4",
       "WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4",
       "apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png",
       "nke-10k-2023_28000757-5533-4b1b-89fe-7c0a1b7e2cd0.pdf",
       "AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav"
   ]
   ```

   To delete all uploaded files along with the data indexed under `$INDEX_NAME` in Redis:

   ```bash
   curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"file_path": "all"}' \
     ${DATAPREP_DELETE_FILE_ENDPOINT}
   ```
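   To confirm the deletion succeeded, request the file list again; assuming all files were removed, the expected response is an empty list (`[]`):

   ```bash
   curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"file_path": "all"}' \
     ${DATAPREP_GET_FILE_ENDPOINT}
   ```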
9. MegaService

   Test the MegaService with a text query:

   ```bash
   curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
     -H "Content-Type: application/json" \
     -X POST \
     -d '{"messages": "What is the revenue of Nike in 2023?"}'
   ```

   Test the MegaService with an audio query:

   ```bash
   curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
     -H "Content-Type: application/json" \
     -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
   ```

   Test the MegaService with a text and image query:

   ```bash
   curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
     -H "Content-Type: application/json" \
     -d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "Green bananas in a tree"}, {"type": "image_url", "image_url": {"url": "http://images.cocodataset.org/test-stuff2017/000000004248.jpg"}}]}]}'
   ```

   Test the MegaService with a back-and-forth conversation between the user and assistant, including a text-to-speech response from the assistant via `"modalities": ["text", "audio"]`:

   ```bash
   curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
     -H "Content-Type: application/json" \
     -d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10, "modalities": ["text", "audio"]}'
   ```

## Conclusion

This guide enables developers to deploy MultimodalQnA on Intel Xeon processors with minimal setup. Configuration is handled via a single environment script, while modular Docker Compose files provide flexible deployment options across different vector store backends (Redis or Milvus). After deployment, validation can be performed both through direct API calls and through the provided user interface.