Deploying MultimodalQnA on Intel® Xeon® Processors

This document outlines the deployment process for a MultimodalQnA application that uses the GenAIComps microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as multimodal_embedding (which uses the BridgeTower model as its embedding model), multimodal_retriever, lvm, and multimodal-data-prep.

MultimodalQnA Quick Start Deployment

This section describes how to quickly deploy and test the MultimodalQnA service manually on an Intel® Xeon® processor. The basic steps are:

Access the Code
Configure the Deployment Environment
Deploy the Services Using Docker Compose
Check the Deployment Status
Validate the Pipeline
Cleanup the Deployment

Access the Code

Clone the GenAIExamples repository and access the MultimodalQnA Docker Compose files and supporting scripts:

git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/MultimodalQnA

Then check out a released version, such as v1.3:

git checkout v1.3

Configure the Deployment Environment

Before configuring environment variables, ensure you have a suitable Intel Xeon server instance ready for deployment. For example, if you are deploying on AWS, create an AWS account and launch an EC2 instance with an Intel Xeon processor. Recommended instance types include M7i and M7i-flex, which are powered by 4th Gen Intel Xeon Scalable processors.

Refer to AWS M7i instance documentation for more details.

Make sure to open the following ports in your EC2 security group so the microservices can communicate properly:

| Service                          | Ports (open to 0.0.0.0/0) |
|----------------------------------|---------------------------|
| redis-vector-db                  | 6379, 8001                |
| embedding-multimodal-bridgetower | 6006                      |
| embedding                        | 6000                      |
| retriever-multimodal-redis       | 7000                      |
| lvm-llava                        | 8399                      |
| lvm                              | 9399                      |
| whisper                          | 7066                      |
| speecht5-service                 | 7055                      |
| dataprep-multimodal-redis        | 6007                      |
| multimodalqna                    | 8888                      |
| multimodalqna-ui                 | 5173                      |
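
If you manage the security group with the AWS CLI, the port list above can be turned into ingress rules. The sketch below only prints the commands so they can be reviewed before running; the helper name `print_ingress_rules`, the security group ID, and the CIDR range are placeholders for illustration and are not part of the MultimodalQnA repository.

```shell
# Print (do not run) one AWS CLI ingress command per port.
# $1 = security group ID, $2 = source CIDR, remaining arguments = TCP ports.
print_ingress_rules() {
    group_id=$1
    cidr=$2
    shift 2
    for port in "$@"; do
        echo "aws ec2 authorize-security-group-ingress --group-id ${group_id} --protocol tcp --port ${port} --cidr ${cidr}"
    done
}

# Example with placeholder values (replace the group ID and CIDR with your own):
# print_ingress_rules sg-0123456789abcdef0 203.0.113.0/24 \
#     6379 8001 6006 6000 7000 8399 9399 7066 7055 6007 8888 5173
```

Restricting the CIDR to the addresses that actually need access, rather than 0.0.0.0/0, is a design choice worth considering for anything beyond a short-lived test instance.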

After the server setup and network configuration, proceed with setting environment variables specific to the deployment environment and source the set_env.sh script in this directory:

export host_ip="External_Public_IP"           # ip address of the node
export HF_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy"           # http proxy if any
export https_proxy="Your_HTTPs_Proxy"         # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip  # additional no proxies if needed
cd docker_compose/intel
source set_env.sh

# For Xeon, update the model environment variable
export LVM_MODEL_ID="llava-hf/llava-1.5-7b-hf"
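
Before starting the containers, it can help to confirm that the variables exported above are actually set in the current shell. The following is a minimal sketch; `check_required_vars` is a hypothetical helper, and the list of names passed to it is an assumption based only on the exports shown in this section.

```shell
# Report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_required_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "missing: ${name}"
            missing=1
        fi
    done
    return "${missing}"
}

# Example:
# check_required_vars host_ip HF_TOKEN LVM_MODEL_ID
```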

Consult the section on MultimodalQnA Service configuration for information on how service-specific configuration parameters affect deployments.

Deploy the Services Using Docker Compose

To deploy the MultimodalQnA services, run docker compose up with the appropriate arguments. The default deployment below uses the compose.yaml file.

cd cpu/xeon
docker compose -f compose.yaml up -d

Alternatively, to use Milvus vector database instead of Redis:

export MILVUS_HOST=${host_ip}
export MILVUS_PORT=19530
export MILVUS_RETRIEVER_PORT=7000
export COLLECTION_NAME=LangChainCollection

docker compose -f compose_milvus.yaml up -d

Check the Deployment Status

After running docker compose, check that all of the containers it launched have started:

docker ps -a

For the default deployment, the following 11 containers should have started:

| CONTAINER ID | IMAGE                                                   | COMMAND                  | STATUS       | PORTS                                         | NAMES                             |
|--------------|---------------------------------------------------------|--------------------------|--------------|-----------------------------------------------|----------------------------------|
| c1d2e3f4g5h6 | opea/multimodalqna-ui:latest                            | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:5173->5173/tcp                        | multimodalqna-gradio-ui-server   |
| a1b2c3d4e5f6 | opea/multimodalqna:latest                               | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:8888->8888/tcp                        | multimodalqna-backend-server     |
| b2c3d4e5f6g7 | opea/lvm:latest                                         | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:9399->9399/tcp                        | lvm                              |
| d3e4f5g6h7i8 | opea/lvm-llava:latest                                   | "python llava_server.py" | Up 5 minutes | 0.0.0.0:8080->8080/tcp                        | lvm-llava                       |
| e4f5g6h7i8j9 | opea/retriever:latest                                   | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7000->7000/tcp                        | retriever-redis                  |
| f5g6h7i8j9k0 | opea/embedding:latest                                   | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7061->7061/tcp                        | embedding                       |
| g6h7i8j9k0l1 | opea/embedding-multimodal-bridgetower:latest           | "python bridgetower..."  | Up 5 minutes | 0.0.0.0:7050->7050/tcp                        | embedding-multimodal-bridgetower |
| h7i8j9k0l1m2 | opea/dataprep:latest                                    | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:6007->5000/tcp                        | dataprep-multimodal-redis       |
| i8j9k0l1m2n3 | redis/redis-stack:7.2.0-v9                              | "redis-stack-server"     | Up 5 minutes | 0.0.0.0:6379->6379/tcp, 8001->8001/tcp        | redis-vector-db                 |
| j9k0l1m2n3o4 | opea/speecht5:latest                                    | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7055->7055/tcp                        | speecht5-service                |
| k0l1m2n3o4p5 | opea/whisper:latest                                     | "docker-entrypoint.sh"   | Up 5 minutes | 0.0.0.0:7066->7066/tcp                        | whisper-service                |

For the Milvus deployment, the following 12 containers should have started:

| CONTAINER ID | IMAGE                                             | COMMAND                        | STATUS        | PORTS                                             | NAMES                          |
|--------------|---------------------------------------------------|--------------------------------|---------------|---------------------------------------------------|--------------------------------|
| 1a2b3c4d5e6f | opea/multimodalqna-ui:latest                      | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:5173->5173/tcp                            | multimodalqna-gradio-ui-server |
| 2b3c4d5e6f7g | opea/multimodalqna:latest                         | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:8888->8888/tcp                            | multimodalqna-backend-server   |
| 3c4d5e6f7g8h | opea/lvm:latest                                   | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:9399->9399/tcp                            | lvm                           |
| 4d5e6f7g8h9i | opea/lvm-llava:latest                             | "python llava_server.py"        | Up 6 minutes  | 0.0.0.0:8080->8080/tcp                            | lvm-llava                     |
| 5e6f7g8h9i0j | opea/retriever:latest                             | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:7000->7000/tcp                            | retriever-milvus              |
| 6f7g8h9i0j1k | opea/embedding:latest                             | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:7061->7061/tcp                            | embedding                    |
| 7g8h9i0j1k2l | opea/embedding-multimodal-bridgetower:latest     | "python bridgetower_server.py"  | Up 6 minutes  | 0.0.0.0:7050->7050/tcp                            | embedding-multimodal-bridgetower |
| 8h9i0j1k2l3m | opea/dataprep:latest                              | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:6007->5000/tcp                            | dataprep-multimodal-milvus    |
| 9i0j1k2l3m4n | quay.io/coreos/etcd:v3.5.5                        | "etcd ..."                     | Up 6 minutes  | 2379/tcp                                          | milvus-etcd                   |
| 0j1k2l3m4n5o | minio/minio:RELEASE.2023-03-20T20-16-18Z          | "minio server ..."             | Up 6 minutes  | 0.0.0.0:5044->9001/tcp, 0.0.0.0:5043->9000/tcp   | milvus-minio                  |
| 1k2l3m4n5o6p | milvusdb/milvus:v2.4.6                            | "milvus run standalone"         | Up 6 minutes  | 0.0.0.0:19530->19530/tcp, 0.0.0.0:9091->9091/tcp | milvus-standalone             |
| 2l3m4n5o6p7q | opea/whisper:latest                               | "docker-entrypoint.sh"          | Up 6 minutes  | 0.0.0.0:7066->7066/tcp                            | whisper-service              |
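
Comparing the docker ps output against the tables by eye is error-prone, so the check can be scripted. The sketch below reads container names on standard input and reports any expected name that is absent; `missing_containers` is a hypothetical helper, and the names in the usage comment are taken from the NAMES column of the default-deployment table above.

```shell
# Print each expected container name that does not appear on stdin.
# Returns 0 when every expected name is present, 1 otherwise.
missing_containers() {
    running=$(cat)
    status=0
    for name in "$@"; do
        # -x matches whole lines only, -F treats the name as a literal string.
        if ! printf '%s\n' "${running}" | grep -qxF -- "${name}"; then
            echo "not running: ${name}"
            status=1
        fi
    done
    return "${status}"
}

# Example:
# docker ps --format '{{.Names}}' | missing_containers \
#     redis-vector-db lvm lvm-llava whisper-service speecht5-service
```

Matching whole lines keeps `lvm` from being satisfied by `lvm-llava`.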

Validate the Pipeline

Once the MultimodalQnA services are running, test the pipeline using the following command:

DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'

curl http://${host_ip}:8888/v1/multimodalqna \
  -H "Content-Type: application/json" \
  -d "$DATA"
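
Model-serving containers may need some time after startup before they answer requests, so the first attempt can fail even when the deployment is healthy. A small retry wrapper is one way to handle this; `retry` is a hypothetical helper written for illustration, not something shipped with the example.

```shell
# Run a command until it succeeds, at most max_attempts times.
# $1 = max attempts, $2 = delay in seconds between attempts, rest = command.
retry() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "gave up after ${attempt} attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${delay_seconds}"
    done
}

# Example (curl --fail makes HTTP error responses count as failures):
# retry 30 10 curl --fail --silent "http://${host_ip}:8888/v1/multimodalqna" \
#     -H "Content-Type: application/json" -d "$DATA"
```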

Cleanup the Deployment

To stop the containers associated with the deployment, execute the following command:

docker compose -f compose.yaml down
# If you deployed with Milvus, use this instead:
# docker compose -f compose_milvus.yaml down

MultimodalQnA Docker Compose Files

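| File                | Description                                                |
|---------------------|------------------------------------------------------------|
| compose.yaml        | Default pipeline using Redis as vector store.              |
| compose_milvus.yaml | Variant using Milvus as vector database instead of Redis.  |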
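
When the deploy and teardown commands are scripted, selecting the file from a single setting avoids bringing a stack up with one file and down with the other. The sketch below maps a vector store name to the file names listed above; `compose_file_for` and the `redis`/`milvus` keywords are hypothetical names chosen for illustration.

```shell
# Map a vector store name to the matching Docker Compose file.
compose_file_for() {
    case "$1" in
        redis)  echo "compose.yaml" ;;
        milvus) echo "compose_milvus.yaml" ;;
        *)      echo "unknown vector store: $1" >&2; return 1 ;;
    esac
}

# Example:
# docker compose -f "$(compose_file_for milvus)" up -d
# docker compose -f "$(compose_file_for milvus)" down
```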

Validate Microservices

embedding-multimodal-bridgetower

curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
    -X POST \
    -H "Content-Type:application/json" \
    -d '{"text":"This is example"}'

curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
    -X POST \
    -H "Content-Type:application/json" \
    -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}'
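
The img_b64_str value above is a small inline sample. To send one of your own images, the file has to be base64-encoded onto a single line first. The following sketch shows one portable way to do that; `b64_file` and `encode_payload` are hypothetical helpers, and the payload builder assumes the text contains no characters that need JSON escaping.

```shell
# Base64-encode a file as a single line (line wrapping removed).
b64_file() {
    base64 < "$1" | tr -d '\n'
}

# Build a request body with the same two fields used in the command above.
# $1 = text (must not need JSON escaping), $2 = path to an image file.
encode_payload() {
    printf '{"text":"%s", "img_b64_str": "%s"}' "$1" "$(b64_file "$2")"
}

# Example:
# curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
#     -X POST -H "Content-Type:application/json" \
#     -d "$(encode_payload "This is example" ./apple.png)"
```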

embedding

curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"text" : "This is some sample text."}'

curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"text": {"text" : "This is some sample text."}, "image" : {"url": "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true"}}'

retriever-multimodal-redis

export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)")
curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \
    -X POST \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}"

whisper

curl ${WHISPER_SERVER_ENDPOINT} \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"audio" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'

TGI LLaVA Xeon Server

curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
    -X POST \
    -d '{"inputs":"![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)What is this a picture of?\n\n","parameters":{"max_new_tokens":16, "seed": 42}}' \
    -H 'Content-Type: application/json'

tts

curl ${TTS_ENDPOINT} \
    -X POST \
    -d '{"text": "Who are you?"}' \
    -H 'Content-Type: application/json'

lvm

curl http://${host_ip}:${LVM_PORT}/v1/lvm \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'

curl http://${host_ip}:${LVM_PORT}/v1/lvm  \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}'

Also validate the LVM TGI Xeon server with empty retrieval results:

curl http://${host_ip}:${LVM_PORT}/v1/lvm \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'

Multimodal Dataprep Microservice

Download a sample video (.mp4), image (.png, .gif, .jpg), PDF, and audio file (.wav, .mp3), and create a caption file:

export video_fn="WeAreGoingOnBullrun.mp4"
wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn}

export image_fn="apple.png"
wget "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true" -O ${image_fn}

export pdf_fn="nke-10k-2023.pdf"
wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.3/comps/third_parties/pathway/src/data/nke-10k-2023.pdf -O ${pdf_fn}

export caption_fn="apple.txt"
echo "This is an apple."  > ${caption_fn}

export audio_fn="AudioSample.wav"
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn}

Test the dataprep microservice by generating a transcript. This command updates the knowledge base by uploading a local video (.mp4) and an audio file (.wav or .mp3).

curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    ${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \
    -H 'Content-Type: multipart/form-data' \
    -X POST \
    -F "files=@./${video_fn}" \
    -F "files=@./${audio_fn}"
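
Because of the --write-out option, the command above prints the response body immediately followed by HTTPSTATUS: and the status code. If you capture that output in a variable, the two parts can be separated with plain parameter expansion. This is a minimal sketch; `response_status` and `response_body` are hypothetical helper names, and the body in the usage comment is a made-up sample rather than actual service output.

```shell
# Extract the status code that follows the last "HTTPSTATUS:" marker.
response_status() {
    printf '%s' "${1##*HTTPSTATUS:}"
}

# Extract everything before the last "HTTPSTATUS:" marker.
response_body() {
    printf '%s' "${1%HTTPSTATUS:*}"
}

# Example with a made-up response:
# resp='{"status":"ok"}HTTPSTATUS:200'
# [ "$(response_status "$resp")" = "200" ] || echo "upload failed: $(response_body "$resp")"
```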

Also test the dataprep microservice by generating an image caption using lvm:

curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    ${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} \
    -H 'Content-Type: multipart/form-data' \
    -X POST -F "files=@./${image_fn}"

Now test the microservice by posting a custom caption along with an image and a PDF containing images and text. The image caption can be provided as text (.txt) or as spoken audio (.wav or .mp3).

curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    ${DATAPREP_INGEST_SERVICE_ENDPOINT} \
    -H 'Content-Type: multipart/form-data' \
    -X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}" \
    -F "files=@./${pdf_fn}"

You can also get the list of all files that you uploaded:

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"file_path": "all"}' \
    ${DATAPREP_GET_FILE_ENDPOINT}

The response is a Python-style list like the one below. Note that the name of each uploaded file changes: for example, videoname.mp4 becomes videoname_uuid.mp4, where uuid is a unique ID assigned to each upload. Uploading the same file twice produces two entries with different UUIDs.

[
    "WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4",
    "WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4",
    "apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png",
    "nke-10k-2023_28000757-5533-4b1b-89fe-7c0a1b7e2cd0.pdf",
    "AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav"
]
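
To map a stored name back to the file you uploaded, the UUID suffix can be stripped. The sketch below assumes the suffix always has the form shown in the sample list (an underscore followed by a lowercase hyphenated UUID, directly before the extension); `original_name` is a hypothetical helper.

```shell
# Remove a trailing "_<uuid>" that sits directly before the file extension.
# Names without such a suffix are printed unchanged.
original_name() {
    printf '%s\n' "$1" |
        sed -E 's/_[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}(\.[^.]+)$/\2/'
}

# Example:
# original_name "apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png"
```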

To delete all uploaded files along with the data indexed under $INDEX_NAME in Redis:

curl -X POST \
    -H "Content-Type: application/json" \
    -d '{"file_path": "all"}' \
    ${DATAPREP_DELETE_FILE_ENDPOINT}

MegaService

Test the MegaService with a text query:

curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -X POST \
    -d '{"messages": "What is the revenue of Nike in 2023?"}'

Test the MegaService with an audio query:

curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna  \
    -H "Content-Type: application/json"  \
    -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'

Test the MegaService with a text and image query:

curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -d  '{"messages": [{"role": "user", "content": [{"type": "text", "text": "Green bananas in a tree"}, {"type": "image_url", "image_url": {"url": "http://images.cocodataset.org/test-stuff2017/000000004248.jpg"}}]}]}'

Test the MegaService with a back-and-forth conversation between the user and assistant, including a text-to-speech response from the assistant, using "modalities": ["text", "audio"]:

curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10, "modalities": ["text", "audio"]}'

Conclusion

This guide enables developers to deploy MultimodalQnA on Intel Xeon processors with minimal setup. Configuration is handled via a single environment script, while modular Docker Compose files provide flexible deployment options across different vector store backends (Redis or Milvus). After deployment, validation can be performed both through direct API calls and the provided user interface.