Single node on-prem deployment with TGI on Intel® Xeon® Scalable processor¶
This section covers a single-node on-prem deployment of the DocSum example using OPEA components (GenAIComps) with the TGI service. We will showcase how to build an end-to-end DocSum solution with the Intel/neural-chat-7b-v3-3 model, deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes and set up the required hardware and software, please follow the instructions in the Getting Started section.
Overview¶
The DocSum use case uses the LLM and ASR microservices. In this tutorial, we will walk through the steps to enable it with components from OPEA GenAIComps and deploy it on a single node.
The solution aims to show how to use the Intel/neural-chat-7b-v3-3 model on Intel® Xeon® Scalable processors. We will go through how to set up docker containers to start the microservices and megaservice. The solution then takes a document (.txt, .doc, .pdf), audio, or video file as input and generates a summary. It is deployed with a UI with 3 modes to choose from:
Gradio-Based UI
Svelte-Based UI
React-Based UI
If you need to work with multimedia files or .doc/.pdf documents, it is suggested that you use the Gradio UI.
Below is the list of topics covered in this tutorial:
Prerequisites
Prepare (Building / Pulling) Docker images
Use case setup
Deploy the use case
Interacting with DocSum deployment
Prerequisites¶
The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps provides the fundamental components used to build the examples found in GenAIExamples and deploy them as microservices. Set an environment variable for the desired release version (number only, e.g., 1.0, 1.1) and check out the tag with that version.
# Set workspace and navigate into it
export WORKSPACE=<path>
cd $WORKSPACE
# Set desired release version - number only
export RELEASE_VERSION=<insert-release-version>
# GenAIComps
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
git checkout tags/v${RELEASE_VERSION}
cd ..
# GenAIExamples
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples
git checkout tags/v${RELEASE_VERSION}
cd ..
The example requires you to set the host_ip environment variable so the microservices can be reached on the ports exposed by the host. Set the host_ip env variable:
export host_ip=$(hostname -I | awk '{print $1}')
Make sure to set up proxies if you are behind a firewall:
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
Prepare (Building / Pulling) Docker images¶
This step involves building/pulling the relevant docker images step by step, with a sanity check at the end. For DocSum, the following docker images are needed: llm-docsum and whisper. Additionally, you will need to build docker images for the DocSum megaservice and the UI (the Svelte and React UIs are optional). In total, there are 4 required docker images and 2 optional docker images.
Build/Pull Microservice image¶
If you decide to pull the docker images instead of building them locally, you can proceed to the next step; all the necessary images will be pulled from Docker Hub.
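For reference, here is a minimal sketch of pulling prebuilt images from Docker Hub (illustrative; it assumes the images are published under the opea organization with tags matching RELEASE_VERSION):
# Pull prebuilt DocSum images (illustrative tags)
docker pull opea/whisper:${RELEASE_VERSION}
docker pull opea/llm-docsum:${RELEASE_VERSION}
docker pull opea/docsum:${RELEASE_VERSION}
docker pull opea/docsum-gradio-ui:${RELEASE_VERSION}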
Follow the steps below to build the docker images from within the GenAIComps
folder.
Note: For RELEASE_VERSIONs older than 1.0, you will need to add a 'v' in front of ${RELEASE_VERSION} to reference the correct image on Docker Hub.
cd $WORKSPACE/GenAIComps
Build Whisper Service
docker build -t opea/whisper:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
Build Mega Service images
The megaservice is a pipeline that channels data through different microservices, each performing a different task. The LLM and Whisper microservices, and the flow of data between them, are defined in the docsum.py file. You can also add or remove microservices and customize the megaservice to suit your needs.
Build the megaservice image for this use case.
cd $WORKSPACE/GenAIExamples/DocSum
docker build -t opea/docsum:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
Build the UI Image
You can build the UI in one of 3 modes:
Gradio UI
cd $WORKSPACE/GenAIExamples/DocSum/ui
docker build -t opea/docsum-gradio-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
Svelte UI (Optional)
cd $WORKSPACE/GenAIExamples/DocSum/ui
docker build -t opea/docsum-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
React UI (Optional): build this if you want a React-based frontend.
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
docker build -t opea/docsum-react-ui:${RELEASE_VERSION} --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
Sanity Check
Before moving on to the next step, check that you have the following set of docker images by running the command docker images. The tags are based on what you set the environment variable RELEASE_VERSION to.
opea/whisper:${RELEASE_VERSION}
opea/docsum:${RELEASE_VERSION}
opea/docsum-gradio-ui:${RELEASE_VERSION}
opea/docsum-ui:${RELEASE_VERSION} (optional)
opea/docsum-react-ui:${RELEASE_VERSION} (optional)
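As a quick check, you can filter the output for the DocSum-related images (a minimal sketch; the grep pattern is illustrative):
# List only the DocSum-related images (illustrative filter)
docker images | grep -E "whisper|docsum"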
Use Case Setup¶
The use case will use the following combination of GenAIComps and tools.
| Use Case Components | Tools | Model | Service Type |
|---|---|---|---|
| LLM | TGI | Intel/neural-chat-7b-v3-3 | OPEA Microservice |
| ASR | Whisper | openai/whisper-small | OPEA Microservice |
| UI | NA | NA | Gateway Service |
Tools and models mentioned in the table are configurable either through environment variables or the compose.yaml file.
Set the necessary environment variables for the use case by running the set_env.sh script. This is where the environment variable LLM_MODEL_ID is set, and you can change it to another model by specifying the HuggingFace model card ID.
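For example, to use a different model you can override LLM_MODEL_ID with another HuggingFace model card ID (a sketch; the model shown is only an illustration, and depending on how set_env.sh assigns the variable you may need to export it after sourcing the script or edit the script directly):
# Illustrative override; the model must be supported by TGI
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.2"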
Note: If you wish to run the UI in a web browser on your laptop, you will need to modify BACKEND_SERVICE_ENDPOINT to use localhost or 127.0.0.1 instead of host_ip inside set_env.sh for the backend to properly receive data from the UI. Additionally, you will need to port-forward the port used for BACKEND_SERVICE_ENDPOINT. Specifically, for DocSum, append the following to your ssh command:
-L 8888:localhost:8888
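For example, a complete ssh command with the port forward might look like this (the user and host names are placeholders):
# Forward the DocSum backend port while connecting to the remote machine (placeholder user/host)
ssh -L 8888:localhost:8888 user@remote-host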
Run the set_env.sh script:
cd $WORKSPACE/GenAIExamples/DocSum/docker_compose
source ./set_env.sh
Deploy the Use Case¶
In this tutorial, we will deploy via docker compose using the provided YAML file. Docker compose will start all the above-mentioned services as containers.
cd $WORKSPACE/GenAIExamples/DocSum/docker_compose/intel/cpu/xeon
docker compose up -d
Checks to Ensure the Services are Running¶
Check Startup and Env Variables¶
Check the startup log by running docker compose logs to ensure there are no errors. Warning messages are printed for variables that are NOT set. Here are some sample messages if the proxy environment variables are not set:
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
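To quickly scan the startup logs for errors, you can filter the output (a sketch; the grep pattern is illustrative):
# Scan startup logs for errors (illustrative filter)
docker compose logs | grep -iE "error|fail"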
Check the Container Status¶
Check if all the containers launched via docker compose have started. The DocSum example starts 5 docker containers. Check that these containers are all running, i.e., the STATUS of each container is Up. You can do this with the docker ps -a command.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ec82528bcbb opea/docsum-gradio-ui:latest "python docsum_ui_gr…" About an hour ago Up About an hour 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp docsum-xeon-ui-server
e22344ed80d5 opea/docsum:latest "python docsum.py" About an hour ago Up About an hour 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp docsum-xeon-backend-server
bbb3c05a2878 opea/llm-docsum:latest "bash entrypoint.sh" About an hour ago Up About an hour 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-docsum-server
d20a8896d2a0 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" About an hour ago Up About an hour (healthy) 0.0.0.0:8008->80/tcp, :::8008->80/tcp tgi-server
8213029b6b26 opea/whisper:latest "python whisper_serv…" About an hour ago Up About an hour 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-server
Interacting with DocSum Deployment¶
This section will walk you through the different ways to interact with
the microservices deployed. After a couple of minutes, rerun docker ps -a
to ensure all the docker containers are still up and running. Then proceed
to validate each microservice and megaservice.
TGI Service¶
curl http://${host_ip}:8008/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
Here is the output:
{"generated_text":"\nDeep learning is a sub-discipline of machine learning. Machine learning is"}
LLM Microservice¶
curl http://${host_ip}:9000/v1/docsum \
-X POST \
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5."}' \
-H 'Content-Type: application/json'
The output is the summary of the input given to this microservice.
Whisper Microservice¶
curl http://${host_ip}:7066/v1/asr \
-X POST \
-d '{"audio":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
Here is the output:
{"asr_result":"you"}
MegaService¶
You can upload documents (.txt, .doc, .pdf), audio, and video to get a summary of the content. The megaservice accepts input files in .txt, .pdf, or .doc format, or plain text passed in the messages parameter.
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open-source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5." \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
The output will be the summarization of the text content. We can also upload files and modify other parameters such as the streaming mode and language.
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
Uploading audio files directly through the curl command is not supported; use the UI to upload them. Alternatively, you can pass a base64 string of the audio file as follows:
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=audio" \
-F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
Uploading video files directly through the curl command is not supported; use the UI to upload them. Alternatively, you can pass a base64 string of the video file as the value for the messages parameter, as shown here:
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=video" \
-F "messages=convert your video to base64 data type" \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
When dealing with longer content to be summarized, we can use different summarization strategies: auto, stuff, truncate, map_reduce, or refine. Select the strategy that best fits based on factors such as the model's context size and the number of input tokens.
Auto: The input token length is checked; if it exceeds MAX_INPUT_TOKENS, summary_type is automatically set to refine mode, otherwise it is set to stuff mode.
Stuff: The LLM generates a summary based on the complete input text. In this case, carefully set MAX_INPUT_TOKENS and MAX_TOTAL_TOKENS according to your model and device memory; otherwise long inputs may exceed the LLM context limit and raise an error.
Truncate: The input text is truncated and only the first chunk is kept, whose length is min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS).
Map_reduce: The input is split into multiple chunks, each chunk is mapped to an individual summary, and those summaries are then consolidated into a single global summary. stream=True is not allowed in this mode. The default chunk_size is min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS).
Refine: The input is split into multiple chunks; a summary is generated for the first chunk, combined with the second, and the process loops over every remaining chunk to produce the final summary. The default chunk_size is min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS).
Define the summary_type by providing one of the 5 values discussed above, as shown below:
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=One of the above 5 types"
Launch UI¶
Gradio UI¶
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml
file as shown below:
docsum-xeon-ui-server:
image: ${REGISTRY:-opea}/docsum-ui:${TAG:-latest}
...
ports:
- "5173:5173"
Svelte UI (Optional)¶
To access the Svelte-based frontend, modify the UI service in the compose.yaml
file. Replace docsum-gradio-ui
service with the docsum-ui
service as per the config below:
docsum-ui:
image: ${REGISTRY:-opea}/docsum-ui:${TAG:-latest}
container_name: docsum-xeon-ui-server
depends_on:
- docsum-xeon-backend-server
ports:
- "5173:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
ipc: host
restart: always
React-Based UI (Optional)¶
To access the React-based frontend, modify the UI service in the compose.yaml
file. Replace docsum-gradio-ui
service with the docsum-react-ui
service as per the config below:
docsum-xeon-react-ui-server:
image: ${REGISTRY:-opea}/docsum-react-ui:${TAG:-latest}
container_name: docsum-xeon-react-ui-server
depends_on:
- docsum-xeon-backend-server
ports:
- "5174:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
ipc: host
restart: always
Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml
file as shown below:
docsum-xeon-react-ui-server:
image: ${REGISTRY:-opea}/docsum-react-ui:${TAG:-latest}
...
ports:
- "80:80"
Check Docker Container Logs¶
You can check the log of a container by running this command:
docker logs <CONTAINER ID> -t
You can also check the overall logs with the following command, where compose.yaml is the megaservice docker-compose configuration file. Assuming you are still in the directory $WORKSPACE/GenAIExamples/DocSum/docker_compose/intel/cpu/xeon, run the following command to check the logs:
docker compose -f compose.yaml logs
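To follow the logs of a single service, for example the TGI server, a sketch (the service name matches the compose.yaml shown below):
# Follow the logs of one service only
docker compose -f compose.yaml logs -f tgi-server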
View the docker input parameters in $WORKSPACE/GenAIExamples/DocSum/docker_compose/intel/cpu/xeon/compose.yaml
tgi-server:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-server
ports:
- ${LLM_ENDPOINT_PORT:-8008}:80
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
host_ip: ${host_ip}
LLM_ENDPOINT_PORT: ${LLM_ENDPOINT_PORT}
healthcheck:
test: ["CMD-SHELL", "curl -f http://${host_ip}:${LLM_ENDPOINT_PORT}/health || exit 1"]
interval: 10s
timeout: 10s
retries: 100
volumes:
- "./data:/data"
shm_size: 1g
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 --max-input-length ${MAX_INPUT_TOKENS} --max-total-tokens ${MAX_TOTAL_TOKENS}
The input --model-id is ${LLM_MODEL_ID}. Ensure the environment variable LLM_MODEL_ID is set and spelled correctly. Whenever it is changed, restart the containers to use the newly selected model.
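For example, a minimal sketch of switching models and restarting the pipeline (the model ID is illustrative; depending on how set_env.sh assigns LLM_MODEL_ID, export the override after sourcing the script or edit the script directly):
# Switch to a different model (illustrative ID) and restart the containers
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.2"
cd $WORKSPACE/GenAIExamples/DocSum/docker_compose/intel/cpu/xeon
docker compose down
docker compose up -d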
Stop the services¶
Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below:
docker compose down