Deploying FAQ Generation on Intel® Xeon® Processors¶
In today’s data-driven world, organizations across various industries face the challenge of managing and understanding vast amounts of information. Legal documents, contracts, regulations, and customer inquiries often contain critical insights buried within dense text. Extracting and presenting these insights in a concise and accessible format is crucial for decision-making, compliance, and customer satisfaction.
Our FAQ Generation Application leverages the power of large language models (LLMs) to transform the way you interact with and comprehend complex textual data. By harnessing cutting-edge natural language processing techniques, the application automatically generates comprehensive, natural-sounding frequently asked questions (FAQs) from your documents, legal texts, customer queries, and other sources. In this example use case, we utilize LangChain to implement FAQ Generation and facilitate LLM inference using vLLM on Intel Xeon processors.
The FaqGen example is implemented using the component-level microservices defined in GenAIComps. The flow chart in the upstream GenAIExamples repository shows the information flow between the different microservices for this example.
Build Docker Images¶
First, you need to build the Docker images locally. This step can be skipped once the images are published to Docker Hub.
1. Build vLLM Image¶
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
VLLM_VER="$(git describe --tags "$(git rev-list --tags --max-count=1)" )"
git checkout ${VLLM_VER}
docker build --no-cache --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.cpu -t opea/vllm:latest --shm-size=128g .
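The commands above resolve and check out the most recent vLLM tag automatically. If you need a reproducible build, you can pin an explicit release instead; a minimal sketch, where the tag is only an illustrative placeholder:
# Pin a specific vLLM release rather than resolving the latest tag
# (v0.8.3 is a hypothetical example; substitute the tag you have validated)
VLLM_VER="v0.8.3"
git checkout ${VLLM_VER}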
2. Build LLM Image¶
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/faq-generation/Dockerfile .
3. Build MegaService Docker Image¶
To construct the megaservice, we utilize the GenAIComps microservice pipeline within the faqgen.py Python script. Build the MegaService Docker image via the command below:
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/FaqGen
docker build --no-cache -t opea/faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
4. Build UI Docker Image¶
Build the frontend Docker image via the command below:
cd GenAIExamples/FaqGen/ui
docker build --no-cache -t opea/faqgen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
5. Build Conversational React UI Docker Image (Optional)¶
Build the frontend Docker image that enables a conversational experience with the FaqGen megaservice via the commands below. Export the public IP address of your Xeon server to the host_ip environment variable first:
cd GenAIExamples/FaqGen/ui
docker build --no-cache -t opea/faqgen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
6. Build Nginx Docker Image¶
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile .
Then run the docker images command; you should see the following Docker images:
opea/vllm:latest
opea/llm-faqgen:latest
opea/faqgen:latest
opea/faqgen-ui:latest
opea/nginx:latest
Start Microservices and MegaService¶
Required Models¶
The default model is “meta-llama/Meta-Llama-3-8B-Instruct”; change “LLM_MODEL_ID” in the Environment Variables settings below if you want to use another model.
For gated models, you also need to provide a Hugging Face token in the “HUGGINGFACEHUB_API_TOKEN” environment variable.
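Meta-Llama-3 is gated, so your Hugging Face account must be granted access on the model page before the weights can be pulled. As an optional sanity check (this uses the public Hugging Face REST API, not part of this example), you can confirm that your token can see the repository before starting the stack:
# Should print model metadata (JSON) rather than a 401/403 error
curl -s -H "Authorization: Bearer ${your_hf_api_token}" \
    https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B-Instruct | head -c 200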
Setup Environment Variables¶
Since the compose.yaml file consumes some environment variables, you need to set them in advance as below.
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
export host_ip=${your_host_ip}
export LLM_ENDPOINT_PORT=8008
export LLM_SERVICE_PORT=9000
export FAQGEN_BACKEND_PORT=8888
export FAQGen_COMPONENT_NAME="OpeaFaqGenvLLM"
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/faqgen"
Note: Replace your_host_ip with your external IP address; do not use localhost.
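If you are unsure which address to use, a common way to pick up the machine's primary IP on Linux is shown below. This assumes the first address reported by hostname -I is the externally reachable one, so verify the value before proceeding:
export host_ip=$(hostname -I | awk '{print $1}')
echo ${host_ip}  # confirm this address is reachable from your clients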
Start Microservice Docker Containers¶
cd GenAIExamples/FaqGen/docker_compose/intel/cpu/xeon
docker compose up -d
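Before validating the endpoints, it is worth confirming that all containers came up. The container name below assumes the compose file names its vLLM service vllm-service; adjust it if your compose.yaml differs:
docker compose ps  # all services should be in a running/healthy state
# The first start can take several minutes while vLLM downloads the model;
# tail its log until the server reports that it is listening:
docker logs vllm-service 2>&1 | tail -n 20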
Validate Microservices¶
vLLM Service
curl http://${host_ip}:${LLM_ENDPOINT_PORT}/v1/chat/completions \
-X POST \
-H "Content-Type: application/json" \
-d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
LLM Microservice
curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \
-X POST \
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
-H 'Content-Type: application/json'
MegaService
curl http://${host_ip}:${FAQGEN_BACKEND_PORT}/v1/faqgen \
-H "Content-Type: multipart/form-data" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "stream=False"
# enable streaming
curl http://${host_ip}:${FAQGEN_BACKEND_PORT}/v1/faqgen \
-H "Content-Type: multipart/form-data" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "stream=True"
Once all of the aforementioned microservices respond as expected, the megaservice is working correctly and you can move on to the UI.
Launch the UI¶
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below:
faqgen-xeon-ui-server:
image: opea/faqgen-ui:latest
...
ports:
- "80:5173"
Launch the Conversational UI (Optional)¶
To access the Conversational UI frontend, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below:
faqgen-xeon-conversation-ui-server:
image: opea/faqgen-react-ui:latest
...
ports:
- "80:80"