FAQGen LLM Microservice

This microservice interacts with a TGI or vLLM LLM serving backend to generate FAQs (frequently asked questions and answers) from input text. You can set the backend service to either TGI or vLLM.

🚀1. Start Microservice with Docker

1.1 Setup Environment Variables

To start the FaqGen microservice, you need to set up the following environment variables first.

export host_ip=${your_host_ip}
export LLM_ENDPOINT_PORT=8008
export FAQ_PORT=9000
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export LLM_MODEL_ID=${your_hf_llm_model}
export FAQGen_COMPONENT_NAME="OpeaFaqGenTgi" # use "OpeaFaqGenvLLM" for the vLLM backend
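
For example, a concrete setup might look like the following; the IP address, token, and model ID below are placeholders for illustration only:

export host_ip=192.168.1.100              # replace with your host's IP address
export LLM_ENDPOINT_PORT=8008
export FAQ_PORT=9000
export HUGGINGFACEHUB_API_TOKEN=hf_xxx    # your Hugging Face access token
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export FAQGen_COMPONENT_NAME="OpeaFaqGenTgi"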

1.2 Build Docker Image

Step 1: Prepare the backend LLM docker image.

If you want to use the vLLM backend, refer to the vLLM instructions to build the vLLM docker image first.

No extra image is needed for TGI.

Step 2: Build FaqGen docker image.

cd ../../../../
docker build -t opea/llm-faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/faq-generation/Dockerfile .
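
After the build completes, you can confirm the image exists locally (the exact output columns may differ across Docker versions):

docker images | grep llm-faqgen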

1.3 Run Docker

To start a docker container, you have two options:

  • A. Run Docker with CLI

  • B. Run Docker with Docker Compose

You can choose one as needed.

1.3.1 Run Docker with CLI (Option A)

Step 1: Start the backend LLM service. Please refer to the TGI or vLLM guideline to start a backend LLM service.
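
As a minimal sketch only, a TGI backend on CPU could be launched roughly as follows, assuming the official ghcr.io/huggingface/text-generation-inference image; consult the TGI documentation for the recommended image tag and hardware-specific options:

# Illustrative TGI launch; image tag and options are assumptions, not the official guideline.
docker run -d --name tgi-server \
  -p ${LLM_ENDPOINT_PORT}:80 \
  -v ./data:/data \
  --shm-size 1g \
  -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id ${LLM_MODEL_ID}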

Step 2: Start the FaqGen microservice

docker run -d \
    --name="llm-faqgen-server" \
    -p 9000:9000 \
    --ipc=host \
    -e http_proxy=$http_proxy \
    -e https_proxy=$https_proxy \
    -e LLM_MODEL_ID=$LLM_MODEL_ID \
    -e LLM_ENDPOINT=$LLM_ENDPOINT \
    -e HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN \
    -e FAQGen_COMPONENT_NAME=$FAQGen_COMPONENT_NAME \
    opea/llm-faqgen:latest
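
Once the container is up, you can check that it started cleanly:

docker ps --filter name=llm-faqgen-server
docker logs llm-faqgen-server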

1.3.2 Run Docker with Docker Compose (Option B)

cd ../../deployment/docker_compose/

# Backend is TGI on Xeon
docker compose -f faq-generation_tgi.yaml up -d

# Backend is TGI on Gaudi
# docker compose -f faq-generation_tgi_on_intel_hpu.yaml up -d

# Backend is vLLM on Xeon
# docker compose -f faq-generation_vllm.yaml up -d

# Backend is vLLM on Gaudi
# docker compose -f faq-generation_vllm_on_intel_hpu.yaml up -d
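
Whichever compose file you use, you can verify the deployment with the matching compose commands, for example:

docker compose -f faq-generation_tgi.yaml ps
docker compose -f faq-generation_tgi.yaml logs -f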

🚀2. Consume LLM Service

2.1 Check Service Status

curl http://${host_ip}:${FAQ_PORT}/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
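
If you script the deployment, a simple readiness loop can poll this endpoint; the sketch below assumes the service answers the health check with an HTTP 2xx status once it is ready:

# Poll the health check endpoint until it succeeds (assumes HTTP 2xx when healthy).
until curl -sf http://${host_ip}:${FAQ_PORT}/v1/health_check > /dev/null; do
  echo "Waiting for the FaqGen service..."
  sleep 5
done
echo "FaqGen service is up."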

2.2 Consume FAQGen LLM Service

# Streaming Response
# Set stream to true. The default is true.
curl http://${host_ip}:${FAQ_PORT}/v1/faqgen \
  -X POST \
  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens": 128}' \
  -H 'Content-Type: application/json'

# Non-Streaming Response
# Set stream to false.
curl http://${host_ip}:${FAQ_PORT}/v1/faqgen \
  -X POST \
  -d '{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens": 128, "stream":false}' \
  -H 'Content-Type: application/json'
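
In practice the input text usually comes from a file rather than an inline string. The snippet below is a convenience sketch that builds the JSON payload with jq; the file name is only an example:

# Read a local document and send it as the "messages" field (requires jq).
DOC_TEXT=$(cat ./my_document.txt)
curl http://${host_ip}:${FAQ_PORT}/v1/faqgen \
  -X POST \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg msg "$DOC_TEXT" '{messages: $msg, max_tokens: 128, stream: false}')"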