LVM Microservice with vLLM on Gaudi

This service provides high-throughput, low-latency LVM (Large Vision Model) serving, accelerated by vLLM and optimized for Intel Gaudi HPUs.


Table of Contents

  1. Prerequisites

  2. Start Microservice

  3. Consume LVM Service


Prerequisites

Build vLLM Gaudi Docker Image

You must build the custom vllm-gaudi Docker image locally before starting the microservice.

git clone https://github.com/HabanaAI/vllm-fork.git
cd ./vllm-fork/
# This commit hash pins a known-working version; check the repository for newer ones.
git checkout f78aeb9da0712561163eddd353e3b6097cd69bac
docker build -f Dockerfile.hpu -t opea/vllm-gaudi:latest \
  --shm-size=128g \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  .
cd ..
rm -rf vllm-fork
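
The image can be built on any Docker host, but Gaudi HPUs are required at runtime. If you are building on the target machine, hl-smi (installed as part of the Gaudi driver stack) offers a quick check that the devices are visible:

hl-smi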

Start Microservice

Build LVM Docker Image

Build the generic LVM microservice Docker image:

# Return to the root of the repository that contains comps/ (the Dockerfile path below is relative to it)
cd ../../../
docker build -t opea/lvm:latest \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  -f comps/lvms/src/Dockerfile .
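
With both images built, you can optionally confirm they are present locally before deploying:

docker images | grep -E 'opea/(vllm-gaudi|lvm)'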

Run with Docker Compose

Deploy the vLLM service and the LVM microservice using Docker Compose.

  1. Export the required environment variables:

    export ip_address=$(hostname -I | awk '{print $1}')
    export LVM_PORT=9399
    export VLLM_PORT=11507
    export LVM_ENDPOINT="http://$ip_address:$VLLM_PORT"
    
    # Option 1: for LLaVA model
    export LLM_MODEL_ID=llava-hf/llava-1.5-7b-hf
    export CHAT_TEMPLATE=examples/template_llava.jinja
    
    # Option 2: for UI-TARS model
    # export LLM_MODEL_ID=bytedance-research/UI-TARS-7B-DPO
    # export TP_SIZE=1    # change to 4 or 8 if using UI-TARS-72B-DPO
    # export CHAT_TEMPLATE=None
    
    # Skip warmup for faster server start on Gaudi (may increase initial inference time)
    export VLLM_SKIP_WARMUP=true
    
  2. Navigate to the Docker Compose directory and start the services:

    cd comps/lvms/deployment/docker_compose/
    docker compose up vllm-gaudi-service lvm-vllm-gaudi -d
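
Optionally, confirm that both containers are running and wait for the vLLM server to finish loading the model; the first start downloads the model weights, which can take a while. The vLLM server's /health endpoint returns an empty 200 response once it is ready (this assumes the compose file publishes $VLLM_PORT on the host):

docker compose ps
docker compose logs vllm-gaudi-service | tail -n 20
curl http://localhost:$VLLM_PORT/health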
    

Consume LVM Service

Once the services are running, you can send requests to the LVM API.

Use the LVM Service API

Send a POST request containing a base64-encoded image and a text prompt.

curl http://localhost:9399/v1/lvm \
  -X POST \
  -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}' \
  -H 'Content-Type: application/json'
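
To query with your own image, base64-encode it first. A minimal sketch, assuming a local file ./image.png and that the environment variables from the deployment step are still set (base64 -w 0 disables line wrapping on Linux; on macOS use base64 -i ./image.png instead):

# Encode the image without line wraps so it embeds cleanly in JSON
IMAGE_B64=$(base64 -w 0 ./image.png)
curl http://localhost:$LVM_PORT/v1/lvm \
  -X POST \
  -H 'Content-Type: application/json' \
  -d "{\"image\": \"$IMAGE_B64\", \"prompt\": \"Describe this image.\"}"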