LVM Microservice with vLLM on Gaudi¶
This service provides high-throughput, low-latency LVM serving accelerated by vLLM, optimized for Intel Gaudi HPUs.
Prerequisites¶
Build vLLM Gaudi Docker Image¶
You must first build the custom vllm-gaudi Docker image locally.
git clone https://github.com/HabanaAI/vllm-fork.git
cd ./vllm-fork/
# Note: The commit hash is for a specific version. Check for updates.
git checkout f78aeb9da0712561163eddd353e3b6097cd69bac
docker build -f Dockerfile.hpu -t opea/vllm-gaudi:latest --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
cd ..
rm -rf vllm-fork
Start Microservice¶
Build LVM Docker Image¶
Build the generic LVM microservice Docker image:
cd ../../../
docker build -t opea/lvm:latest \
--build-arg https_proxy=$https_proxy \
--build-arg http_proxy=$http_proxy \
-f comps/lvms/src/Dockerfile .
Run with Docker Compose¶
Deploy the vLLM service and the LVM microservice using Docker Compose.
Export the required environment variables:
export ip_address=$(hostname -I | awk '{print $1}')
export LVM_PORT=9399
export VLLM_PORT=11507
export LVM_ENDPOINT="http://$ip_address:$VLLM_PORT"

# Option 1: for LLaVA model
export LLM_MODEL_ID=llava-hf/llava-1.5-7b-hf
export CHAT_TEMPLATE=examples/template_llava.jinja

# Option 2: for UI-TARS model
# export LLM_MODEL_ID=bytedance-research/UI-TARS-7B-DPO
# export TP_SIZE=1  # change to 4 or 8 if using UI-TARS-72B-DPO
# export CHAT_TEMPLATE=None

# Skip warmup for faster server start on Gaudi (may increase initial inference time)
export VLLM_SKIP_WARMUP=true
Navigate to the Docker Compose directory and start the services:
cd comps/lvms/deployment/docker_compose/
docker compose up vllm-gaudi-service lvm-vllm-gaudi -d
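Note that `docker compose up -d` returns before the vLLM engine has finished loading the model, which can take several minutes on Gaudi. A small polling helper can gate later requests; this is a sketch that assumes the wrapper exposes a `/v1/health_check` route (common across OPEA microservices, but verify against your build):

```shell
# Poll a health endpoint until the service answers or we give up.
wait_ready() {
  url=$1; tries=${2:-30}; delay=${3:-10}
  i=1
  while [ "$i" -le "$tries" ]; do
    if curl -sf "$url" > /dev/null 2>&1; then
      echo "service at $url is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "service at $url did not become ready" >&2
  return 1
}

# Example: wait up to 5 minutes for the LVM wrapper to come up.
#   wait_ready "http://localhost:${LVM_PORT:-9399}/v1/health_check" 30 10
```

`docker compose logs -f vllm-gaudi-service` is a useful companion while waiting, since model download and graph compilation progress appear there.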
Consume LVM Service¶
Once the service is running, you can send requests to the API.
Use the LVM Service API¶
Send a POST request with an image (base64 encoded) and a prompt.
curl http://localhost:9399/v1/lvm \
-X POST \
-d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}' \
-H 'Content-Type: application/json'
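The image field must be raw base64 with no data-URI prefix and no line wrapping. A sketch of building the request body from a local file (the file name sample.bin and the inline payload are illustrative; substitute your own image path):

```shell
# Stand-in file for a real image; substitute your own PNG/JPEG path.
printf 'sample-bytes' > sample.bin

# -w 0 (GNU coreutils) disables line wrapping so the encoded string
# embeds cleanly in a single-line JSON value.
IMG_B64=$(base64 -w 0 sample.bin)

# Compose the request body the /v1/lvm endpoint expects.
PAYLOAD=$(printf '{"image": "%s", "prompt": "What is this?"}' "$IMG_B64")
echo "$PAYLOAD"
```

On BSD/macOS, base64 has no -w flag; pipe through `tr -d '\n'` instead. The resulting $PAYLOAD can be passed to curl with `-d "$PAYLOAD"` in place of the inline JSON above.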