# LVM Microservice with vLLM on Gaudi

This service provides high-throughput, low-latency LVM serving accelerated by vLLM, optimized for Intel Gaudi HPUs.

---

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Start Microservice](#start-microservice)
3. [Consume LVM Service](#consume-lvm-service)

---

## Prerequisites

### Build vLLM Gaudi Docker Image

You must first build the custom `vllm-gaudi` Docker image locally:

```bash
git clone https://github.com/HabanaAI/vllm-fork.git
cd ./vllm-fork/
# Note: this commit hash pins a specific version. Check for updates.
git checkout f78aeb9da0712561163eddd353e3b6097cd69bac
docker build -f Dockerfile.hpu -t opea/vllm-gaudi:latest --shm-size=128g \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy .
cd ..
rm -rf vllm-fork
```

## Start Microservice

### Build LVM Docker Image

Build the generic LVM microservice Docker image:

```bash
cd ../../../
docker build -t opea/lvm:latest \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  -f comps/lvms/src/Dockerfile .
```

### Run with Docker Compose

Deploy the vLLM service and the LVM microservice using Docker Compose.

1. Export the required environment variables:

   ```bash
   export ip_address=$(hostname -I | awk '{print $1}')
   export LVM_PORT=9399
   export VLLM_PORT=11507
   export LVM_ENDPOINT="http://$ip_address:$VLLM_PORT"

   # Option 1: LLaVA model
   export LLM_MODEL_ID=llava-hf/llava-1.5-7b-hf
   export CHAT_TEMPLATE=examples/template_llava.jinja

   # Option 2: UI-TARS model
   # export LLM_MODEL_ID=bytedance-research/UI-TARS-7B-DPO
   # export TP_SIZE=1   # change to 4 or 8 if using UI-TARS-72B-DPO
   # export CHAT_TEMPLATE=None

   # Skip warmup for a faster server start on Gaudi (may increase initial inference latency)
   export VLLM_SKIP_WARMUP=true
   ```

2. Navigate to the Docker Compose directory and start the services:

   ```bash
   cd comps/lvms/deployment/docker_compose/
   docker compose up vllm-gaudi-service lvm-vllm-gaudi -d
   ```

---

## Consume LVM Service

Once the service is running, you can send requests to the API.

### Use the LVM Service API

Send a POST request with a base64-encoded image and a prompt:

```bash
curl http://localhost:9399/v1/lvm \
  -X POST \
  -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}' \
  -H 'Content-Type: application/json' \
```
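To query the service with your own image instead of the inline sample above, base64-encode a local file first. This is a minimal sketch, assuming GNU coreutils `base64` (where `-w 0` disables line wrapping) and that the service listens on `localhost:9399`; the image path is illustrative:

```bash
# Encode a local image (path is illustrative) without line wrapping.
IMAGE_B64=$(base64 -w 0 ./example.png)

# Send the encoded image and a prompt to the LVM endpoint.
curl http://localhost:9399/v1/lvm \
  -X POST \
  -H 'Content-Type: application/json' \
  -d "{\"image\": \"${IMAGE_B64}\", \"prompt\": \"Describe this image.\"}"
```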
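If requests fail or hang, confirm that both containers started and that the vLLM backend has finished loading the model before retrying. A quick check, assuming you are still in the Docker Compose directory used above:

```bash
# List the running services and follow the vLLM backend logs.
docker compose ps
docker compose logs -f vllm-gaudi-service
```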