LVM Microservice¶
Visual Question Answering (VQA) is one of the multimodal tasks empowered by LVMs (Large Visual Models). This microservice supports visual Q&A by using LLaVA as the base large visual model. It accepts two inputs, a prompt and one or more images, and outputs the answer to the prompt about the images.
🚀1. Start Microservice with Python (Option 1)¶
1.1 Install Requirements¶
pip install -r requirements.txt
1.2 Start LLaVA Service/Test¶
Xeon CPU
# Start LLaVA service
nohup python llava_server.py --device=cpu &
# Wait until the server is up
# Test
python check_llava_server.py
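The "wait until the server is up" step above can be scripted. Below is a minimal readiness poll, assuming the LLaVA server listens on localhost:8399 (the port used in the Docker instructions below); it only checks that the port accepts connections, not that the model has finished loading.
import socket
import time

# Poll until the LLaVA server accepts TCP connections on the assumed port.
def wait_for_server(host="localhost", port=8399, timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(2)  # not up yet; retry
    return False

if __name__ == "__main__":
    print("LLaVA server ready:", wait_for_server())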
Gaudi2 HPU
pip install optimum[habana]
# Start LLaVA service
nohup python llava_server.py &
# Test
python check_llava_server.py
🚀2. Start Microservice with Docker (Option 2)¶
2.1 Build Images¶
2.1.1 LLaVA Server Image¶
Xeon CPU
cd ../../../
docker build -t opea/lvm-llava:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/lvms/src/integrations/dependency/llava/Dockerfile .
Gaudi2 HPU
cd ../../../
docker build -t opea/lvm-llava:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/lvms/src/integrations/dependency/llava/Dockerfile.intel_hpu .
2.2 Start LLaVA and LVM Service¶
2.2.1 Start LLaVA server¶
Xeon CPU
docker run -p 8399:8399 -e http_proxy=$http_proxy --ipc=host -e https_proxy=$https_proxy opea/lvm-llava:latest
Gaudi2 HPU
docker run -p 8399:8399 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/lvm-llava:latest
2.2.2 Test¶
Note: The MAX_IMAGES environment variable specifies the maximum number of images that will be sent from the LVM service to the LLaVA server. If an image list longer than MAX_IMAGES is sent to the LVM server, a shortened image list will be sent to the LLaVA service. When the list needs to be shortened, the most recent images (the ones at the end of the list) are kept. Some LLaVA models have not been trained with multiple images and may produce inaccurate results. If MAX_IMAGES is not set, it defaults to 1.
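For illustration only, the shortening behavior described above amounts to keeping the tail of the image list. The sketch below is a hypothetical rendering of that logic, not the service's actual code.
import os

# MAX_IMAGES defaults to 1 when unset, per the note above.
MAX_IMAGES = int(os.getenv("MAX_IMAGES", "1"))

# Keep only the most recent images (the ones at the end of the list).
def shorten_image_list(images):
    return images[-MAX_IMAGES:] if len(images) > MAX_IMAGES else images

# Example: with MAX_IMAGES=1, ["img1", "img2", "img3"] -> ["img3"]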
# Use curl/python
# curl with an image and a prompt
http_proxy="" curl http://localhost:9399/v1/lvm -XPOST -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}' -H 'Content-Type: application/json'
# curl with multiple images and a prompt (Note that depending on your MAX_IMAGES value, both images may not be sent to the LLaVA model)
http_proxy="" curl http://localhost:9399/v1/lvm -XPOST -d '{"image": ["iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mNkYPhfz0AEYBxVSF+FAP5FDvcfRYWgAAAAAElFTkSuQmCC", "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mNk+M9Qz0AEYBxVSF+FAAhKDveksOjmAAAAAElFTkSuQmCC"], "prompt":"What is in these images?"}' -H 'Content-Type: application/json'
# curl with a prompt only (no image)
http_proxy="" curl http://localhost:9399/v1/lvm -XPOST -d '{"image": "", "prompt":"What is deep learning?"}' -H 'Content-Type: application/json'
# Test
python check_llava_server.py
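The curl examples above can also be issued from Python. Below is a minimal sketch using the requests library against the same /v1/lvm endpoint; the response is printed as-is, since its exact schema is not shown here.
import requests

# Python equivalent of the single-image curl example above.
url = "http://localhost:9399/v1/lvm"
payload = {
    "image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC",
    "prompt": "What is this?",
}
response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json())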