# LVM Microservice

Visual Question Answering (VQA) is one of the multimodal tasks empowered by LVMs (Large Visual Models). This microservice supports visual Q&A by using LLaVA as the base large visual model. It accepts two inputs: a prompt and an image. It outputs the answer to the prompt about the image.

---

## Table of Contents

1. [Overview](#overview)
2. [Key Features](#key-features)
3. [Supported Implementations](#supported-implementations)

---

## Overview

Users can configure and deploy LVM-related services based on their specific requirements. This microservice supports a variety of backend implementations, each tailored to different performance, hardware, and model needs, allowing for flexible integration into diverse GenAI workflows.

---

## Key Features

- **Multimodal Interaction**
  Natively supports question answering over various visual inputs, including images and videos.

- **Flexible Backends**
  Integrates with multiple state-of-the-art LVM implementations such as LLaVA, LLaMA-Vision, Video-LLaMA, and more.

- **Scalable Deployment**
  Ready for deployment using Docker, Docker Compose, and Kubernetes, ensuring scalability from local development to production environments.

- **Standardized API**
  Provides a consistent and simple API endpoint, abstracting the complexities of the different underlying models (a client sketch follows the implementation table below).

---

## Supported Implementations

The LVM Microservice supports multiple implementation options. Select the one that best fits your use case and follow the linked documentation for detailed setup instructions.

| Implementation           | Description                                                           | Documentation                                            |
| ------------------------ | --------------------------------------------------------------------- | -------------------------------------------------------- |
| **With LLaVA**           | A general-purpose VQA service using the LLaVA model.                  | [README_llava](src/README_llava.md)                      |
| **With TGI LLaVA**       | LLaVA service accelerated by TGI, optimized for Intel Gaudi HPUs.     | [README_llava_tgi](src/README_llava_tgi.md)              |
| **With LLaMA-Vision**    | VQA service leveraging the LLaMA-Vision model.                        | [README_llama_vision](src/README_llama_vision.md)        |
| **With Video-LLaMA**     | A specialized service for performing VQA on video inputs.             | [README_video_llama](src/README_video_llama.md)          |
| **With vLLM**            | High-throughput LVM serving accelerated by vLLM on Intel Gaudi HPUs.  | [README_vllm](src/README_vllm.md)                        |
| **With PredictionGuard** | LVM service using Prediction Guard with built-in safety features.     | [README_predictionguard](src/README_predictionguard.md)  |
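Since the service takes a prompt plus an image and returns an answer, a request can be expressed as a single JSON POST. The following is a minimal client sketch only; the host, port (`9399`), endpoint path (`/v1/lvm`), payload field names (`image`, `prompt`), and the `example.jpg` file are assumptions for illustration. Verify the actual request format in the README linked above for your chosen backend.

```python
# Minimal client sketch for the LVM microservice API.
# Assumptions (verify against the README for your chosen backend):
#   - the service listens on http://localhost:9399
#   - the endpoint is POST /v1/lvm
#   - the payload carries a base64-encoded image and a text prompt
import base64

import requests

LVM_ENDPOINT = "http://localhost:9399/v1/lvm"  # assumed host/port/path

# Encode the input image as base64 so the raw image bytes can travel
# inside the JSON payload.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": image_b64,                       # assumed field name
    "prompt": "What is shown in this image?"  # the question about the image
}

response = requests.post(LVM_ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # the answer to the prompt about the image
```

Because the API is standardized across backends, the same client code should work whether the service is backed by LLaVA, vLLM, or another implementation; only the deployment steps in the linked READMEs differ.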