# Visual Question and Answering Visual Question Answering (VQA) is the task of answering open-ended questions based on an image. The input to models supporting this task is typically a combination of an image and a question, and the output is an answer expressed in natural language. Some noteworthy use case examples for VQA include: - Accessibility applications for visually impaired individuals. - Education: posing questions about visual materials presented in lectures or textbooks. VQA can also be utilized in interactive museum exhibits or historical sites. - Customer service and e-commerce: VQA can enhance user experience by letting users ask questions about products. - Image retrieval: VQA models can be used to retrieve images with specific characteristics. For example, the user can ask “Is there a dog?” to find all images with dogs from a set of images. ## Table of Contents 1. [Architecture](#architecture) 2. [Deployment Options](#deployment-options) 3. [Validated Configurations](#validated-configurations) ## Architecture ![VQA](./assets/img/vqa.png) The VisualQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example. ```mermaid --- config: flowchart: nodeSpacing: 400 rankSpacing: 100 curve: linear themeVariables: fontSize: 50px --- flowchart LR %% Colors %% classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5 classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5 classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5 classDef invisible fill:transparent,stroke:transparent; style VisualQnA-MegaService stroke:#000000 %% Subgraphs %% subgraph VisualQnA-MegaService["VisualQnA MegaService "] direction LR LVM([LVM MicroService]):::blue end subgraph UserInterface[" User Interface "] direction LR a([User Input Query]):::orchid Ingest([Ingest data]):::orchid UI([UI server
]):::orchid end LVM_gen{{LVM Service
}} GW([VisualQnA GateWay
]):::orange NG([Nginx MicroService]):::blue %% Questions interaction direction LR Ingest[Ingest data] --> UI a[User Input Query] --> |Need Proxy Server|NG a[User Input Query] --> UI NG --> UI UI --> GW GW <==> VisualQnA-MegaService %% Embedding service flow direction LR LVM <-.-> LVM_gen ``` This example guides you through how to deploy a [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) (Open Large Multimodal Models) model on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html). We invite contributions from other hardware vendors to expand the OPEA ecosystem. ![llava screenshot](./assets/img/llava_screenshot1.png) ![llava-screenshot](./assets/img/llava_screenshot2.png) ## Deployment Options The VisualQnA service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors. The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware. | Category | Deployment Option | Description | | ---------------------- | ----------------- | ----------------------------------------------------------------- | | On-premise Deployments | Docker compose | [VisualQnA deployment on Xeon](./docker_compose/intel/cpu/xeon) | | | | [VisualQnA deployment on Gaudi](./docker_compose/intel/hpu/gaudi) | | | | [VisualQnA deployment on AMD ROCm](./docker_compose/amd/gpu/rocm) | | | Kubernetes | [Helm Charts](./kubernetes/helm) | | | | [GMC](./kubernetes/gmc) | ## Validated Configurations | **Deploy Method** | **LLM Engine** | **LLM Model** | **Hardware** | | ----------------- | -------------- | --------------------------------- | ------------ | | Docker Compose | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | Intel Xeon | | Docker Compose | TGI, vLLM | llava-hf/llava-1.5-7b-hf | Intel Gaudi | | Docker Compose | TGI, vLLM | Xkev/Llama-3.2V-11B-cot | AMD ROCm | | Helm Charts | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | Intel Gaudi | | Helm Charts | TGI, vLLM | llava-hf/llava-v1.6-mistral-7b-hf | Intel Xeon |