OPEA Release Notes v1.1¶
We are pleased to announce the release of OPEA version 1.1, which includes significant contributions from the open-source community. This release incorporates over 470 pull requests.
More information about how to get started with OPEA v1.1 can be found on the Getting Started page. All project source code is maintained in the repository. To pull Docker images, please visit Docker Hub. For instructions on deploying Helm Charts, please refer to the guide.
What’s New in OPEA v1.1¶
This release introduces more generally available scenarios, including:
Newly supported Generative AI capabilities: Image-to-Video, Text-to-Image, Text-to-SQL, and Avatar Animation.
Generative AI Studio, a no-code alternative for creating enterprise Generative AI applications.
An expanded portfolio of supported hardware, now including Intel® Arc™ GPUs and AMD GPUs.
Enhanced monitoring support, providing real-time insights into runtime status and system resource utilization for CPU and Intel® Gaudi® AI Accelerator, as well as Horizontal Pod Autoscaling (HPA).
Helm Chart support for 7 new GenAIExamples and their microservices.
Benchmark tools for long-context language models (LCLMs): LongBench and HELMET.
Highlights¶
New GenAI Examples¶
AvatarChatbot: a chatbot combined with a virtual “avatar”, which can run on either the Intel Gaudi 2 AI Accelerator or Intel Xeon Scalable processors.
DBQnA: seamlessly translates natural language queries into SQL and delivers real-time database results.
EdgeCraftRAG: a customizable and tunable RAG example for edge solutions on Intel® Arc™ GPUs.
GraphRAG: a Graph RAG-based approach to summarization.
Text2Image: an application that generates images based on text prompts.
WorkflowExecAgent: a workflow executor example that uses LangChain agents to execute custom-defined, workflow-based tools for data/AI workflow operations.
Enhanced GenAI Examples¶
Multi-media support: DocSum, MultimodalQnA
New GenAI Components¶
Text-to-Image: add Stable Diffusion microservice
Image-to-Video: add Stable Video Diffusion microservice
Text-to-SQL: add Text-to-SQL microservice
Text-to-Speech: add GPT-SoVITS microservice
Avatar Animation: add Animation microservice
Enhanced GenAI Components¶
Asynchronous support for microservices (28672956, 9df4b3c0, f3746dc8)
Add vLLM backends for summarization, FAQ generation, code generation, and Agents
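The asynchronous support listed above follows the standard asyncio fan-out pattern, in which several microservice calls run concurrently instead of serially. A minimal sketch of that pattern (the `call_service` coroutine is a hypothetical stand-in for a non-blocking HTTP call to a microservice, not actual OPEA code):

```python
import asyncio

async def call_service(name: str, delay: float) -> str:
    # Hypothetical stand-in for a non-blocking request to a microservice.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main() -> list[str]:
    # Fan out several service calls concurrently; gather preserves order.
    return await asyncio.gather(
        call_service("embedding", 0.01),
        call_service("reranking", 0.01),
        call_service("llm", 0.01),
    )

results = asyncio.run(main())
print(results)
```

With real endpoints, the sleep would be replaced by an async HTTP client call, letting one worker overlap many in-flight requests.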
GenAIStudio¶
GenAI Studio, a new OPEA project, streamlines the creation of enterprise Generative AI applications by providing an alternative UI-based process for building end-to-end solutions. It supports GenAI application definition, evaluation, performance benchmarking, and deployment. The GenAI Studio empowers developers to effortlessly build, test, and optimize their LLM solutions, and to create a deployment package. Its intuitive no-code/low-code interface accelerates innovation, enabling rapid development and deployment of cutting-edge AI applications.
Enhanced Observability¶
Observability offers real-time insights into component performance and system resource utilization. We enhanced this capability by monitoring key system metrics, including CPU, host memory, storage, network, and accelerators (such as Intel Gaudi), as well as tracking OPEA application scaling.
Helm Charts Support¶
OPEA examples and microservices support Helm Charts as the packaging format on Kubernetes (k8s). Newly supported examples include AgentQnA, AudioQnA, FaqGen, and VisualQnA. Newly supported microservices include chathistory, mongodb, prompt, and Milvus for data-prep and retriever. Helm Charts now have an option to expose Prometheus metrics from the applications.
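The metrics option is toggled through chart values; a hedged sketch of a values override, assuming the `global.monitoring` flag used by the GenAIInfra charts (flag name and file name are illustrative):

```yaml
# values-override.yaml -- enable Prometheus metric collection
# (flag name assumed from the GenAIInfra Helm charts)
global:
  monitoring: true
```

Such an override could then be applied at install time with `helm install -f values-override.yaml ...`.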
Long-context Benchmark Support¶
We added the following two benchmark kits in response to the community’s need to evaluate long-context language models.
HELMET: a comprehensive benchmark for long-context language models covering seven diverse categories of tasks. The datasets are application-centric and are designed to evaluate models at different lengths and levels of complexity.
LongBench: a benchmark tool for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models.
Newly Supported Models¶
Llama-3.2 (1B/3B/11B/90B)
glm-4-9b-chat
Qwen2/2.5 (7B/32B/72B)
Newly Supported Hardware¶
AMD GPUs (ROCm)
Intel® Arc™ GPUs
Notable Changes¶
GenAIExamples
Functionalities
New GenAI Examples
[AvatarChatbot] Initiate “AvatarChatbot” (audio) example (cfffb4c, 960805a)
[DBQnA] Adding DBQnA example in GenAIExamples (c0643b7, 6b9a27d)
[EdgeCraftRag] Add EdgeCraftRag as a GenAIExample (c9088eb, 7949045, 096a37a)
[GraphRAG] Add GraphRAG example a65640b
[Text2Image]: Add example for text2image 085d859
[WorkflowExecAgent] Add Workflow Executor Example bf5c391
Enhanced GenAI Examples
[AudioQnA] Add multi-language AudioQnA on Xeon 658867f
[AgentQnA] Update AgentQnA example for v1.1 release 5eb3d28
[ChatQnA] Enable vLLM Profiling for ChatQnA (00d9bb6, 7adbba6)
[ChatQnA] Add Terraform and Ansible Modules information 7c9ed04
[ChatQnA] Add chatqna wrapper for multiple model selection fb514bb
[DocSum] Supported multimedia and added new GUI powered by gradio (eb91d1f, 0cdeb94)
[DocSum] Support Chinese for Docsum b0f7c9c
[DocIndexRetriever] Update DocIndexRetriever Example to allow user passing in retriever/reranker params 62e06a0
[MultimodalQnA] Image and Audio Support Phase 1 bbc95bb
[Text2Image] Add Text2Image UI, UI tests, Readme, and Docker support c6fc92d
update examples accuracy 088ab98
Removed GenAI Pipelines
[ChatQnA] remove ChatQnA vllm-on-ray 40386d9
Changed Defaults
Enhanced Security
upgrade setuptools version to fix CVE-2024-6345 2b2c7ee
New Hardware Support
[ChatQnA] Add compose example for ChatQnA AMD ROCm deployment 6d3a017
[CodeGen] Adding files to deploy CodeGen application on AMD GPU 83172e9
[CodeTrans] Adding files to deploy CodeTrans application on AMD GPU 7e62175
[DocSum] Add compose example for DocSum amd rocm deployment b1bb6db
[FaqGen] Add compose example for FaqGen AMD ROCm 5648839
Dependency Versioning
[gradio] Bump gradio from 4.44.0 to 5.0.0 in /MultimodalQnA/ui/gradio f2f6c09
[TGI-CPU] Update TGI CPU image to latest official release 2.4.0-intel-cpu 0306c62
[TGI-Gaudi] Upgrade TGI Gaudi version to v2.0.6 1ff85f6a
[TEI-Gaudi] Use fixed version(1.5.0) of TEI Gaudi for stability 9ff7df9
[vLLM-Gaudi] align vllm hpu version to latest vllm-fork e9b1645
Deployment
[ChatQnA] Add instructions of modifying reranking docker image for NVGPU 2587179
[ChatQnA] setup ollama service in aipc docker compose def39cf
[ChatQnA] Make rerank run on gaudi for hpu docker compose 3c164f3
[ChatQnA] Added the k8s yaml for vLLM support e2f9037
[ChatQnA] manage your own ChatQnA pipelines. d16c80e
[ChatQnA] docker install instruction for csp 75df2c9
[ChatQnA] ChatQnA with Remote Inference Endpoints (Kubernetes) 56f770c
[ProductivitySuite] Simplify the deployment ProductivitySuite on kubernetes afc39fa
Fixed Issues
[AvatarChatbot] Fix left issue of tgi version update 393367e
[ChatQnA] Fix the service connection issue on GPU and modify the emb backend 944ae47
[ChatQnA] Fix AIPC docker container network issue 95b58b5
[ChatQnA] Fix top_n rerank docs 4a265ab
[ChatQnA] fix chatqna accuracy issue with incorrect penalty b0487fe
[ChatQnA] Fix AIPC retriever and UI error 773c32b
[DocSum] Fix docSum ui error in accessing parsed files 3744bb8
image build bug fix 82801d0
Documentation
[AudioQnA] Update AudioQnA README.md for its workflow 63bad29
[AudioQnA] Update AudioQnA README to add a couple usage details 184e9a4
[AgentQnA] Update Agent README.md for workflow 23b820e
[AgentQnA] Update README.md for usage experience a8f4245
[ChatQnA] Add steps to deploy opea services using minikube 6263b51
[ChatQnA] Update ChatQnA Readme for LLM Endpoint aa314f6
[ChatQnA] Update ChatQnA AIPC README b056ce6
[CodeGen] Update CodeGen README for its workflow 12469c9
[DocSum] Update DocSum README.md for its workflow fbde15b
[FaqGen] Update FaqGen README.md for its workflow 0c6b044
[InstructionTuning] instruction finetune README improvement 644c3a6
[MultiModalQnA] Update MultiModal README.md for workflow 40800b0
[ProductivitySuite] Update Productivity README.md for workflow 0edff26
[DocIndexRetriever] Update DocIndexRetriever README.md for workflow a3f9811
[SearchQnA] Update SearchQnA README.md for its workflow bf28c7f
[Translation] Update Translation README.md for workflow 35a4fef
[VideoQnA] Update VideoQnA README.md for workflow 1929dfd
CI/CD/UT
GenAIComps
Functionalities
New microservices:
Enhanced microservices:
Add DPO support in finetuning microservice 37f35140
Support Chinese for Docsum 9a00a3ea
Support file upload summary for DocSum microservice fa2ea642
Add support for Audio and Video summarization to Docsum baafa402
vLLM support for FAQGen f5c60f10
vLLM support for DocSum 550325d8
vLLM support for Codegen 24b9f03f
Enable vllm for Agent 4638c1d4
Multiple models and remote service support for langchain vLLM text-generation e3812a74
Set a higher default value(1.2) about repetition_penalty for codegen example to reduce repetition 5ed428f4
MultimodalQnA Image and Audio Support Phase 1 29ef6426
refine codetrans prompt, support parameter input 0bb019f8
add dynamic batching embedding/reranking 518cdfb6
Embedding compatible with OpenAI API 7bf1953c
Update RAGAgentLlama and ReActLlama c8e36390
[Agent] support custom prompt 3473bfb3
agent short & long term memory with langgraph. e39b08f3
support faqgen upload file in UI 453ff726
Add E2E Prometheus metrics to applications a6998a1d
Multiple models support for LLM TGI e879366c
Add RAG agent and ReAct agent implemention for llama3.1 served by TGI-gaudi e7fdf537
Support Llama3.2 vision and vision guard model 534c227a
Add Intel/toxic-prompt-roberta to toxicity detection microservice f6f620a2
Refactor milvus dataprep and retriever 84374a57
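One entry above makes the embedding microservice compatible with the OpenAI API, i.e. it accepts the standard `/v1/embeddings` request shape. A minimal sketch of building such a request body (the model id and input texts are illustrative, not taken from the release):

```python
import json

# Standard OpenAI-style /v1/embeddings request body; an
# OpenAI-compatible embedding service accepts this shape directly.
payload = {
    "model": "BAAI/bge-base-en-v1.5",  # illustrative model id
    "input": ["What is OPEA?", "Deploy ChatQnA with Helm."],
}
body = json.dumps(payload)
print(body)
```

The same body works with any OpenAI-compatible client by pointing its base URL at the microservice endpoint.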
Removed microservices
Remove vllm ray 617e119f
Async support for microservices
Performance
New Hardware Support
Add vLLM ARC support with OpenVINO backend a2b9d95f
Enhanced Security
Validation
Combine CI/CD docker compose. 23c99c11
GenAIEvals
New Benchmark
Performance
Add new constant loader & Fix poisson loader issue e11588c
Support Poisson distributed requests for benchmark 7305ea3
Support customized prompts and max new tokens in chatqna e2e test 79a4ad3
Add namespace support for k8s performance test 70697d1
Support sharegpt dataset in chatqna e2e test 028bf63
[Benchmark] Get benchmark reports. 946c439
Accuracy
Control the concurrent number of requests in codegen acc test. 84e077e
integrate deepeval metric with remote endpoint, like tgi server. ffa65dc
Ragaaf - adding new metric ‘context recall’ cc7cebd
Ragaaf - adding new metric ‘context relevance’ f995c9c
Ragaaf (RAG assessment annotation free) 2413e70
Adding new metrics to ragas offering d1c1337
add crud ragas evaluation. f2bff45
Minimize requirements for user data for OPEA ragas f1593ea
Monitoring
Fixed Issues
[ChatQnA Benchmark] Fixed the output token in chatqnafixed.py 2c8ca26
Fix test duration time inaccurate issue 9d76832
Fix llm output token length issue 99ef325
Fix llm serving benchmark issue d6bafbd
Fix input token size(1024) 30adcbe
Ragas fix for use of metrics argument 0cf3631
fixed the number of output tokens & fixed the top_k=1 4af0a62
Fix JSON Return Format in getReqData Function a4be366
Documentation
GenAIInfra
GMC
Add manifests for new components e51fd62
HelmChart
[AgentQnA] Helm Chart for AgentQnA 66de41c
[AudioQnA] helm: Add audioQnA e2e helm chart 9efacee
[AudioQnA] helm-charts: Add gpt-sovits support 1f55e1a
[ChatQnA] Implement the nowrapper version chatqna 71c81d0
[FaqGen] Add FaqGen helm chart f847e05
[FaqGen] helm: Add llm-faqgen-tgi support 325126e
[HPA] helm/manifest: Sync HPA related k8s probe settings c399578
[VisualQnA] Add helm chart for VisualQnA example b077d44
[UI] support variants for multiple examples 96af2ad
[Nginx] helm-chart: Make nginx service type configurable a5c96ab
[Milvus] Add milvus support for data-prep and retriever-usvc d289b4e
Add helm chart for 3 components 881e2b5
accelerate also teirerank with Gaudi 620963f
CSP
terraform: add AWS/EKS deployment for ChatQnA bdb9af9
Monitoring
Add Grafana dashboard for monitoring OPEA application scaling in k8s 691bbc5
Add ServiceMonitors for rest of OPEA applications fc6235a
Add monitoring option to (ChatQnA) Helm charts dbd607e
Support alternative metrics on accelerated TGI / TEI instances cdd3585
Expose options such as collector.interval of memory bandwidth exporter in k8s manifests and docker for user configuration. 2517e79
Dependency Versioning
Changed Defaults
Change default model of codegen and codetrans 74476b7
Documentation
CI/CD/UT
Refactor CI scripts to support more components e09270a
Add github workflows to release helm chart 3910e3b
Fix link check failure (#481) fc87ef3
Fix CI failures (#477) 7e7b8ab
Optimize path and link validity check. 91bd163
Enable image build process for memory-bandwidth-exporter ddeac46
Add hyperlinks and paths validation. d8cd3a1
Full Changelogs¶
Contributors¶
This release would not have been possible without the contributions of the following organizations and individuals.
Contributing Organizations¶
AMD: AMD CPU/GPU support.
Capital One: Contributions to CI/CD process.
China Unicom: Contributions to the deployment of GenAI examples.
Huawei: Contributions to OPEA services deployment.
Intel: Development and improvements to GenAI examples, components, infrastructure, evaluation, and studio.
Nascenia Ltd.: Contributions to documentation.
National Chiao Tung University: Contributions to documentation.
Princeton University: Integration of HELMET.
Individual Contributors¶
For a comprehensive list of individual contributors, please refer to the “Full Changelogs” section.