OPEA Release Notes v1.3

We are excited to announce the release of OPEA version 1.3, which includes significant contributions from the open-source community. This release incorporates over 520 pull requests.

More information about how to get started with OPEA v1.3 can be found on the Getting Started page. All project source code is maintained in the opea-project organization on GitHub. To pull Docker images, please visit Docker Hub. For instructions on deploying Helm charts, please refer to the guide.

Table of Contents

  • What’s New in OPEA v1.3
  • Newly Supported Models
  • Newly Supported Hardware
  • Other Notable Changes
  • Deprecations
  • Updated Dependencies
  • Changes to Default Behavior
  • Validated Hardware
  • Validated Software
  • Known Issues
  • Full Changelogs
  • Contributors

What’s New in OPEA v1.3

This release introduces exciting capabilities, optimizations, and user-centric enhancements:

Advanced Agent Capabilities

  • Multi-Turn Conversation: Enhanced the OPEA agent framework to support dynamic, context-aware dialogues; a configuration sketch follows this list. (GenAIComps#1248)

  • Finance Agent Example: A financial agent example for automating financial data aggregation and leveraging LLMs to generate insights, forecasts, and strategic recommendations. (GenAIExamples#)
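
The multi-turn capability above is driven by the agent's `agent_config`. As a minimal sketch, assuming a locally deployed agent that exposes the Assistants APIs on port 9090 (the endpoint path and payload wrapper are illustrative assumptions, not the documented API), enabling persistent memory looks roughly like this:

```python
# Hedged sketch: enabling multi-turn conversation memory for an OPEA agent.
# The with_memory/memory_type values follow the v1.3 defaults described in
# these notes; the endpoint URL and payload shape are assumptions.
import requests

agent_config = {
    "with_memory": True,          # enabled by default in v1.3 for multi-turn dialogue
    "memory_type": "persistent",  # replaces the deprecated with_store parameter
}

resp = requests.post(
    "http://localhost:9090/v1/assistants",  # hypothetical deployment address
    json={"agent_config": agent_config},
    timeout=30,
)
print(resp.status_code, resp.text)
```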

Performance and Scalability

  • vLLM Enhancement: Integrated vLLM as the default LLM serving backend for key GenAI examples across Intel® Xeon® processors, Intel® Gaudi® accelerators, and AMD® GPUs. (GenAIExamples#)

  • KubeAI Operator for OPEA (Alpha release): Simplified OPEA inference operations in cloud environments and enabled optimal out-of-the-box performance for specific models and hardware using profiles. (GenAIInfra#945)

Ecosystem Integrations

  • Haystack Integration: Enabled OPEA as a backend of Haystack; see the sketch after this list. (Haystack-OPEA#)

  • Cloud Readiness: Expanded automated Terraform deployment for ChatQnA to include support for Azure, and enabled CodeGen deployments on AWS and GCP. (GenAIExamples#1731)
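
To give a flavor of the Haystack integration above, here is a minimal sketch of wiring an OPEA endpoint into a Haystack 2.x pipeline. The `OPEAGenerator` component name, its import path, and its constructor arguments are assumptions here; consult the Haystack-OPEA repository for the exact API.

```python
# Hedged sketch: using an OPEA-served LLM from a Haystack pipeline.
# OPEAGenerator (name, import path, arguments) is assumed, not confirmed.
from haystack import Pipeline
from haystack_opea import OPEAGenerator  # assumed module and class name

generator = OPEAGenerator(
    api_url="http://localhost:9009",      # OPEA LLM serving endpoint (assumed port)
    model_arguments={"max_tokens": 256},  # assumed parameter name
)

pipeline = Pipeline()
pipeline.add_component("llm", generator)
result = pipeline.run({"llm": {"prompt": "What is OPEA?"}})
print(result)
```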

New GenAI Capabilities

  • OPEA Store: Delivered a unified data store access API and a robust integration layer that streamlines adding new data stores. ArangoDB has been integrated through this layer. (GenAIComps#1493)

  • CodeGen using RAG and Agent: Leveraged RAG and a code agent to provide an additional layer of intelligence and adaptability for the CodeGen example. (GenAIExamples#1757)

  • Enhanced Multimodality: Added support for additional audio file types (.mp3) and supported spoken audio captions with image ingestion. (GenAIExamples#1549)

  • Struct to Graph: Supported transforming structured data into graphs using the Neo4j graph database. (GenAIComps#1502)

  • Text to Graph: Supported creating graphs from text by extracting graph triplets. (GenAIComps#1357, GenAIComps#)

  • Text to Cypher: Supported generating and executing Cypher queries from natural language for graph database retrieval; a request sketch follows this list. (GenAIComps#1319)
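
To illustrate the shape of these graph services, here is a hedged sketch of calling the text-to-Cypher microservice over HTTP. The port, route, and request fields are illustrative assumptions rather than the component's documented contract; check the GenAIComps text2cypher README for the actual interface.

```python
# Hedged sketch: natural language -> Cypher via the text2cypher microservice.
# The endpoint, port, and payload field names below are assumptions.
import requests

payload = {
    "input_text": "Which suppliers deliver to warehouses in Berlin?",
    # Hypothetical connection details for the target Neo4j instance.
    "conn_str": {
        "url": "bolt://localhost:7687",
        "user": "neo4j",
        "password": "password",
    },
}

resp = requests.post("http://localhost:11801/v1/text2cypher", json=payload, timeout=60)
print(resp.json())  # expected to contain the generated Cypher and/or query results
```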

Enhanced Evaluation

  • Enhanced Long-Context Model Evaluation: Supported evaluating long-context models on Intel® Gaudi® with vLLM. (HELMET#20)

  • TAG-Bench for SQL Agents: Integrated TAG-Bench to evaluate the generation of complex SQL queries. (GenAIEval#)

  • DocSum Support: GenAIEval now supports evaluating the performance of DocSum. (GenAIEval#252)

  • Toxicity Detection Evaluation: Introduced a workflow to evaluate the capability of LLMs to detect toxic language. (GenAIEval#241)

  • Model Card: Added a model card generator that produces reports containing model performance and fairness metrics. (GenAIEval#236)

Observability

  • OpenTelemetry Tracing: Leveraged OpenTelemetry to enable tracing for ChatQnA and AgentQnA along with TGI and TEI. (GenAIExamples#1542)

  • Application Dashboards: Helm now installs end-to-end application performance dashboards. (GenAIInfra#800)

  • E2E (end-to-end) metric improvements: E2E metrics are now summed across megaservice instances for applications that use multiple instances, with accompanying tests and fixes. (GenAIComps#1301, GenAIComps#)

Better User Experience

  • GenAIStudio: Supported drag-and-drop creation of agentic applications. (GenAIStudio#50)

  • Documentation Refinement: Refined READMEs for key examples to help readers easily locate documentation tailored to deployment, customization, and hardware. (GenAIExamples#1741)

  • Optimized Dockerfiles: Simplified application Dockerfiles for faster image builds. (GenAIExamples#1585)

Exploration

  • SQFT: Supported low-precision sparse parameter-efficient fine-tuning on LLMs. (GenAIResearch#1)

Newly Supported Models

OPEA introduced support for the following models in this release. Validation status varies by serving backend (TGI-Gaudi, vLLM-CPU, vLLM-Gaudi, vLLM-ROCm, OVMS, Optimum-Habana, and PredictionGuard).

  • deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • deepseek-ai/Deepseek-v3
  • Hermes-3-Llama-3.1-8B
  • ibm-granite/granite-3.2-8b-instruct
  • Phi-4-mini
  • Phi-4-multimodal-instruct
  • mistralai/Mistral-Small-24B-Instruct-2501
  • mistralai/Mistral-Large-Instruct-2411

Newly Supported Hardware

Other Notable Changes

Expand the following lists to read:

GenAIExamples
  • Functionalities

    • [AgentQnA] Added web search tool support and simplified the run instructions. (#1656) (e8f2313)

    • [ChatQnA] Added support for the latest DeepSeek models on Gaudi. (#1491) (9adf7a6)

    • [EdgeCraftRAG] Added a sleek new UI based on Vue and Ant Design for an enhanced user experience, supporting concurrent multi-requests on vLLM, JSON pipeline configuration, and API-based prompt modification. (#1665) (5a50ae0)

    • [EdgeCraftRAG] Supported multi-card Intel® Arc™ GPU deployment for vLLM inference. (#1729) (1a0c5f0)

    • [FaqGen] Merged FaqGen into ChatQnA for a unified chatbot experience. (#1654) (6d24c1c)

  • Benchmark

    • [ChatQnA] Provided unified, scalable deployment and benchmarking support for examples. (#1315) (ed16308)

  • Deployment

    • Synced the values YAML files for the v1.3 release. (#1748) (46ebb78)

  • Bug Fixes

    • [AgentQnA] Fixed errors when running AgentQnA on Xeon with OpenAI and updated the README. (#1664) (fecc227)

    • [AudioQnA] Fixed the LLM model field for input alignment. (#1611) (2dfcfa0)

  • Documentation

    • Updated README.md for OPEA OTLP tracing (#1406) (4c41a5d)

    • Updated README.md for Agent UI (#1495) (88a8235)

    • Refactored AudioQnA README (#1508) (9f36e84)

    • Added a new section on changing the LLM model (e.g., to DeepSeek) based on the validated model table of the LLM microservice. (#1501) (970b869)

    • Updated README.md of AIPC quick start (#1578) (852bc70)

    • Added short descriptions to the images OPEA publishes on Docker Hub (#1637) (68747a9)

  • CI/CD/UT

    • Added UT for rerank finetuning on Gaudi (#1472) (5f4b182)

    • Enabled Gaudi 3, ROCm, and Arc in manual release tests. (#1615) (63b789a)

    • Enabled base image build in CI/CD. (#1669) (2204fe8)

    • ChatQnA: Ran CI with the latest base image and grouped logs in GHA outputs. (#1736) (c48cd65)

GenAIComps
  • Functionalities

    • [agent] Enabled custom prompt for react_llama and react_langgraph (#1391) (558a2f6)

    • [dataprep] Added multimodal support for Milvus in the dataprep component. (#1380) (006bd91)

    • [dataprep] Added a new ArangoDB integration. (#1558)

    • [dataprep] Added the ability to customize integration-specific input parameters by subclassing the DataprepRequest Pydantic model, avoiding the need to introduce parameters unique to a few dataprep integrations across all dataprep providers; see the sketch after this list. (#1525)

    • [retrieval] Added a new ArangoDB integration. (#1558)

    • [cores/mega] Added remote endpoint support (#1399) (1871dec)

    • [docsum] Enlarged DocSum prompt buffer (#1471) (772ef6e)

    • [embeddings] Refined CLIP embedding microservice by leveraging the third-party CLIP (#1298) (7727235)

    • [finetuning] Added xtune to fine-tuning for Intel® Arc™ GPUs. (#1432) (80ef317)

    • [guardrails] Added native support for toxicity detection guardrail microservice (#1258) (625aec9)

    • [llm/text-generation] Added support for string message in Bedrock textgen (#1291) (364ccad)

    • [ipex] Added native LLM microservice using IPEX (#1337) (d51a136)

    • [lvm] Integrated vLLM into the lvm component as a backend. (#1362) (831c5a3)

    • [lvm] Integrated UI-TARS vLLM in the lvm component. (#1458) (4a15795)

    • [nebula] Added Docker deployment support for the Nebula graph database. (#1396) (342c1ed)

    • [OVMS] Added text generation, embeddings, and reranking microservices based on the OVMS component. (#) (78b94fc)

    • [retriever/milvus] Added multimodal support for Milvus in the retriever component. (#1381) (40d431a)

    • [text2image & image2image] Enriched input parameters of text2image and image2image. (#1339) (42f323f)

    • Refined synchronized I/O in asynchronous functions (#1300) (b08571f)
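
As a flavor of the DataprepRequest subclassing mechanism mentioned above (#1525), here is a minimal sketch. The import path and the extra field are illustrative assumptions; only the idea of extending the base Pydantic model comes from these notes.

```python
# Hedged sketch: adding an integration-specific dataprep parameter by
# subclassing the DataprepRequest Pydantic model. The import path is assumed.
from typing import Optional

from pydantic import Field

from comps.cores.proto.api_protocol import DataprepRequest  # assumed location


class MyVectorDBDataprepRequest(DataprepRequest):
    """Adds one knob unique to a hypothetical vector-store integration."""

    collection_name: Optional[str] = Field(
        default=None,
        description="Target collection to ingest into; not part of the base model.",
    )
```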

  • Bug Fixes

    • Fixed a DocSum error caused by HuggingFaceEndpoint. (#1246) (30e3dea)

    • Fixed TEI embedding and TEI reranking bugs. (#1256) (fa01f46)

    • Fixed the web-retrievers hub client and TEI endpoint issue. (#1270) (ecb7f7b)

    • Fixed a dataprep data ingestion issue. (#1271) (b777db7)

    • Fixed a metric ID issue when initializing multiple Orchestrator instances. (#1280) (f8e6216)

    • Fixed Neo4j dataprep ingest error handling and skip_ingestion argument passing. (#1288) (4a90692)

    • Fixed the Milvus retriever issue. (#1286) (47f68a4)

    • Fixed a Qdrant retriever RAG issue. (#1289) (c3c8497)

    • Fixed the agent message format. (#1297) (022d052)

    • Fixed Milvus dataprep file ingestion failures. (#1299) (a033c05)

    • Fixed docker image security issues (#1321) (589587a)

    • Added megaservice/orchestrator metric tests and fixes. (#1348) (1064b2b)

    • Fixed a finetuning Python regex syntax error. (#1446) (380f95c)

    • Upgraded the Optimum Habana version to fix a security check issue. (#1571) (83350aa)

    • Made LlamaGuard compatible with both TGI and vLLM. (#1581) (4024302)

  • Documentation

    • Fixed the GraphRAG README and compose files after the refactor. (#1221) (b38d9f3)

    • Updated docs for the LLamaGuard & WildGuard Microservice. (#1259) (0df374b)

    • Fixed README errors in the dataprep component for all vector databases. (#1377) (492f028)

    • Refined the README for llms/doc-summarization (#1437) (559ebb2)

  • CI/CD/UT

    • Refined the dataprep test scripts. (#1305) (a4f6af1)

GenAIEval
  • Auto Tuner

    • RAG Pilot - A RAG pipeline tuning tool that allows fine-grained control over key stages of parsing, chunking, postprocessing, and generation, enabling better retrieval and response generation. (#243) (97da8f2)

  • Monitoring

    • Integrated the memory bandwidth exporter to support collection and reporting of memory bandwidth, CPU, and memory metrics. (#218) (df5fd3e)

    • Added a benchmark Docker image to support gathering metrics among microservices, and fixed a missing package for benchmarking with the Dockerfile. (#249) (dc3409f)

  • Metrics

    • Collected the vLLM latency metric for E2E tests. (#244) (1b6a91d)

  • Bug Fixes

    • Fixed a relative path issue for the Poisson load shape. (#234) (3b9981a)

    • Added a missing file to the release package. (#233) (28ed0db)

    • Fixed TTFT and TPOT errors when the benchmark target is chatqna_qlist_pubmed. (#238) (da04a9f)

    • Fixed the performance benchmark with PubMed. (#239) (5c8ab6e)

  • Documentation

    • Added recommendations to the platform optimization documentation. (ea086a6)

GenAIInfra
  • HelmChart

    • [TDX] Added Intel® TDX support to Helm charts. (#799) (040860e)

    • Added a Helm starter chart for developing new charts. (#776) (6154b6c)

    • Improved HPA enabling usability. (#770) (3016f5f)

    • Added a Helm chart for Ollama. (#774) (7d66afb)

    • Helm: Added Qdrant support. (#796) (99ccf0c)

    • ChatQnA: Added Qdrant DB support. (#813) (5576cfd)

    • Added Helm-installed application metric Grafana dashboards. (#800) (f46e8c1)

    • Added LLM TextGen Bedrock support. (#811) (da37b9f)

    • CodeGen: Added a RAG pipeline and changed the default UI. (#985) (46b1b6b)

    • dataprep/retriever: Supported air-gapped offline environments. (#980) (b9b10e9)

  • CSP

    • Added automated provisioning of CosmosDB and App Insights for OPEA applications (#657) (d29bd2d)

  • Bug Fixes

    • Fixed the helm chart release dependency update (#842) (f121edd)

  • CI/CD/UT

    • CI: Enabled Milvus-related tests. (#767) (5b2cca9)

GenAIStudio
  • Updated the Studio frontend table UI and updated the Studio backend to align with the dataprep refactor. (#32) (1168507)

  • [Feat] Added GenAI Studio UI improvements. (#48) (ad64f7c)

  • Enabled LLM traces for the sandbox. (#51) (df6b73e)

  • Migrated to the internal Kubernetes MySQL and enabled deployment package generation for AgentQnA. (#52) (0cddbe0)

Deprecations

Deprecated Examples

The following GenAI examples are deprecated and were removed as of OPEA v1.3:

| Example | Migration Solution | Reasons for Deprecation |
| ------- | ------------------ | ----------------------- |
| FaqGen | Use the example ChatQnA instead. | Provides users with a unified chatbot experience and reduces redundancy. |

Deprecated Docker Images

The following Docker images are deprecated and are not updated or tagged for the OPEA v1.3 release:

| Deprecated Docker Image | Migration Solution | Reasons for Deprecation |
| ----------------------- | ------------------ | ----------------------- |
| opea/agent-ui | Use opea/agent-openwebui instead. | Open WebUI based UI for a better user experience. |
| opea/chathistory-mongo-server | Use opea/chathistory-mongo instead. | Follows the OPEA naming rules. |
| opea/faqgen | Use opea/chatqna or opea/chatqna-without-rerank instead. | FaqGen is deprecated. |
| opea/faqgen-ui | Use opea/chatqna-ui instead. | FaqGen is deprecated. |
| opea/faqgen-react-ui | Use opea/chatqna-ui instead. | FaqGen is deprecated. |
| opea/feedbackmanagement | Use opea/feedbackmanagement-mongo instead. | Follows the OPEA naming rules. |
| opea/promptregistry-mongo-server | Use opea/promptregistry-mongo instead. | Follows the OPEA naming rules. |

The following Docker images are deprecated and will not be updated or tagged starting with the OPEA v1.4 release:

| Deprecated Docker Image | Migration Solution | Reasons for Deprecation |
| ----------------------- | ------------------ | ----------------------- |
| opea/chathistory-mongo | Use opea/chathistory instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the chat history component from MongoDB. |
| opea/feedbackmanagement-mongo | Use opea/feedbackmanagement instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the feedback management component from MongoDB. |
| opea/promptregistry-mongo | Use opea/promptregistry instead. The Docker image will be released with the latest tag before the v1.4 release. | OPEA introduced OPEA Store to decouple the prompt registry component from MongoDB. |
| All OPEA Docker images | | |

Deprecated GenAIExample Variables

| Example | Type | Variable | Migration Solution |
| ------- | ---- | -------- | ------------------ |
| ChatQnA | environment variable | your_hf_api_token | Removed from the Intel AIPC deployment. Use the environment variable HUGGINGFACEHUB_API_TOKEN instead; this change aligns with the standardized naming conventions for environment variables. |
| ChatQnA | environment variable | OLLAMA_HOST | Removed from the Intel AIPC deployment. Instead, customize LLM_SERVER_HOST_IP in ChatQnA/docker_compose/intel/cpu/aipc/compose.yaml. |
| DocIndexRetriever | environment variable | TGI_LLM_ENDPOINT | Removed because it was unused. |
| DocIndexRetriever | environment variable | MEGA_SERVICE_HOST_IP | Removed because it was unused. |
| DocIndexRetriever | environment variable | LLM_SERVICE_HOST_IP | Removed because it was unused. |
| GraphRAG | environment variable | MAX_OUTPUT_TOKENS | Split into two new environment variables, MAX_INPUT_TOKENS (default: 4096) and MAX_TOTAL_TOKENS (default: 8192), which control the maximum token limits. |

Deprecated GenAIComps Parameters

| Component | Parameter | Migration Solution |
| --------- | --------- | ------------------ |
| agent | with_store of agent_config in the Assistants APIs | Its functionality is now fully covered by the new memory_type parameter. In v1.3, use "with_memory": true together with "memory_type": "persistent" as its replacement. The with_memory parameter in agent_config is now enabled by default (true) to support multi-turn conversations; see the sketch below, and refer to the guide for more details. |
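
A minimal before/after sketch of this migration, with the field values taken from the table above and everything else illustrative:

```python
# Deprecated (pre-v1.3): persistence was toggled via with_store.
old_agent_config = {"with_store": True}

# v1.3 replacement: memory is enabled by default; select the persistent backend.
new_agent_config = {
    "with_memory": True,          # default true in v1.3
    "memory_type": "persistent",  # replaces with_store
}
```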

Updated Dependencies

| Dependency | Hardware | Scope | Version | Version in OPEA v1.2 | Comments |
| ---------- | -------- | ----- | ------- | -------------------- | -------- |
| gradio | - | all examples | 5.11.0 | 5.5.0 | |
| huggingface/text-generation-inference | AMD GPU | all examples | 2.4.1-rocm | 2.3.1-rocm | |
| huggingface/text-embeddings-inference | all | all examples | cpu-1.6 | cpu-1.5 | |
| langchain, langchain_community | - | llms/doc-summarization, llms/faq-generation | 0.3.14 | 0.3.15 | Avoid bugs in FaqGen and DocSum. |
| optimum-habana | Gaudi | lvms/llama-vision | 1.17.0 | - | |
| pytorch | Gaudi | all components | 2.5.1 | 2.4.0 | |
| transformers | - | lvms/llama-vision | 4.48.0 | 4.45.1 | |
| vllm | Xeon | all supported examples except EdgeCraftRAG | v0.8.3 | - | |
| vllm | Gaudi | all supported examples except EdgeCraftRAG | v0.6.6.post1+Gaudi-1.20.0 | v0.6.4.post2+Gaudi-1.19.0 | |
| vllm | AMD GPU | all supported examples | rocm6.3.1_instinct_vllm0.8.3_20250410 | - | |

Changes to Default Behavior

  • [agent] The default model changed from meta-llama/Meta-Llama-3-8B-Instruct to meta-llama/Llama-3.3-70B-Instruct.

Validated Hardware

  • Intel® Arc™ Graphics GPU (A770)

  • Intel® Gaudi® AI Accelerators (2nd and 3rd generation)

  • Intel® Xeon® Scalable processors (4th, 5th, and 6th generation)

  • AMD® Instinct™ MI300X Accelerators (CDNA3)

Validated Software

  • AMD® ROCm™ Software v6.3.3

  • Docker 28.0.4

  • Docker Compose v2.34.0

  • Intel® Gaudi® software and drivers v1.20

  • Kubernetes v1.29.15

  • TEI v1.6

  • TGI v2.4.0 (Xeon), v2.3.1 (Gaudi), v2.4.1 (ROCm)

  • Torch v2.5.1

  • Ubuntu 22.04

  • vLLM v0.8.3 (Xeon/ROCm), v0.6.6 (Gaudi)

Known Issues

Full Changelogs

Contributors

This release would not have been possible without the contributions of the following organizations and individuals.

Contributing Organizations

  • Amazon: Ollama deployment, Bedrock integration, OVMS integration and bug fixes.

  • AMD: vLLM enablement on AMD GPUs for key examples, enablement of AMD GPUs on additional examples, and AMD OPEA blogs.

  • ArangoDB: OPEA Store and ArangoDB integration.

  • Intel: Development and improvements to GenAI examples, components, infrastructure, and evaluation.

  • Infosys: Azure support and documentation updates.

  • National Chiao Tung University: Documentation updates.

  • Prediction Guard: Maintenance of Prediction Guard components.

Individual Contributors

For a comprehensive list of individual contributors, please refer to the Full Changelogs section.