Helm charts for deploying GenAI Components and Examples

This directory contains Helm charts for GenAIComps and GenAIExamples deployment on Kubernetes.

Helm Charts

List of supported workloads and components.


AI application examples you can run directly on Xeon and Gaudi. You can also refer to these examples to develop your own customized AI application.

Helm chart

Link to GenAIExamples




An example chatbot for question answering through retrieval-augmented generation (RAG).


Agent QnA

A hierarchical multi-agent system for question-answering applications.


Audio QnA

An example chatbot for question answering with audio file support.


Code Generation

An example copilot designed for code generation in Visual Studio Code.


Code Translation

An example of programming language code translation.


Document Summarization

An example of document summarization.


FAQ generator

An example of FAQ generation.


Visual QnA

An example of answering open-ended questions based on an image.


Components are the building blocks of AI applications. All component Helm charts live in the ./common directory, and the list of supported components is growing. Refer to GenAIComps for details of each component.

Deploy with Helm charts

From Source Code

These Helm charts are designed to be easy to start with: you can deploy a workload without setting any further options. In most cases, however, HUGGINGFACEHUB_API_TOKEN must be set for the workload to start correctly. For example, to deploy a workload:

export myrelease=mytgi
export chartname=common/tgi
helm dependency update $chartname
helm install $myrelease $chartname --set global.HUGGINGFACEHUB_API_TOKEN="insert-your-huggingface-token-here"

Depending on your environment, you may want to customize some of the options. See Helm Charts Options for further information.
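When several options need to change, it can be easier to collect them in a values file than to repeat --set flags. The following is a minimal sketch, assuming a hypothetical file name my-values.yaml and reusing the release and chart names from the example above; the nested global key mirrors the global.HUGGINGFACEHUB_API_TOKEN path used with --set.

```shell
# Collect overrides in a file so they can be reviewed and reused.
values_file="my-values.yaml"   # hypothetical file name

cat > "$values_file" <<'EOF'
global:
  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
EOF

# Print the install command for review; run it once the values look right.
echo "helm install mytgi common/tgi -f $values_file"
```

Helm's -f flag reads overrides from a YAML file, so the printed command is equivalent to the --set form shown earlier.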

Using Helm Charts repository

The Helm charts are released to https://github.com/orgs/opea-project/packages. You can check the list there and deploy a chart with:

export chartname=chatqna
helm install myrelease oci://ghcr.io/opea-project/charts/${chartname}
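Before installing from the repository, it can help to look at a chart's default values. The sketch below assumes Helm 3.8 or newer (needed for OCI references) and writes to a hypothetical output file name; it skips the fetch when helm is not installed.

```shell
chartname=chatqna
# Same OCI reference as in the install command above.
chart_ref="oci://ghcr.io/opea-project/charts/${chartname}"
values_out="${chartname}-default-values.yaml"   # hypothetical output file name

if command -v helm >/dev/null 2>&1; then
  # helm show values prints the chart's default values.yaml.
  helm show values "$chart_ref" > "$values_out" || echo "could not fetch $chart_ref" >&2
else
  echo "helm not found; skipping" >&2
fi
```

The saved file can then be compared against the options described in this document.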

Helm Charts Options

Here are a few important options that users may want to change.

For more options, read each Helm chart’s README.md file and check its values.yaml or gaudi-values.yaml files (if applicable).

There are global options (which should be shared across all components of a workload) and specific options that only apply to one component.

Helm chart





HUGGINGFACEHUB_API_TOKEN

Your own HuggingFace token. There is no default value; if it is not set, the component may fail to start.


http_proxy https_proxy no_proxy

Proxy settings. If you are running the workloads behind a proxy, add your proxy settings here.



modelUsePVC

The PersistentVolumeClaim to use as the HuggingFace hub cache. The default, “”, means no PVC is used. Only one of modelUsePVC and modelUseHostPath can be set.



modelUseHostPath

If you don’t have a Persistent Volume in your k8s cluster and want to use a local directory as the HuggingFace hub cache, set modelUseHostPath to your local directory name. Note that a host path cannot be shared across nodes. The default is “”. Only one of modelUsePVC and modelUseHostPath can be set.



Enable monitoring for (ChatQnA) service components. See Pre-conditions before enabling!



The model ID to use for the TGI server. Default: “Intel/neural-chat-7b-v3-3”.
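One detail worth illustrating for the proxy options: a no_proxy value usually contains commas, and Helm's --set parser treats an unescaped comma as a separator between keys. The sketch below escapes the commas with a backslash. It assumes the proxy options sit under global, like the token option, and uses a sample no_proxy value chosen only for illustration.

```shell
# Escape commas so that Helm's --set parser keeps the value as one string.
escape_commas() {
  printf '%s' "$1" | sed 's/,/\\,/g'
}

no_proxy_value="localhost,127.0.0.1,.svc.cluster.local"   # sample value for illustration
proxy_flags="--set global.no_proxy=$(escape_commas "$no_proxy_value")"

# Print the flag so it can be appended to a helm install command.
echo "$proxy_flags"
```

The http_proxy and https_proxy values rarely contain commas and can normally be passed as they are.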

Deploy the Helm Charts on Intel® Xeon® Processors with Intel® Trust Domain Extensions (Intel® TDX)

See TDX instructions on how to deploy the Helm Charts on Intel® Xeon® processors with Intel® Trust Domain Extensions (Intel® TDX).

Using HPA (autoscaling)

See HPA instructions on how to enable horizontal pod autoscaling for service components, based on their usage metrics.

Using Persistent Volume

It’s common to use a Persistent Volume (PV) for model caches (the HuggingFace hub cache) in a production k8s cluster. A PersistentVolumeClaim (PVC) can be passed to the containers, but it is your responsibility to create the PVC in a way that suits your cluster’s capabilities.

This example setup uses NFS on Ubuntu 22.04.

  • Export NFS directory from NFS server

sudo apt install nfs-kernel-server
sudo mkdir -p /data/nfspv && sudo chown nobody:nogroup /data/nfspv && sudo chmod 777 /data/nfspv
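# Replace <client-network> with the hosts or subnet allowed to mount the export.
echo "/data/nfspv <client-network>(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports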
sudo systemctl restart nfs-server

  • Create a Persistent Volume

cat <<EOF >nfspv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfspv
spec:
  capacity:
    storage: 300Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: "/data/nfspv"
    server: ""  # set to your NFS server's address
    readOnly: false
EOF
kubectl apply -f nfspv.yaml

  • Create a PersistentVolumeClaim

cat << EOF > nfspvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-volume
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "nfs"
  resources:
    requests:
      storage: 100Gi
EOF
kubectl apply -f nfspvc.yaml

  • Set global.modelUsePVC when doing Helm install, or modify the values.yaml

helm install tgi common/tgi --set global.modelUsePVC=model-volume
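Because only one of modelUsePVC and modelUseHostPath can be set, a wrapper script might check its inputs before calling helm install. The function below is a hypothetical helper written for this illustration, not part of the charts.

```shell
# Return 0 when at most one of the two cache options is non-empty.
check_model_cache_opts() {
  pvc="$1"
  host_path="$2"
  if [ -n "$pvc" ] && [ -n "$host_path" ]; then
    echo "set only one of modelUsePVC or modelUseHostPath" >&2
    return 1
  fi
  return 0
}

# Example: a PVC is given and no host path, so the check passes.
check_model_cache_opts "model-volume" "" && echo "ok: using PVC model-volume"
```

A script could run this check first and stop before invoking helm when both values are supplied.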

Using Private Docker Hub

By default, the charts use Docker images from the official Docker Hub, with image versions aligned with OPEA releases. If you have a private hub, see the following example.

To use a local Docker registry:

# OPEA_IMAGE_REPO must already hold your registry prefix, including the trailing slash.
find . -name '*values.yaml' -type f -exec sed -i "s#repository: opea/*#repository: ${OPEA_IMAGE_REPO}opea/#g" {} \;
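To see what the substitution does without touching the real charts, the same command can be tried on a scratch copy. This is a small sketch assuming GNU sed (for -i), a hypothetical registry prefix, and a made-up sample values file.

```shell
OPEA_IMAGE_REPO="registry.example.com:5000/"   # hypothetical registry prefix; note the trailing slash
workdir="$(mktemp -d)"

# A made-up values file with one image entry, for illustration only.
cat > "$workdir/values.yaml" <<'EOF'
image:
  repository: opea/chatqna
EOF

# Same substitution as above, limited to the scratch directory.
find "$workdir" -name '*values.yaml' -type f -exec sed -i "s#repository: opea/*#repository: ${OPEA_IMAGE_REPO}opea/#g" {} \;

cat "$workdir/values.yaml"
```

After the run, the sample entry reads repository: registry.example.com:5000/opea/chatqna, which shows why the prefix needs its trailing slash.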

Generate manifests from Helm Charts

Some users may prefer plain Kubernetes manifests (YAML files) for workload deployment. We do not maintain the manifests by hand; they are generated with helm template. See update_genaiexamples.sh for how the manifests are generated for supported GenAIExamples, and update_manifests.sh for how they are generated for supported GenAIComps. Note that these scripts have hardcoded settings to reduce configuration effort and are not meant to be run directly by users.
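For a single chart, rendering a manifest by hand could look like the sketch below. It is not what the scripts above do. It only shows the basic helm template call, assuming it is run from this directory, with a hypothetical output file name and the same chart and token placeholder used earlier.

```shell
chart="common/tgi"
release="myrelease"
out="${release}-manifest.yaml"   # hypothetical output file name

if command -v helm >/dev/null 2>&1 && [ -d "$chart" ]; then
  # Fetch chart dependencies, then render the templates locally without installing.
  helm dependency update "$chart"
  helm template "$release" "$chart" \
    --set global.HUGGINGFACEHUB_API_TOKEN="insert-your-huggingface-token-here" > "$out"
else
  echo "helm or $chart not found; skipping render" >&2
fi
```

The rendered file can then be reviewed and applied with kubectl apply -f.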