Observability for OPEA Workloads in Kubernetes

Table of Contents

- Introduction
- Pre-conditions
- Install
- Dashboards
- Verify

Introduction

The Helm chart monitoring option enables observability support for OPEA workloads: Prometheus metrics for the service components, and Grafana dashboards for visualizing them.

Automatic scaling of the services based on their usage with HPA (HorizontalPodAutoscaler) also relies on these metrics.

The Metrics / visualization add-ons section explains how to install additional monitoring for node and device metrics, and Grafana dashboards for visualizing those metrics.

Pre-conditions

Prometheus + Grafana install

If the cluster does not run the Prometheus operator yet, it should be installed before enabling monitoring, e.g. by using the kube-prometheus-stack Helm chart: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

To install an (older) 55.x version of Prometheus & Grafana:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ prom_ns=monitoring  # namespace for Prometheus
$ kubectl create ns $prom_ns
$ helm install prometheus-stack prometheus-community/kube-prometheus-stack --version 55.5.2 -n $prom_ns

OPEA Helm options

If Prometheus & Grafana are installed under a release name other than prometheus-stack, provide that name as the global.prometheusRelease value for the OPEA service Helm install, or in its values.yaml file. Otherwise Prometheus ignores the installed ServiceMonitor objects.
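For example, if the stack release were named my-prom (an assumed name) instead of prometheus-stack, the install could look like this; the chatqna chart reference is a placeholder for your actual OPEA chart location:

```shell
$ helm install chatqna chatqna/ \
    --set global.monitoring=true \
    --set global.prometheusRelease=my-prom
```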

Install

Monitoring support + Grafana access

Install the (e.g. ChatQnA) Helm chart with the --set global.monitoring=true option.
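A minimal install sketch, assuming the chart is available as a local chatqna/ directory (use your actual chart path or registry reference):

```shell
$ helm install chatqna chatqna/ --set global.monitoring=true
```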

Use port-forward to access Grafana:

$ kubectl -n $prom_ns port-forward svc/prometheus-stack-grafana 3000:80

Open your browser and navigate to http://localhost:3000. Log in with username "admin" and password "prom-operator" (the kube-prometheus-stack default).

Dashboards

Currently, when the monitoring option is enabled for the ChatQnA and DocSum Helm charts, an OPEA application monitoring dashboard is also installed:

Metrics dashboard

When HPA scaling is enabled, an additional application scaling dashboard is installed:

Scaling dashboard

For other applications installed with the monitoring option enabled, dashboard(s) for monitoring them can be installed afterwards with:

$ helm install dashboards dashboards/ --set global.monitoring=true

NOTE: dashboards will list available applications and their metrics only after they’ve processed their first token, because related metrics are not available before that!
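To generate that first token, send a test request to the application. A hedged ChatQnA example, assuming its default service name and port (chatqna on 8888) — adjust both to your deployment:

```shell
$ kubectl port-forward svc/chatqna 8888:8888 &
$ curl http://localhost:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is deep learning?"}'
```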

Verify

Check installed Prometheus service names:

$ prom_ns=monitoring  # Prometheus namespace
$ kubectl -n $prom_ns get svc

(Object names depend on whether Prometheus was installed from manifests, or Helm, and the release name given for its Helm install.)

Use service name matching your Prometheus installation:

$ prom_svc=prometheus-stack-kube-prom-prometheus  # Metrics service

Verify that Prometheus found metric endpoints for the chart services, i.e. that the last number in the curl output is non-zero:

$ chart=chatqna # OPEA chart release name
$ prom_url=http://$(kubectl -n $prom_ns get -o jsonpath="{.spec.clusterIP}:{.spec.ports[0].port}" svc/$prom_svc)
$ curl --no-progress-meter $prom_url/metrics | grep "scrape_pool_targets.*$chart"
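When the grep matches several scrape pools, their per-pool target counts (the last column) can be summed with awk. A small sketch, with sample lines standing in for the live curl output above:

```shell
# Sample scrape_pool_targets lines; replace the printf with the
# live "curl ... | grep" pipeline to check a real cluster.
printf 'scrape_pool_targets{scrape_pool="a"} 2\nscrape_pool_targets{scrape_pool="b"} 1\n' \
  | awk '{s+=$NF} END {print s+0}'   # → 3
```

A summed total of 0 means Prometheus has not discovered any endpoints for the release.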

Then check that Prometheus metrics from a relevant LLM inferencing service are available.

For vLLM:

$ curl --no-progress-meter $prom_url/api/v1/query \
  --data-urlencode 'query=vllm:cache_config_info{service="'$chart'-vllm"}' | jq

Or TGI:

$ curl --no-progress-meter $prom_url/api/v1/query \
  --data-urlencode 'query=tgi_queue_size{service="'$chart'-tgi"}' | jq

NOTE: inferencing services provide metrics only after they’ve processed their first request, and the reranking service only after query context data has been uploaded!
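To pull just the numeric value out of such an instant-query response, the standard jq path is .data.result[0].value[1] (each result's value is a [timestamp, "value"] pair). Shown here against a canned sample response in place of the live curl output:

```shell
# Canned Prometheus instant-query response; replace the echo with the
# live curl pipeline from above to query a real cluster.
echo '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"42"]}]}}' \
  | jq -r '.data.result[0].value[1]'   # → 42
```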