# Observability for OPEA Workloads in Kubernetes

## Table of Contents

- [Introduction](#introduction)
- [Pre-conditions](#pre-conditions)
  - [Prometheus + Grafana install](#prometheus--grafana-install)
  - [OPEA Helm options](#opea-helm-options)
- [Install](#install)
  - [Monitoring support + Grafana access](#monitoring-support--grafana-access)
  - [Dashboards](#dashboards)
- [Verify](#verify)

## Introduction

The Helm chart `monitoring` option enables observability support for OPEA workloads: [Prometheus](https://prometheus.io/) metrics for the service components, and [Grafana](https://grafana.com/) visualization for them. Scaling the services automatically based on their usage with [HPA](HPA.md) also relies on these metrics.

[Metrics / visualization add-ons](../kubernetes-addons/Observability/README.md) explains how to install additional monitoring for node and device metrics, and Grafana for visualizing those metrics.

## Pre-conditions

### Prometheus + Grafana install

If the cluster does not run the [Prometheus operator](https://github.com/prometheus-operator/kube-prometheus) yet, it should be installed before enabling monitoring, e.g. by using its Helm chart: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

To install an older (55.x) version of Prometheus & Grafana:

```console
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ prom_ns=monitoring  # namespace for Prometheus
$ kubectl create ns $prom_ns
$ helm install prometheus-stack prometheus-community/kube-prometheus-stack --version 55.5.2 -n $prom_ns
```

### OPEA Helm options

If Prometheus & Grafana are installed under a release name other than `prometheus-stack`, provide that name as the `global.prometheusRelease` value for the OPEA service Helm install, either on the command line or in its `values.yaml` file. Otherwise Prometheus ignores the installed `serviceMonitor` objects.

## Install

### Monitoring support + Grafana access

Install the (e.g. ChatQnA) Helm chart with the `--set global.monitoring=true` option.
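For example, deploying ChatQnA from a local checkout of its chart, with monitoring enabled and the Prometheus release name set explicitly (the chart path and release name below are illustrative; `global.prometheusRelease` can be omitted when the default `prometheus-stack` release name was used):

```console
$ helm install chatqna chatqna/ \
    --set global.monitoring=true \
    --set global.prometheusRelease=prometheus-stack
```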
Use port-forward to access Grafana; with the Helm install above, its service is named `prometheus-stack-grafana` in the Prometheus namespace:

```
kubectl -n monitoring port-forward service/prometheus-stack-grafana 3000:80
```

Open your browser and navigate to http://localhost:3000. Log in with "admin" as the username and "prom-operator" as the password.

### Dashboards

Currently, when the `monitoring` option is enabled for the ChatQnA and DocSum Helm charts, the OPEA application monitoring dashboard is also installed:

![Metrics dashboard](./assets/opea-metrics.png)

When [HPA scaling](HPA.md) is enabled, an additional application scaling dashboard is installed:

![Scaling dashboard](./assets/opea-scaling.png)

For other applications installed with the `monitoring` option enabled, dashboard(s) for monitoring them can be installed afterwards with:

```
$ helm install dashboards dashboards/ --set global.monitoring=true
```

NOTE: dashboards list available applications and their metrics only after those applications have processed their first token, because the related metrics are not available before that!

## Verify

Check the installed Prometheus service names:

```console
$ prom_ns=monitoring  # Prometheus namespace
$ kubectl -n $prom_ns get svc
```

(Object names depend on whether Prometheus was installed from manifests or Helm, and on the release name given for its Helm install.)

Use the service name matching your Prometheus installation:

```console
$ prom_svc=prometheus-stack-kube-prom-prometheus  # Metrics service
```

Verify that Prometheus found metric endpoints for the chart services, i.e. that the last number in the `curl` output is non-zero:

```console
$ chart=chatqna  # OPEA chart release name
$ prom_url=http://$(kubectl -n $prom_ns get -o jsonpath="{.spec.clusterIP}:{.spec.ports[0].port}" svc/$prom_svc)
$ curl --no-progress-meter $prom_url/metrics | grep scrape_pool_targets.*$chart
```

Then check that Prometheus metrics from a relevant LLM inferencing service are available. For vLLM:

```console
$ curl --no-progress-meter $prom_url/api/v1/query? \
  --data-urlencode 'query=vllm:cache_config_info{service="'$chart'-vllm"}' | jq
```

Or TGI:

```console
$ curl --no-progress-meter $prom_url/api/v1/query? \
  --data-urlencode 'query=tgi_queue_size{service="'$chart'-tgi"}' | jq
```

**NOTE**: inferencing services provide metrics only after they have processed their first request, and the reranking service only after query context data has been uploaded!
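One way to generate that first request is to send a test query through the application gateway. A minimal sketch for ChatQnA, assuming its gateway service is named after the chart release and serves the `/v1/chatqna` API on port 8888 (adjust the service name, port, and payload for your application and chart version):

```console
# Forward the ChatQnA gateway port to localhost (assumed service name / port)
$ kubectl port-forward svc/$chart 8888:8888 &
# Send a first query so that the token metrics start to appear
$ curl http://localhost:8888/v1/chatqna \
    -X POST -H "Content-Type: application/json" \
    -d '{"messages": "What is Kubernetes?"}'
```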