Set up Prometheus and Grafana to visualize microservice metrics

1. Set up Prometheus

The microservices already expose Prometheus metrics, which can be used to build Grafana dashboards. Start by downloading and unpacking Prometheus:

wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
cd prometheus-2.52.0.linux-amd64/

vim prometheus.yml

In prometheus.yml, change the job's target endpoint to the microservice whose metrics you want to track. Make sure the service exposes a /metrics endpoint that follows the Prometheus exposition format.
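Before pointing Prometheus at a service, it can help to confirm that the endpoint really answers in the text exposition format. The sketch below is one way to do that. The helper name count_samples is hypothetical, and the host and port in the usage line are taken from the TGI example in this guide, so adjust them for your own service.

```shell
# Count sample lines in Prometheus text exposition output read from stdin.
# Lines starting with '#' are HELP/TYPE comments; blank lines are skipped.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage (assumes the service is reachable from where you run this):
# curl -s http://llm-dependency-svc.default.svc.cluster.local:9009/metrics | count_samples
```

A result of 0 suggests the endpoint is not serving metrics yet, or that the URL points at the wrong path.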

Here is an example of exporting metrics data from a TGI microservice (inside a Kubernetes cluster) to Prometheus.

# A scrape configuration containing exactly one endpoint to scrape:
# here it's the TGI microservice.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "tgi"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["llm-dependency-svc.default.svc.cluster.local:9009"]

Next, run the Prometheus server in the background:

nohup ./prometheus --config.file=./prometheus.yml &

You should now be able to open the Prometheus UI at http://localhost:9090/targets?search= and see the configured targets.
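If the UI does not load right away, the server may still be starting. Prometheus exposes a readiness endpoint at /-/ready, which returns a success status once it can serve traffic. The following is a minimal sketch that polls it; the helper name wait_for_ready and the retry count are assumptions made for illustration.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_ready URL [ATTEMPTS]   (one second between attempts)
wait_for_ready() {
  url=$1
  attempts=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage, assuming Prometheus listens on the default port on this machine:
# wait_for_ready http://localhost:9090/-/ready 30 && echo "Prometheus is ready"
```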

1.1 Node Metrics (optional)

The Prometheus Node Exporter is required for collecting CPU, memory, network, and storage metrics. Deploy the Node Exporter from its tarball by following the guide.

Alternatively, install it in a K8S cluster with the commands below.

First make sure that the monitoring namespace exists in your K8S environment.
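Since kubectl create namespace fails when the namespace already exists, a small guard keeps this step safe to repeat. This is a sketch only; ensure_namespace is a hypothetical helper name, and it assumes kubectl is already configured for your cluster.

```shell
# Create a namespace only if it does not exist yet.
ensure_namespace() {
  ns=$1
  kubectl get namespace "$ns" >/dev/null 2>&1 || kubectl create namespace "$ns"
}

# Usage:
# ensure_namespace monitoring
```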

git clone https://github.com/opea-project/GenAIEval.git
cd GenAIEval/evals/benchmark/grafana/
kubectl apply -f prometheus_node_exporter.yaml

Add the following job to the scrape_configs section of prometheus.yml:

scrape_configs:
  - job_name: "prometheus-node-exporter"
    metrics_path: /metrics
    static_configs:
      - targets: ["<NODE1_IP>:9100", "<NODE2_IP>:9100", ...]
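On clusters with many nodes, the target list can be generated instead of typed by hand. The sketch below formats a list of IP addresses as a targets value. The helper name format_targets is hypothetical, and the usage line assumes that the nodes report an InternalIP address that Prometheus can reach.

```shell
# Format node IPs as a YAML flow list of "<ip>:<port>" targets.
# Usage: format_targets PORT IP [IP ...]
format_targets() {
  port=$1
  shift
  out=""
  for ip in "$@"; do
    # Append a comma separator only when the list is non-empty.
    out="${out:+$out, }\"${ip}:${port}\""
  done
  printf '[%s]\n' "$out"
}

# Usage (word splitting of the kubectl output is intentional):
# format_targets 9100 $(kubectl get nodes \
#   -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
```

Paste the printed list after targets: in the job definition.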

The following Grafana dashboards rely on Prometheus Node Exporter:

  • cpu_grafana.json

  • node_grafana.json

Tested with Prometheus Node Exporter 0.16.0.

1.2 Intel® Gaudi® Metrics (optional)

The Intel Gaudi Prometheus Metrics Exporter is required for collecting Intel® Gaudi® AI accelerator metrics.

Follow the guide to deploy the metrics exporter in Docker.

Alternatively, install it in a K8S cluster with the commands below.

First make sure that the monitoring namespace exists in your K8S environment.

git clone https://github.com/opea-project/GenAIEval.git
cd GenAIEval/evals/benchmark/grafana/
kubectl apply -f prometheus_gaudi_exporter.yaml

Add the following job to the scrape_configs section of prometheus.yml:

scrape_configs:
  - job_name: "prometheus-gaudi-exporter"
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["<NODE1_IP>:41611", "<NODE2_IP>:41611", ...]

The following Grafana dashboard relies on the Intel Gaudi Prometheus Metrics Exporter:

  • gaudi_grafana.json

Tested with Intel Gaudi Prometheus Metrics Exporter 1.17.0.

Restart Prometheus after saving the changes.
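As an alternative to a full restart, Prometheus re-reads its configuration file when it receives a SIGHUP. The sketch below validates the file first with promtool, which ships in the same tarball, and only then sends the signal. The helper name reload_prometheus and the PROMTOOL variable are assumptions for illustration, and the sketch assumes the server process is named prometheus.

```shell
# Validate a config file with promtool, then signal Prometheus to reload it.
# PROMTOOL may point at the promtool binary; it defaults to ./promtool.
reload_prometheus() {
  config=${1:-./prometheus.yml}
  "${PROMTOOL:-./promtool}" check config "$config" || return 1
  # -x matches the process name exactly; -HUP triggers a config reload.
  pkill -HUP -x prometheus
}

# Usage, from the directory where Prometheus was unpacked:
# reload_prometheus ./prometheus.yml
```

If validation fails, the running server keeps its previous configuration because no signal is sent.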

2. Set up Grafana

Grafana visualizes data from a data source through dashboards. This section shows how to visualize TGI metrics. Start by downloading and unpacking Grafana:

wget https://dl.grafana.com/oss/release/grafana-11.0.0.linux-amd64.tar.gz
tar -zxvf grafana-11.0.0.linux-amd64.tar.gz

Run the Grafana server:

cd grafana-v11.0.0/
nohup ./bin/grafana-server &

To access the Grafana dashboard, point your browser to http://localhost:3000. You will need to log in using the default credentials.

username: admin
password: admin

If you run into any Grafana installation issues, please check this link.

The next step is to configure the data source that Grafana queries for metrics. Click the “Data Source” button, select Prometheus, and set the Prometheus URL to http://localhost:9090. If the dashboard does not display data, change the HTTP method to GET under the Other section of the data source settings.
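The same data source can also be created without the UI through Grafana's HTTP API (POST /api/datasources). The following is a hedged sketch, not the only way to do it. The helper name datasource_payload and the data source name Prometheus are assumptions, and the usage line assumes the default admin credentials are still in place.

```shell
# Build the JSON body for Grafana's "create data source" API call.
# Note: the name and URL are inserted as-is, without JSON escaping.
datasource_payload() {
  name=$1
  url=$2
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "$name" "$url"
}

# Usage (assumes Grafana on localhost:3000 with the default credentials):
# datasource_payload Prometheus http://localhost:9090 |
#   curl -s -u admin:admin -H 'Content-Type: application/json' \
#     -X POST --data @- http://localhost:3000/api/datasources
```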

3. Import Grafana Dashboard

Once the Grafana server is set up, you can import a dashboard by uploading a dashboard JSON file in the Grafana UI under Home > Dashboards > Import dashboard. You can use a file such as tgi_grafana.json. Open the dashboard to see the panels displaying the metrics data.

This folder also provides several Grafana dashboard JSON files for your reference.

  • chatqna_megaservice_grafana.json: A sample Grafana dashboard JSON file for visualizing the metrics of ChatQnA microservices. Selecting different job_name options in the top-left of the dashboard displays the metrics for the corresponding microservices.

  • tei_grafana.json: A sample Grafana dashboard JSON file for visualizing TEI metrics.

  • tgi_grafana.json: A sample Grafana dashboard JSON file for visualizing TGI metrics.

  • redis_grafana.json: A sample Grafana dashboard JSON file for visualizing Redis metrics. To import Redis metrics, you need to add a new connection and a Redis data source in Grafana. Please refer to this link for more details.

  • gaudi_grafana.json: A sample Grafana dashboard JSON file for visualizing Intel® Gaudi® AI accelerator metrics in a container cluster running compute workloads.

  • cpu_grafana.json: A sample Grafana dashboard JSON file for visualizing the CPU metrics.

  • node_grafana.json: A sample Grafana dashboard JSON file for visualizing the node metrics.