Working with Microservices-18: Setting Up An Alarm By Using the Grafana Dashboard and Prometheus ConfigMap.yml

We will run Prometheus and Grafana together, collect metrics of the Kubernetes cluster with them, and then set up alarms using these metrics. We will set up the alarms (alarm rules) for monitoring the Kubernetes cluster in two ways: with Grafana and with the Prometheus “ConfigMap.yml” file. We will then view and examine the alarms on the Prometheus and Grafana dashboards. Finally, we will install Prometheus and Grafana by using Helm Charts. We will do all of this hands-on, step by step.

9 min read · Feb 19, 2024

Topics we will cover:

Previous article:
1. Changing The Ports and Services of Prometheus and Grafana
2. Configuring the Security Group of Cluster Nodes
3. Running Prometheus
4. Running Grafana and Connecting to Prometheus
5. As a result
6. Previous article: “Working with Microservices-17: Monitoring and Creating an Alarm with Prometheus and Grafana in the Production Stage”
7. References

This article:
1. Setting Up An Alarm By Using Grafana and Examining it on the Grafana Dashboard
2. Setting Up An Alarm By Using the “ConfigMap.yml” File and Examining it on the Prometheus Dashboard
3. Installing Prometheus and Grafana by Using Helm Chart
4. As a result
5. Next post: “Working with Microservices-19: Explanation of the Testing Stage, and Running a Unit Test and Configuring Code Coverage Report using Jacoco tool.”
6. References

If you like the article, I will be happy if you click the Medium Follow button to encourage me to write more, and not miss future articles.

Your clapping, following, or subscribing helps my articles reach a broader audience. Thank you in advance for them.

In the previous articles, we ran Prometheus and Grafana together, then collected metrics of the Kubernetes cluster with them. We viewed and examined these metrics on the dashboards of Prometheus and Grafana.

Working with Microservices-17: Monitoring and Creating an Alarm with Prometheus and Grafana in the…


cmakkaya.medium.com

Now, we continue from where we left off.

1. Setting Up An Alarm By Using Grafana and Examining it on the Grafana Dashboard

Go to Home > Alerting > Alert rules and click the “New alert rule” button, as seen below.

In the new window that opens, select “Grafana managed alert”, as seen below.

In the “1-A” section, set the time range to “now-15m to now”, select only the “kubelet_running_pods” metric, and choose “Sum” under Operations > Aggregations, as seen below. Then click the “Run queries” button to test the query.

Note: With the above settings, we see the pods running on all nodes. If we wanted to see only the pods running on the Worker-1 node, we would need to narrow the metric with “label filters”, as seen below.
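The same per-node filter can also be written directly in PromQL, for example inside a Prometheus alert rule. This is a sketch under stated assumptions. The node name “worker-1” is hypothetical, so check yours with kubectl get nodes. The rule group and alert names are also illustrative. The filter relies on Prometheus's Kubernetes service discovery, which sets the instance label of each node target to the node name, as in the “kubernetes-nodes” job of the ConfigMap later in this article.

```yaml
# Hypothetical rule group: counts only the pods running on one node.
# "worker-1" is an assumed node name; replace it with one from `kubectl get nodes`.
groups:
  - name: worker-1-pods
    rules:
      - alert: WorkerOnePodNumber
        # The label filter {instance="worker-1"} keeps only the series
        # scraped from that node's kubelet before summing.
        expr: sum(kubelet_running_pods{instance="worker-1"}) > 20
        for: 1m
```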

In the “1-B” section, click “Add expression” and select “Classic condition” instead of “Math”, as seen below.

Set “when” to “last”, “of” to “A”, and “is above” to “20”, as seen below. This means the condition is met when the last value of query “A” is greater than 20.

Note: If we want, we can add a second condition here as shown below. However, we will continue using a single condition for now.

We set “Evaluate every” to “1m” and “for” to “1m”. Grafana then evaluates the rule every minute. When the condition first becomes true, the alert goes to “Pending”. If the condition still holds one minute later, the alert starts firing, as seen below.
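To make these evaluation settings concrete, here is a sketch of how the same rule could be written as a Prometheus alerting rule group. It is an illustrative equivalent, not something Grafana exports. The alert name PodNumber is hypothetical. Here, interval mirrors “Evaluate every” and for mirrors the pending period. Prometheus evaluates expr at each evaluation time, which plays the role of “last” in the classic condition.

```yaml
groups:
  - name: pods                 # same group name as in Grafana
    interval: 1m               # "Evaluate every: 1m"
    rules:
      - alert: PodNumber       # hypothetical name for the "pod number" rule
        # Sum the running pods reported by every kubelet and compare with the threshold.
        expr: sum(kubelet_running_pods) > 20
        for: 1m                # stay Pending for 1 minute before Firing
        labels:
          severity: warning
        annotations:
          summary: More than 20 pods are running in the cluster
```

With these settings, the alert becomes Pending at the first evaluation where the condition is true. It fires at the next evaluation one minute later, but only if the condition still holds.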

Enter “pod number” as the Rule name.

Important note: In “Folder”, type a name such as “kube-pods”, press Enter, and then select “kube-pods” from the list. Otherwise, the alarm rule will not be created in the following steps and Grafana will give an error.

Enter “pods” as the “Group” name, as seen below.

The Notification section is informational, so we leave it at its defaults, as seen below.

Click the “Save rule” button at the top of the page.

Our alert rule has been created under Alerting > Alert rules > Grafana, as seen below.

When we click the “view” button shown in the picture above, the following window opens and shows how many pods are running. We have 12 pods running, as seen below.

Now, if we apply a Kubernetes Deployment YAML file with a “replicas” value of “15”, as seen below, the total number of pods rises to 27 (12 + 15). This exceeds the threshold of 20, so the alarm fires after the one-minute pending period. The “mydeployment.yml” file will be our dummy load here.

kubectl apply -f mydeployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 15
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

When we started 15 pods, the Grafana alarm first went to “Pending” status, and a minute later it went to “Firing” status.

Likewise, we can see that the number of pods increases in Prometheus as seen below.

When we delete the deployment defined in “mydeployment.yml”, the number of pods drops back below 20. About a minute later, the alarm stops and its status returns to “Normal”.

kubectl delete -f mydeployment.yml

2. Setting Up An Alarm By Using the “ConfigMap.yml” File and Examining it on the Prometheus Dashboard

Instead of using the Grafana interface, we can also define the alarms as code by entering the necessary values in the Prometheus “ConfigMap.yml” file, as shown below.

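The shaded section in the picture below is the code form of the rule we created in the Grafana dashboard above. Both serve the same function: they raise an alarm when the number of running pods exceeds a threshold. Note that the ConfigMap rule uses a threshold of 15 and no waiting period (“for: 0m”).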

ConfigMap.yml file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: prometheus
data:
  rule.yml: |-
    groups:
    - name: sample alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
    - name: pod numbers
      rules:
      - alert: PrometheusPods
        expr: sum(kubelet_running_pods) > 15
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: High number of pods
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/rule.yml
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.prometheus.svc:9093"
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
      
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
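For the path /etc/prometheus/rule.yml in rule_files to resolve, the ConfigMap has to be mounted into the Prometheus container at /etc/prometheus. The Prometheus Deployment is not shown in this article, so the fragment below is an assumed minimal example. The Deployment name, labels, image tag, and volume name are hypothetical. Each key of the ConfigMap (rule.yml and prometheus.yml) becomes a file in the mounted directory.

```yaml
# Assumed minimal Prometheus Deployment: only the parts relevant to the ConfigMap.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment        # hypothetical name
  namespace: prometheus              # same namespace as the ConfigMap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus     # pin a specific version in real deployments
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            # rule.yml and prometheus.yml from the ConfigMap appear as files here.
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
      volumes:
        - name: prometheus-config-volume
          configMap:
            name: prometheus-server-conf
```

In practice, the pod also needs a service account with permission to list nodes, pods, services, and endpoints. Without it, the kubernetes_sd_configs jobs above cannot discover their targets.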

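After we apply the updated “ConfigMap.yml” file (kubectl apply -f ConfigMap.yml) and Prometheus loads the new configuration, we run the Kubernetes deployment YAML file again. The alarm rule then fires and appears on the Prometheus dashboard, as seen below. Note that Prometheus does not reload its configuration by itself. Restarting the Prometheus pod is a simple way to make it pick up the changed ConfigMap.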

Note: Ready-made alarm rule examples can also be found by searching online, as seen below.

3. Installing Prometheus and Grafana by Using Helm Chart

Prerequisites:

Kubernetes 1.19+
Helm 3.7+

Then go to the Prometheus chart page on “artifacthub.io” by using the link below:

https://artifacthub.io/packages/helm/prometheus-community/prometheus

You can install Prometheus by using the following commands:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/prometheus

Find the Grafana chart page on “artifacthub.io”, as seen below.

You can install Grafana by using the following commands:

helm repo add kube-ops https://charts.kube-ops.io
helm repo update
helm upgrade my-release kube-ops/grafana --install --namespace my-namespace --create-namespace --wait
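After both charts are installed, Grafana still has to be connected to Prometheus, as in the previous article. One way to do this declaratively is a Grafana data source provisioning file. This is a sketch under assumptions. The URL assumes the Prometheus chart was installed with the release name “my-prometheus” in the “default” namespace. Its server Service is then typically named “my-prometheus-server” on port 80, but verify this with kubectl get svc. How the file is passed to Grafana depends on the Grafana chart's own values, which are not covered here.

```yaml
# Grafana data source provisioning file (provisioning format version 1).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy          # the Grafana backend sends the queries, not the browser
    # Assumed in-cluster address: release "my-prometheus" in namespace "default".
    url: http://my-prometheus-server.default.svc.cluster.local
    isDefault: true
```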

Important note: If your Kubernetes cluster is not on a cloud provider (such as AWS EKS, Azure AKS, or GCP GKE), you have to provide persistent volumes for Prometheus yourself. Otherwise, the Prometheus pods cannot start because their PersistentVolumeClaims stay unbound, as seen below. Managed clusters on cloud providers usually provision these volumes automatically through a default StorageClass, so the problem typically does not occur there.

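Alternatively, you can bypass this error by changing the persistent volume “enabled” values from “true” to “false” in the Helm Chart's values.yaml file. Prometheus data is then not persisted and is lost when the Prometheus pod is recreated.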
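Rather than editing the chart's values.yaml directly, you can keep the same change in a small override file and pass it to Helm with -f. This is a sketch that assumes a recent prometheus-community/prometheus chart in which the keys below exist. Key names have changed between chart versions, so compare them with the chart's own values.yaml. The file name values-no-pv.yaml is hypothetical.

```yaml
# values-no-pv.yaml (hypothetical name): disables persistent storage in the
# prometheus-community/prometheus chart, e.g. for a self-managed test cluster.
server:
  persistentVolume:
    enabled: false   # the Prometheus server then uses a non-persistent volume
alertmanager:
  persistence:
    enabled: false   # the bundled Alertmanager subchart also requests a volume by default
```

It could then be used with: helm install [RELEASE_NAME] prometheus-community/prometheus -f values-no-pv.yaml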

4. As a result

In this article, we ran Prometheus and Grafana together, collected metrics of the Kubernetes cluster with them, and then set up alarms using these metrics, both with Grafana and with the Prometheus “ConfigMap.yml” file. We viewed and examined these metrics and alarms on the Prometheus and Grafana dashboards. We also learned how to install Prometheus and Grafana by using Helm Charts.

For more info and questions, don’t hesitate to get in touch with me on LinkedIn or Medium.

5. Next post

The next post is “Working with Microservices-19: Explanation of the Testing Stage, and Running a Unit Test and Configuring Code Coverage Report using Jacoco tool.”

We will use unit tests to check the lines and functions in the source code. Then we will create a “code coverage report” using the Jacoco plugin in the pom.xml file. Next, we will install the Jacoco plugin on the Jenkins server and run it. Finally, we will examine the code coverage report in the Jenkins pipeline.

6. References

(1) https://prometheus.io/docs/introduction/overview/

(2) https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/

(3) https://artifacthub.io/packages/helm/prometheus-community/prometheus

(4) https://artifacthub.io/packages/helm/kube-ops/grafana