Working with Microservices-18: Setting Up An Alarm By Using the Grafana Dashboard and Prometheus ConfigMap.yml

We will run Prometheus and Grafana together, collect metrics from the Kubernetes cluster with them, and then set up alarms (alert rules) based on these metrics. We will define the alarms in two ways, in “Grafana” and in the “Prometheus ConfigMap.yml” file, and examine them on the dashboards of Prometheus and Grafana. We will also install Prometheus and Grafana by using Helm charts. We will do all of this practically and step by step.

Cumhur Akkaya
9 min read · Feb 19, 2024

Topics we will cover:

Previous article:
1. Changing The Ports and Services of Prometheus and Grafana
2. Configuring the Security Group of Cluster Nodes
3. Running Prometheus
4. Running Grafana and Connecting to Prometheus
5. As a result
6. Next post: “Working with Microservices-18: Setting Up An Alarm By Using the Grafana Dashboard and Prometheus ConfigMap.yml”
7. References

This article:
1. Setting Up An Alarm By Using the Grafana and Examining it on the Grafana Dashboard
2. Setting Up An Alarm By Using “ConfigMap.yml” File and Examining it on the Prometheus Dashboard
3. Installing Prometheus and Grafana by Using Helm Chart
4. As a result
5. Next post: “Working with Microservices-19: Explanation of the Testing Stage, and Running a Unit Test and Configuring Code Coverage Report using Jacoco tool”
6. References

If you like the article, I will be happy if you click the Medium Follow button to encourage me to write more, and not miss future articles.

Your clapping, following, or subscribing helps my articles to reach a broader audience. Thank you in advance for them.

In the previous articles, we ran Prometheus and Grafana together, then collected metrics of the Kubernetes cluster with them. We viewed and examined these metrics on the dashboards of Prometheus and Grafana.

Now, we continue from where we left off.

1. Setting Up An Alarm By Using the Grafana and Examining it on the Grafana Dashboard

Go to Home > Alerting > Alert rules and click on the “New alert rule” button as seen below.

In the new window that opens, select “Grafana managed alert” as seen below.

In section “1-A”, define the query: set the time range to “now-15m to now”, select the metric “kubelet_running_pods”, and choose “Sum” under Operations > Aggregations as seen below. Then click on the “Run queries” button to try the rule.

Note: With the above settings, we see the pods running on all nodes. If we wanted to see only the pods running on the Worker-1 node, we would need to narrow the metric down. We use “label filters” to do this as seen below.
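The label filter above corresponds to a label matcher in the underlying PromQL query. Here is a minimal sketch in shell, assuming the node name is exposed in the `instance` label and that the node is called `worker-1`. Both are assumptions, so check which labels your own setup attaches to `kubelet_running_pods`. The helper name `pod_count_query` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a PromQL expression that sums running pods.
# With no argument it covers all nodes; with a node name it adds a label
# matcher. The "instance" label name is an assumption about this setup.
pod_count_query() {
  node="${1:-}"
  if [ -n "$node" ]; then
    printf 'sum(kubelet_running_pods{instance="%s"})\n' "$node"
  else
    printf 'sum(kubelet_running_pods)\n'
  fi
}

pod_count_query            # expression for all nodes
pod_count_query worker-1   # expression for the (assumed) worker-1 node only
```

The printed expression can be pasted into the Prometheus expression browser or into the Grafana query editor in code mode.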

In section “1-B”, click “Add expression” and select “Classic condition” in the “Math” field as seen below.

Set “WHEN” to “last”, “OF” to “A”, and “IS ABOVE” to “20” as seen below. That is, the condition is met when the last value of “A” is higher than “20”.
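To make the meaning of “last” concrete: the condition looks only at the most recent value of query A in the selected time range, not at the earlier ones. Below is a simplified shell model of this check. It is not Grafana's actual implementation, the function name is hypothetical, and values are treated as whole numbers:

```shell
#!/bin/sh
# Simplified model of a classic condition "WHEN last() OF A IS ABOVE <threshold>".
# Usage: last_is_above THRESHOLD VALUE...
# Returns success (0) only if the final value is greater than the threshold.
last_is_above() {
  threshold=$1
  shift
  last=""
  for value in "$@"; do
    last=$value          # keep overwriting; only the final sample survives
  done
  [ -n "$last" ] && [ "$last" -gt "$threshold" ]
}

# Earlier samples above 20 do not matter when the last one is 12.
if last_is_above 20 27 25 12; then echo "condition met"; else echo "condition not met"; fi
# Here the last sample is 27, so the condition is met.
if last_is_above 20 12 12 27; then echo "condition met"; else echo "condition not met"; fi
```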

Note: If we want, we can add a second condition here as shown below. However, we will continue using a single condition for now.

We set “Evaluate every” to “1m” and “for” to “1m”. Thus, Grafana evaluates the rule every minute, and if the condition stays true for one minute, it starts the alarm as seen below.

Enter “pod number” as the rule name.

Important note: In “Folder”, enter a name like “kube-pods”, press Enter, and then select “kube-pods” again. Otherwise, the alarm rule will not be created in the following steps and Grafana will give an error.

Enter “pods” as the “Group” name as seen below.

The Notification section is informational; we leave it at its default settings as seen below.

Click on the “Save rule” button above.

Our alert rule was created under Alerting > Alert rules > Grafana as seen below.

When we click on the “view” button in the picture above, the following window opens showing us how many pods are running. We have 12 pods running as seen below.

Now, when we apply any Kubernetes deployment YAML file with a “replicas” value of “15”, as seen in the picture below, the alarm will fire after waiting for a minute, because the total number of pods (12 + 15 = 27) exceeds “20”. The “mydeployment.yml” file will be our dummy load here.

kubectl apply -f mydeployment.yml

mydeployment.yml file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 15
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

When we started 15 pods, the Grafana alarm first went to “Pending” status, and a minute later it went to “Firing” status.
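The move from “Pending” to “Firing” follows from the “Evaluate every 1m, for 1m” settings. Below is a simplified shell model of that behavior, assuming one sample per evaluation and the pod counts from this walkthrough (12 pods before the deployment, 27 after it). It is an illustration only, not Grafana's implementation, and the function name is hypothetical:

```shell
#!/bin/sh
# Usage: alert_states THRESHOLD FOR_EVALS VALUE...
# FOR_EVALS is the "for" duration divided by the evaluation interval
# (1m / 1m = 1 in this article). Prints one "<value> <state>" line
# per evaluation.
alert_states() {
  threshold=$1
  for_evals=$2
  shift 2
  breached=0                       # consecutive evaluations above the threshold
  for value in "$@"; do
    if [ "$value" -gt "$threshold" ]; then
      breached=$((breached + 1))
      if [ "$breached" -gt "$for_evals" ]; then
        echo "$value Firing"       # condition has held for the whole "for" period
      else
        echo "$value Pending"      # condition is true, but not yet long enough
      fi
    else
      breached=0
      echo "$value Normal"
    fi
  done
}

# 12 pods, then the 15-replica deployment is applied (27 pods), then it is deleted.
alert_states 20 1 12 27 27 12
```

With `FOR_EVALS` set to 0, the same model goes straight to “Firing” on the first breach, which is how a rule with `for: 0m` behaves.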

Likewise, we can see that the number of pods increases in Prometheus as seen below.

When we delete the deployment created from the “mydeployment.yml” file, the alarm will stop after waiting for a minute, and the alarm status will be “Normal” again.

kubectl delete -f mydeployment.yml

2. Setting Up An Alarm By Using “ConfigMap.yml” File and Examining it on the Prometheus Dashboard

If we wanted to define the alarms as code instead of clicking through the Grafana interface, we could also set them by entering the necessary values in the “ConfigMap.yml” file as shown below.

The shaded section in the picture below is the code form of the operation we performed manually in the Grafana dashboard above. Both serve the same function. Note that in this file the pod-count threshold is 15 and “for” is 0m, so the alarm fires as soon as the condition is met.

ConfigMap.yml file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: prometheus
data:
  rule.yml: |-
    groups:
    - name: sample alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
    - name: pod numbers
      rules:
      # Fires as soon as more than 15 pods are running (for: 0m means no waiting period).
      - alert: PrometheusPods
        expr: sum(kubelet_running_pods) > 15
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: High number of pods
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/rule.yml
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.prometheus.svc:9093"
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name

If we save and apply the “ConfigMap.yml” file and then run the Kubernetes deployment YAML file again, the alarm rule will fire and appear on the Prometheus dashboard as seen below.
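Before applying the ConfigMap, the rule syntax can be checked locally. Here is a sketch, assuming `promtool` (shipped with Prometheus) is installed on your machine. The file name `rule-check.yml` is arbitrary, and the rule is the pod-count rule from the ConfigMap above, without its annotations:

```shell
#!/bin/sh
# Write the pod-count rule group to a standalone file for checking.
cat > rule-check.yml <<'EOF'
groups:
  - name: pod numbers
    rules:
      - alert: PrometheusPods
        expr: sum(kubelet_running_pods) > 15
        for: 0m
        labels:
          severity: warning
EOF

# Validate the file if promtool is available; otherwise just say so.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules rule-check.yml
else
  echo "promtool not found; skipping rule validation"
fi
```

After the check passes, the ConfigMap itself is applied with `kubectl apply -f ConfigMap.yml`. Depending on how Prometheus was deployed, it may need a restart or reload to pick up the new rules.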

Note: We can also find ready-made alarm examples by searching the internet as seen below.

3. Installing Prometheus and Grafana by Using Helm Chart

Prerequisites:

  • Kubernetes 1.19+
  • Helm 3.7+

Then go to “artifacthub.io” by clicking the link below:

https://artifacthub.io/packages/helm/prometheus-community/prometheus

You can install Prometheus by using the following commands:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/prometheus

Find the Grafana page on “artifacthub.io” as seen below.

You can install Grafana by using the following commands:

helm repo add kube-ops https://charts.kube-ops.io
helm repo update
helm upgrade my-release kube-ops/grafana --install --namespace my-namespace --create-namespace --wait

Important note: If your Kubernetes cluster is not on a managed cloud service (such as AWS EKS, Azure AKS, or GCP GKE), you have to provide the persistent volumes for Prometheus yourself; otherwise Prometheus will not work, as seen below. Managed cloud services make this volume assignment automatically, so there is no problem with them.

Alternatively, you can bypass this error by changing the “persistentVolume” “enabled” values from “true” to “false” in the values.yaml file of the Helm chart.
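Instead of editing the chart's values.yaml directly, the same override can be kept in a small values file and passed to Helm. Here is a sketch, assuming the `server.persistentVolume.enabled` key of the prometheus-community/prometheus chart and, for Alertmanager, `alertmanager.persistence.enabled`. Key names differ between chart versions, so check them with `helm show values prometheus-community/prometheus`. The release name, namespace, file name, and function name are placeholders. Note that without a persistent volume, Prometheus keeps its data only as long as the pod lives:

```shell
#!/bin/sh
# Override file that turns off persistent volumes. The key names are
# assumptions; confirm them against your chart version.
cat > prometheus-values.yaml <<'EOF'
server:
  persistentVolume:
    enabled: false
alertmanager:
  persistence:
    enabled: false
EOF

# Hypothetical wrapper: install or upgrade the chart with the override file.
# Usage: install_prometheus RELEASE_NAME NAMESPACE
install_prometheus() {
  helm upgrade --install "$1" prometheus-community/prometheus \
    --namespace "$2" --create-namespace \
    -f prometheus-values.yaml
}

# Example call (requires helm and access to a cluster):
# install_prometheus my-prometheus monitoring
```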

4. As a result

In this article, we ran Prometheus and Grafana together, collected metrics of the Kubernetes cluster with them, and then set up alarms using these metrics. We viewed and examined these metrics and alarms on the dashboards of Prometheus and Grafana. We also learned to install Prometheus and Grafana by using Helm charts.

For more info and questions, don’t hesitate to get in touch with me on Linkedin or Medium.

5. Next post

In the next post, “Working with Microservices-19: Explanation of the Testing Stage, and Running a Unit Test and Configuring Code Coverage Report using Jacoco tool”, we will use unit tests to check the lines and functions in the source code. Then we will create a “code coverage report” using the Jacoco plugin in the pom.xml file. After that, we will install the Jacoco plugin on the Jenkins server too and run it. Finally, we will examine the code coverage report in the Jenkins pipeline.

Cumhur Akkaya

✦ DevOps/Cloud Engineer, ✦ Believes in learning by doing, ✦ Dedication To Lifelong Learning, ✦ Tea and Coffee Drinker. ✦ Linkedin: linkedin.com/in/cumhurakkaya