# support
d
There is a common alert rule for OOM containers like this
(kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 1m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[1m]) == 1
this rule does not work on SigNoz promQL and I am not sure if it is even possible to create it with the query builder. any ideas?
n
Hey @Dimitris Mavrommatis, OTel does provide metrics such as k8s.container.status.last_terminated_reason and k8s.container.restart_count.
Ref: https://opentelemetry.io/docs/specs/semconv/resource/k8s/#container
Did you try with these metrics?
d
I find it very difficult to translate these queries from label-based {reason="OOMKilled"} to value-based == "OOMKilled" etc.
I was able to use the PromQL query on SigNoz with on(...) instead of ignoring(...) because the metrics differed in more labels than just reason.
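For readers following along, here is a rough sketch of what the on(...) variant could look like. The label list (namespace, pod, container) is an assumption; the thread does not say which labels were actually used, so adjust it to whatever the two series share in your setup.

```promql
# Sketch only: the labels inside on(...) are assumed, not taken from the thread.
# Left side: containers whose restart counter grew by at least 1 in the last minute.
# Right side: containers whose last termination reason was OOMKilled for the whole minute.
(
  kube_pod_container_status_restarts_total
  - kube_pod_container_status_restarts_total offset 1m
  >= 1
)
and on (namespace, pod, container)
min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[1m]) == 1
```

With on(...), only the listed labels are used to match the two sides, so any other labels that differ between the series no longer prevent a match.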
n
Can you share the query if possible? I want to take a look at it.
d
The query is the one at the start of the thread. I am trying to set up alerts based on https://samber.github.io/awesome-prometheus-alerts/rules.html#kubernetes I am using some node exporters so I have the same metrics available as Prometheus, because I couldn't figure out how to set the alerts up with the k8s.* metrics.
For example, it is very difficult for me to figure out how to do the PodUnhealthy alert on SigNoz. No matter what, it always triggers, even when I have the threshold set for all the times. https://github.com/SigNoz/signoz/issues/7883