Dimitris Mavrommatis
05/14/2025, 12:41 PM
(kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 1m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[1m]) == 1
This rule does not work with SigNoz PromQL, and I am not sure whether it is even possible to create it with the query builder. Any ideas?
Nagesh Bansal
05/15/2025, 12:19 PM
k8s.container.status.last_terminated_reason, k8s.container.restart_count
Ref: https://opentelemetry.io/docs/specs/semconv/resource/k8s/#container
Nagesh Bansal
05/15/2025, 12:19 PM
Dimitris Mavrommatis
05/15/2025, 12:21 PM
{reason="OOMKilled"} to value-based == "OOMKilled" etc.
Dimitris Mavrommatis
05/15/2025, 12:22 PM
on(...) instead of ignoring(...) because the metrics had more differences on the labels.
Nagesh Bansal
05/15/2025, 12:29 PM
Dimitris Mavrommatis
05/15/2025, 1:11 PM
k8s.* metrics.
Dimitris Mavrommatis
05/15/2025, 1:12 PM
Dimitris Mavrommatis
05/15/2025, 11:04 PM
all the times threshold check works? Because my pod is unhealthy for 1m in a 5m period, so the value is 1 and then it goes down to 0, but the alert still fires.
Should it see that it was 1 for only 1m out of 5m and not fire? Or does it not see 0 as a value?
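
The two readings of the question above can be made concrete with a small sketch. This is not SigNoz's implementation: the function names `at_least_once` and `all_the_times`, the one-sample-per-minute window, and the choice to skip missing samples are all assumptions made for illustration. It shows why an "all the times" style check would not be expected to fire on a window of 0, 0, 1, 0, 0, but could still fire if the healthy minutes produce no sample at all instead of an explicit 0.

```python
from typing import Optional, Sequence


def at_least_once(samples: Sequence[Optional[float]], threshold: float) -> bool:
    """True if any sample that is present meets the threshold."""
    present = [v for v in samples if v is not None]
    return any(v >= threshold for v in present)


def all_the_times(samples: Sequence[Optional[float]], threshold: float) -> bool:
    """True if every sample that is present meets the threshold.

    Assumption: missing samples (None) are skipped, not treated as 0.
    An empty window never fires.
    """
    present = [v for v in samples if v is not None]
    return bool(present) and all(v >= threshold for v in present)


# Hypothetical 5m window, one sample per minute; the pod is unhealthy for 1m.
with_zeros = [0, 0, 1, 0, 0]

# Same window, but the healthy minutes report nothing instead of 0.
with_gaps = [None, None, 1, None, None]

print(at_least_once(with_zeros, 1))   # one sample reaches the threshold
print(all_the_times(with_zeros, 1))   # the explicit zeros fail the check
print(all_the_times(with_gaps, 1))    # only the single 1 is evaluated
```

Under these assumptions, the last case is the one that matches "the alert still fires", so checking whether the query returns explicit zeros for the healthy minutes would be a reasonable first step.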