Dimitris Mavrommatis
05/14/2025, 12:41 PM
(kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 1m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[1m]) == 1
This rule does not work in SigNoz PromQL, and I am not sure it is even possible to create it with the query builder. Any ideas?
Nagesh Bansal
05/15/2025, 12:19 PM
k8s.container.status.last_terminated_reason, k8s.container.restart_count
Ref: https://opentelemetry.io/docs/specs/semconv/resource/k8s/#container
Nagesh Bansal
05/15/2025, 12:19 PM
Dimitris Mavrommatis
05/15/2025, 12:21 PM
{reason="OOMKilled"} to value-based == "OOMKilled" etc.
Dimitris Mavrommatis
05/15/2025, 12:22 PM
on(...) instead of ignoring(...) because the metrics had more differences on the labels.
Nagesh Bansal
05/15/2025, 12:29 PM
Dimitris Mavrommatis
05/15/2025, 1:11 PM
k8s.* metrics.
Dimitris Mavrommatis
05/15/2025, 1:12 PM
Dimitris Mavrommatis
05/15/2025, 11:04 PM
all the times threshold check works? Because my pod is unhealthy for 1m in a 5m period, so the value is 1 and then it goes down to 0, but the alert still fires.
Should it see that it was 1 only for 1m out of 5m and not fire? Or does it not see 0 as a value?
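
The last question can be pictured with a small sketch. It is not SigNoz's implementation: the function name `all_the_times`, the rule "every sample present in the window must exceed the threshold", and the choice to skip steps with no data point are all assumptions made for illustration. It only shows why the answer depends on whether the healthy minutes arrive as explicit 0 samples or as no samples at all.

```python
from typing import Optional, Sequence


def all_the_times(samples: Sequence[Optional[float]], threshold: float) -> bool:
    """Hypothetical check: True when every present sample exceeds threshold.

    None marks an evaluation step with no data point. Skipping those steps
    is an assumption for illustration, not confirmed SigNoz behavior.
    """
    present = [s for s in samples if s is not None]
    # An empty window never fires; otherwise every present sample must match.
    return bool(present) and all(s > threshold for s in present)


# Five one-minute steps: unhealthy for 1m, healthy for the other 4m.
with_zeros = [1, 0, 0, 0, 0]              # healthy minutes reported as 0
without_zeros = [1, None, None, None, None]  # healthy minutes produce no sample

print(all_the_times(with_zeros, 0))     # False
print(all_the_times(without_zeros, 0))  # True
```

Under these assumptions the alert would stay quiet if the zeros are real samples and would fire if the series is simply absent while the pod is healthy. In PromQL, a comparison such as `== 1` without the `bool` modifier drops non-matching samples rather than returning 0, so it is worth checking which of the two cases the query actually produces.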