Al
03/31/2025, 7:21 PM
k8s_container_restarts
Temporal Aggregation: Latest
Spatial Aggregation: Max
fx: Running Diff
+ Cumulative Sum
When viewing on a dashboard, the results look accurate.
However, alerts fire randomly with strange values not observed in dashboards for the same period.
Send notification when A is above the threshold in total during the past 60 mins.
Alert Threshold: 3
Alert Description: Container restarting frequently - {{$value}} restarts in the last hour.
Alerts will arrive with the following:
• Container restarting frequently - 484 restarts in the last hour.
• Container restarting frequently - 1328 restarts in the last hour.
• Container restarting frequently - 600 restarts in the last hour.
• Container restarting frequently - 5689 restarts in the last hour.
Occasionally I will also see negative values.
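For reference, a minimal Python sketch of the arithmetic behind a running diff followed by a total over the window. It is only an illustration, not SigNoz's implementation; the sample values and the helper names `running_diff` and `non_negative_total` are made up. It shows one way a negative value can appear: if the reported counter drops to a lower value inside the window, that step's difference is negative.

```python
def running_diff(values):
    """Difference between each sample and the one before it.

    The first sample has no predecessor, so the result is one element shorter.
    """
    return [cur - prev for prev, cur in zip(values, values[1:])]


def non_negative_total(diffs):
    """Sum of the diffs, ignoring negative steps (treats a drop as a counter reset)."""
    return sum(d for d in diffs if d > 0)


# Hypothetical samples for one series within an hour:
# the counter goes 2 -> 4, then drops back to 0.
samples = [2, 2, 3, 4, 4, 0, 0]

diffs = running_diff(samples)
print(diffs)                      # [0, 1, 1, 0, -4, 0]
print(sum(diffs))                 # -2: the drop dominates the plain total
print(non_negative_total(diffs))  # 2: restarts actually observed in the window
```

Comparing the plain total with the reset-aware total for a series that fired an alert may help narrow down where the unexpected number comes from.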
Again, viewing on a dashboard, the restart count is accurate, but it seems the alert firing calculations are incorrect.
Srikanth Chekuri
04/01/2025, 1:48 PM
Al
04/01/2025, 11:09 PM
Al
04/02/2025, 5:06 PM
Clicked Test Notification and received: Container restarting frequently - 806 restarts
Another observation: after clicking Test Notification, I received alerts for containers that did not restart during the time period but did have a non-zero k8s_container_restarts value.
One container had zero restarts during the period but had k8s_container_restarts = 2. The notification read: Container restarting frequently - 138 restarts
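To make the expectation concrete, here is a small Python sketch with made-up samples. A counter that stays at 2 for the whole hour should contribute 0 after a running diff. The second number, a sum of the raw samples, is included only as a contrast to show how a total can grow with the number of samples; whether anything like that happens during alert evaluation is not established in this thread. The function names and the twelve-sample window are assumptions for illustration.

```python
def restarts_in_window(samples):
    """Total of step-to-step differences for one series (expected alert value)."""
    return sum(cur - prev for prev, cur in zip(samples, samples[1:]))


def raw_sum(samples):
    """Sum of the raw counter samples, shown only for contrast."""
    return sum(samples)


# Hypothetical: twelve samples in the hour, counter constant at 2 (no restarts).
flat_counter = [2] * 12

print(restarts_in_window(flat_counter))  # 0
print(raw_sum(flat_counter))             # 24
```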
Srikanth Chekuri
04/03/2025, 9:32 AM
Al
04/03/2025, 6:25 PM
SELECT k8s_namespace_name,
       k8s_container_name,
       cluster_env,
       ts,
       max(per_series_value) as value
FROM (
    SELECT fingerprint,
           any(k8s_namespace_name) as k8s_namespace_name,
           any(k8s_container_name) as k8s_container_name,
           any(cluster_env) as cluster_env,
           toStartOfInterval(
               toDateTime(intDiv(unix_milli, 1000)),
               INTERVAL 300 SECOND
           ) as ts,
           anyLast(last) as per_series_value
    FROM signoz_metrics.distributed_samples_v4_agg_5m
    INNER JOIN (
        SELECT DISTINCT JSONExtractString(labels, 'k8s_namespace_name') as k8s_namespace_name,
               JSONExtractString(labels, 'k8s_container_name') as k8s_container_name,
               JSONExtractString(labels, 'cluster_env') as cluster_env,
               fingerprint
        FROM signoz_metrics.time_series_v4_1day
        WHERE metric_name IN ['k8s_container_restarts']
          AND temporality = 'Unspecified'
          AND __normalized = true
          AND unix_milli >= 1743552000000
          AND unix_milli < 1743704220000
          AND JSONExtractString(labels, 'cluster_env') = 'prod'
          AND JSONExtractString(labels, 'k8s_deployment_name') NOT IN ['metrics-server','kube-state-metrics']
    ) as filtered_time_series USING fingerprint
    WHERE metric_name IN ['k8s_container_restarts']
      AND unix_milli >= 1743617400000
      AND unix_milli < 1743704220000
    GROUP BY fingerprint,
             ts
    ORDER BY fingerprint,
             ts
)
WHERE isNaN(per_series_value) = 0
GROUP BY k8s_namespace_name,
         k8s_container_name,
         cluster_env,
         ts
ORDER BY k8s_namespace_name ASC,
         k8s_container_name ASC,
         cluster_env ASC,
         ts ASC
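As a reading aid for the query above, a short Python sketch that decodes its `unix_milli` bounds into UTC timestamps and reports the length of the sample range. The helper name `ms_to_utc` and the dictionary labels are illustrative only; the three millisecond values are copied from the query.

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc)


# Bounds copied from the WHERE clauses of the query.
bounds = {
    "time_series_v4_1day start": 1743552000000,
    "samples start": 1743617400000,
    "end (both)": 1743704220000,
}

for label, ms in bounds.items():
    print(f"{label}: {ms_to_utc(ms).isoformat()}")
# time_series_v4_1day start: 2025-04-02T00:00:00+00:00
# samples start: 2025-04-02T18:10:00+00:00
# end (both): 2025-04-03T18:17:00+00:00

# Length of the sample range in minutes.
span_minutes = (bounds["end (both)"] - bounds["samples start"]) // 60_000
print(span_minutes)  # 1447
```

The sample range therefore covers a little over 24 hours, and the query itself returns 5-minute values with no running diff or 60-minute total applied.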
@Srikanth Chekuri Please confirm that this is what you're asking for. Thanks!
Al
04/03/2025, 6:33 PM
Srikanth Chekuri
04/03/2025, 6:41 PM
Al
04/03/2025, 6:48 PM
Srikanth Chekuri
04/03/2025, 6:53 PM
Al
04/03/2025, 6:57 PM
Al
04/03/2025, 6:57 PM
Al
04/03/2025, 7:06 PM
Srikanth Chekuri
04/03/2025, 7:08 PM
Al
04/03/2025, 7:09 PM