saurabh biramwar
03/15/2024, 5:27 PM
nitya-signoz
03/16/2024, 2:45 AM
saurabh biramwar
03/16/2024, 3:32 AM
saurabh biramwar
03/16/2024, 3:38 AM
Prashant Shahi
03/18/2024, 5:33 AM
1) Alert on CrashLoopBackOff/OOMKilled pods. Additionally, I need assistance in configuring alerts to retrieve logs from previously terminated pods.
If you know the error/exception pattern, you can set up an alert based on it and view the logs directly from the alert and its log context. If you do not have any error patterns, or they are not always printed, you will have to opt for a two-step solution:
1. You should be able to use the k8s.pod.phase metric to detect pod failure. Check this thread for the query: https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1710329083837019?thread_ts=1710327606.553629&cid=C01HWQ1R0BC
2. View the logs of the pod based on the k8s.pod.name in the metrics alert.
Prashant Shahi
03/18/2024, 5:35 AM
2) How can I establish alerts to notify when spot instances go down and are subsequently rescheduled?
You can use absent(up{hostname="..."}) in the alert PromQL query.
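To illustrate why absent() suits this case, here is a minimal Python sketch of its semantics as documented for PromQL: it yields a single sample with value 1 only when no series matches the selector, and nothing otherwise. The `absent` and `matches` helpers, the dictionary-based series model, and the hostname `spot-node-1` are hypothetical and exist only for illustration; this is not how SigNoz or Prometheus evaluates the query internally.

```python
from typing import Dict, List

Series = Dict[str, str]  # label set of one time series, including "__name__"


def matches(series: Series, matchers: Dict[str, str]) -> bool:
    """True if the series carries every label=value pair in the selector."""
    return all(series.get(key) == value for key, value in matchers.items())


def absent(current_series: List[Series], matchers: Dict[str, str]) -> List[dict]:
    """Model of PromQL absent(): empty if any series matches, else one sample of value 1."""
    if any(matches(series, matchers) for series in current_series):
        return []
    # The result keeps the equality-matcher labels but not the metric name.
    labels = {k: v for k, v in matchers.items() if k != "__name__"}
    return [{"labels": labels, "value": 1}]


# Hypothetical selector, standing in for up{hostname="..."} from the message above.
selector = {"__name__": "up", "hostname": "spot-node-1"}

# While the instance reports `up`, the alert expression returns nothing.
while_running = absent([{"__name__": "up", "hostname": "spot-node-1"}], selector)

# Once the spot instance is gone, its series disappears and the alert fires.
after_termination = absent([], selector)
```

Because the alert depends on the series disappearing rather than on its value, it should fire even when the instance is terminated before it can report `up` as 0.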
We have recently shipped something equivalent for the Query Builder as well; docs for it should be out soon.
Prashant Shahi
03/18/2024, 5:36 AM
3) How can I leverage Prometheus default metrics within SigNoz to create alerts?
Did you go through these docs? https://signoz.io/docs/userguide/alerts-management/
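Returning to the two-step approach from answer 1), the sketch below shows how k8s.pod.phase samples could be interpreted to pick the pods whose logs are worth opening. It assumes the numeric encoding documented for the OpenTelemetry k8s_cluster receiver (1 Pending, 2 Running, 3 Succeeded, 4 Failed, 5 Unknown). The sample layout, the pod names, the `failing_pods` helper, and the choice of which phases count as failing are all hypothetical. The actual alert query is the one in the linked thread, not this code.

```python
from typing import List

# Phase encoding as documented for the k8s.pod.phase metric.
POD_PHASES = {1: "Pending", 2: "Running", 3: "Succeeded", 4: "Failed", 5: "Unknown"}

# Assumption: which phases should trigger an alert is a policy choice.
FAILING_PHASES = {"Failed", "Unknown"}


def failing_pods(samples: List[dict]) -> List[str]:
    """Return the k8s.pod.name of every sample whose phase is considered failing."""
    names = []
    for sample in samples:
        # Unrecognised values are treated as "Unknown" so they are not silently ignored.
        phase = POD_PHASES.get(int(sample["value"]), "Unknown")
        if phase in FAILING_PHASES:
            names.append(sample["k8s.pod.name"])
    return names


# Hypothetical latest samples, one per pod.
latest = [
    {"k8s.pod.name": "api-7d9f", "value": 2},
    {"k8s.pod.name": "worker-x", "value": 4},
]
pods_to_inspect = failing_pods(latest)  # ["worker-x"]
```

The returned names are what step 2 would use as the k8s.pod.name filter when looking up logs. Note that a pod in CrashLoopBackOff may still report the Running or Pending phase, so the failing set may need adjusting for that case.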