Hi there, I'm seeking assistance with configuring ...
# support
s
Hi there, I'm seeking assistance with configuring monitoring and alerts in SigNoz. Here are the specific areas I need help with: 1) Alerting on CrashLoopBackOff/OOMKilled pods; additionally, I need help configuring alerts so that I can retrieve logs from previously terminated pods. 2) How can I set up alerts that notify me when spot instances go down and are subsequently rescheduled? 3) How can I leverage Prometheus default metrics within SigNoz to create alerts?
n
@Prashant Shahi will be able to help you here.
s
@nitya-signoz Thanks
@Prashant Shahi, could you please help me out with the above points?
p
1) Alert on CrashLoopBackOff/OOMKilled pods; additionally, I need assistance in configuring alerts to retrieve logs from previously terminated pods.
If you know the error/exception pattern, you can set up an alert based on it and view the logs directly from the alert and its log context. If you do not have an error pattern, or it is not always printed, you will have to opt for a two-step solution:
1. Use the `k8s.pod.phase` metric to detect pod failure. Check this thread for the query: https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1710329083837019?thread_ts=1710327606.553629&cid=C01HWQ1R0BC
2. View the logs of the pod based on the `k8s.pod.name` in the metrics alert.
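To make the two steps concrete, here is a minimal Python sketch of how the numeric `k8s.pod.phase` values can be read. It assumes the metric comes from the OpenTelemetry k8s cluster receiver, which documents the phase codes 1 to 5 used below. The sample data and the `pods_in_phase` helper are hypothetical and only illustrate the idea; the actual alert query is the one in the linked thread.

```python
# Phase codes as documented for the OpenTelemetry k8s cluster receiver's
# k8s.pod.phase metric (assumption: this receiver is the metric's source).
POD_PHASES = {1: "Pending", 2: "Running", 3: "Succeeded", 4: "Failed", 5: "Unknown"}


def pods_in_phase(samples, phase):
    """Return the sorted pod names whose reported phase matches `phase`."""
    return sorted(
        s["k8s.pod.name"]
        for s in samples
        if POD_PHASES.get(int(s["value"])) == phase
    )


# Hypothetical samples, shaped like label/value pairs from a metrics query.
samples = [
    {"k8s.pod.name": "api-7c9d", "value": 2},
    {"k8s.pod.name": "worker-5f1a", "value": 4},
    {"k8s.pod.name": "cron-22b0", "value": 3},
]

# Step 1: find the failed pods. Step 2: use each name to filter the logs.
for pod in pods_in_phase(samples, "Failed"):
    print(f"look up logs where k8s.pod.name = {pod}")
```

An alert on this metric would fire on the numeric value for Failed (4 under the assumption above), and the `k8s.pod.name` label on the firing series is what you then use to filter logs.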
2) How can I establish alerts to notify when spot instances go down and are subsequently rescheduled?
You can use `absent(up{hostname="..."})` in the alert's PromQL query. We recently shipped an equivalent for the Query Builder as well; docs for it should be out soon.
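If you have several spot instances to watch, a small helper can generate one such expression per host. This is only a sketch: the `absent_up_query` function and the host names are hypothetical, and it assumes your `up` series carry a `hostname` label as in the expression above.

```python
def absent_up_query(hostname):
    """Build an absent(up{hostname="..."}) PromQL expression for one host."""
    # Escape backslashes and double quotes so the label value stays a valid
    # PromQL string literal.
    escaped = hostname.replace("\\", "\\\\").replace('"', '\\"')
    return 'absent(up{hostname="' + escaped + '"})'


# Hypothetical host names; replace with the hostname label values you scrape.
for host in ["spot-node-a", "spot-node-b"]:
    print(absent_up_query(host))
```

Each generated expression returns a value only when no `up` series exists for that host, which is the condition the alert should fire on.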
3) How can I leverage Prometheus default metrics within Signoz to create alerts?
Did you go through these docs? https://signoz.io/docs/userguide/alerts-management/