# support
a
Hi guys, I am using SigNoz `v0.11.4` with Slack. I have an issue with the alert manager: sometimes an alert triggers with status `Firing`, but it never sends a message to Slack. Another issue: when I create an alarm and test it, nothing is sent, but if I save it, edit it (change the name, for example) and test it again, the alarm does send the message to the Slack channel.
a
cc: @Amol Umbark
a
Hi @Alejandro Decchi, are you able to reproduce the issue consistently? If so, please share the exact steps and the logs of query-service and alertmanager, and also check if the alert shows up in Triggered Notifications.
I will try to reproduce the test-alert issue and get back.
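For example, something like this should capture both logs (the `platform` namespace and the pod names are just placeholders, adjust them to your install):
```bash
# List the pods, then dump the two logs requested above.
# "platform" is an assumed namespace; the pod names are placeholders.
kubectl -n platform get pods
kubectl -n platform logs <query-service-pod> --tail=500 > query-service.log
kubectl -n platform logs <alertmanager-pod> --tail=500 > alertmanager.log
```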
a
@Amol Umbark thanks for your reply. @Pranay created the issue https://github.com/SigNoz/signoz/issues/1986; I will try to add more details to it.
I updated the issue.
a
@Alejandro Decchi there was an issue in 0.11 where the channels were not getting synced into alertmanager, and in your alertmanager log I don't see any channels.
Let me check whether that issue still exists in 0.11.4. Can you check your channel configuration and test it?
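One way to verify the channel actually reached alertmanager is to look at its running configuration over the API, e.g. (service name and namespace are assumptions, check them with `kubectl get svc` first):
```bash
# Forward the alertmanager port locally (service/namespace names assumed)
kubectl -n platform port-forward svc/signoz-alertmanager 9093:9093
# In another shell: /api/v2/status returns the running configuration,
# including the receivers (channels) alertmanager currently knows about
wget -qO- http://localhost:9093/api/v2/status
```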
a
I have many channels added in SigNoz.
Sometimes, to resolve the issue, I have to recreate the channel.
I am using `docker.io/signoz/alertmanager:0.23.0-0.2`
a
@Alejandro Decchi I have posted an update in the issue, please take a look: https://github.com/SigNoz/signoz/issues/1986
I am unable to reproduce the issue, hence I need further input from you.
a
@Amol Umbark thank you for your feedback, I will review the GitHub issue and give more details.
I updated the ticket 🙂
a
Hi @Alejandro Decchi, I looked at your response; the wget result is unexpected. Would you be able to get on a call to resolve this? I am in the IST time zone, please share a suitable time for a huddle.
@Alejandro Decchi Also, is it possible to upgrade to v0.12 and try?
@Prashant Shahi will it be safe to delete the query-service pod to re-create it? I suspect the new deployment did not re-create the pod and an older version of query-service is still active.
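A quick sanity check of which image each pod is actually running (namespace assumed):
```bash
# Print pod name and container image(s) side by side
kubectl -n platform get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```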
p
@Amol Umbark @Alejandro Decchi yes, I have not seen any issues with restarting query-service pods.
a
@Alejandro Decchi can you please try deleting the query-service pod and capture its log after it starts? Please share that log as well.
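Roughly like this (namespace assumed, pod names are placeholders):
```bash
# Find the query-service pod and delete it; the controller recreates it
kubectl -n platform get pods | grep query-service
kubectl -n platform delete pod <query-service-pod>
# Once the new pod is Running, follow its log and save a copy to share
kubectl -n platform logs -f <new-query-service-pod> | tee query-service.log
```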
a
@Amol Umbark I will try to delete it and I will share the logs.
Here is part of the log output:
```
2023-01-18T16:20:30.226Z	INFO	version/version.go:43	

SigNoz version   : v0.11.4
Commit SHA-1     : 8e55228
Commit timestamp : 2022-11-29T11:43:47Z
Branch           : HEAD
Go version       : go1.17.13

For SigNoz Official Documentation,  visit https://signoz.io/docs
For SigNoz Community Slack,         visit http://signoz.io/slack
For discussions about SigNoz,       visit https://community.signoz.io

Check SigNoz Github repo for license details.
Copyright 2022 SigNoz
2023-01-18T16:20:30.227Z	WARN	query-service/main.go:61	No JWT secret key is specified.
main.main
	/go/src/github.com/signoz/signoz/ee/query-service/main.go:61
runtime.main
	/usr/local/go/src/runtime/proc.go:255
2023-01-18T16:20:30.452Z	INFO	license/manager.go:124	No active license found, defaulting to basic plan
2023-01-18T16:20:30.452Z	INFO	app/server.go:100	Using ClickHouse as datastore ...
ts=2023-01-18T16:20:30.460035085Z caller=log.go:168 level=info msg="Loading configuration file" filename=/root/config/prometheus.yml
ts=2023-01-18T16:20:30.462181835Z caller=log.go:168 level=info msg="Completed loading of configuration file" filename=/root/config/prometheus.yml
2023-01-18T16:20:30.466Z	INFO	alertManager/notifier.go:94	Starting notifier with alert manager:[http://signoz-alertmanager:9093/api/]
2023-01-18T16:20:30.466Z	INFO	app/server.go:428	rules manager is ready
2023-01-18T16:20:30.468Z	DEBUG	rules/apiParams.go:85	postable rule(parsed):%!(EXTRA *rules.PostableRule=&{testing-error-rate   promql_rule 300000000000 0 {"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"","tagFilters":{"op":"AND","items":[]},"aggregateOperator":1,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"(max(sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, status_code=\"STATUS_CODE_ERROR\"}[5m]) OR rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, http_status_code=~\"5..\"}[5m]))*100/sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`}[5m]))) \u003c 1000 OR vector(0))","disabled":false}},"panelType":0,"queryType":3},"op":"1","target":5,"matchType":"1"} map[severity:critical] map[description:A new alert] false <https://signoz.stg.travelx.it/alerts/edit?ruleId=2> [testing-alarms]  })
2023-01-18T16:20:30.468Z	DEBUG	rules/apiParams.go:126	postable rule:%!(EXTRA *rules.PostableRule=&{testing-error-rate   promql_rule 300000000000 60000000000 {"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"","tagFilters":{"op":"AND","items":[]},"aggregateOperator":1,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"(max(sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, status_code=\"STATUS_CODE_ERROR\"}[5m]) OR rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, http_status_code=~\"5..\"}[5m]))*100/sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`}[5m]))) \u003c 1000 OR vector(0))","disabled":false}},"panelType":0,"queryType":3},"op":"1","target":5,"matchType":"1"} map[severity:critical] map[description:A new alert] false <https://signoz.stg.travelx.it/alerts/edit?ruleId=2> [testing-alarms]  }, string=	 condition, string={"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"","tagFilters":{"op":"AND","items":[]},"aggregateOperator":1,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"(max(sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, status_code=\"STATUS_CODE_ERROR\"}[5m]) OR rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, http_status_code=~\"5..\"}[5m]))*100/sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`}[5m]))) \u003c 1000 OR vector(0))","disabled":false}},"panelType":0,"queryType":3},"op":"1","target":5,"matchType":"1"})
2023-01-18T16:20:30.468Z	DEBUG	rules/manager.go:345	msg:%!(EXTRA string=adding a new rule task, string=	 task name:, string=2-groupname)
2023-01-18T16:20:30.468Z	INFO	rules/promRule.go:94	msg:creating new alerting rule	 name:testing-error-rate	 condition:{"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"","tagFilters":{"op":"AND","items":[]},"aggregateOperator":1,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"(max(sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, status_code=\"STATUS_CODE_ERROR\"}[5m]) OR rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`, http_status_code=~\"5..\"}[5m]))*100/sum(rate(signoz_calls_total{service_name=\"api-ktor-template\", operation=~`HTTP GET|HTTP POST`}[5m]))) \u003c 1000 OR vector(0))","disabled":false}},"panelType":0,"queryType":3},"op":"1","target":5,"matchType":"1"}	 query:(max(sum(rate(signoz_calls_total{service_name="api-ktor-template", operation=~`HTTP GET|HTTP POST`, status_code="STATUS_CODE_ERROR"}[5m]) OR rate(signoz_calls_total{service_name="api-ktor-template", operation=~`HTTP GET|HTTP POST`, http_status_code=~"5.."}[5m]))*100/sum(rate(signoz_calls_total{service_name="api-ktor-template", operation=~`HTTP GET|HTTP POST`}[5m]))) < 1000 OR vector(0)) > 5.000000
2023-01-18T16:20:30.468Z	INFO	rules/promRuleTask.go:42	Initiating a new rule group:2-groupname	 frequency:1m0s
```
a
@Alejandro Decchi thanks for sharing the log. Did recreating the pod fix the issue?
If not, is there a good time we can get on a huddle? I am in IST.
a
It did not work. I am at GMT-3. When are you available?
Any feedback?
a
Hey, can you try a fresh install of 0.11.4 in a separate environment? I suspect something from earlier versions is still running and causing the issue.
I will confirm if I can connect tomorrow (Friday) at 7 PM IST (the same time you sent your last message).
a
@Amol Umbark this issue happened in 2 different environments. If you want, I am available next Monday the 30th at 7 PM (GMT-3).
a
Hey, sorry, I was on leave. Can we connect today?
Please ping me when you are online.
By the way, I am available in the IST timezone (GMT+5:30); 7 PM (GMT-3) is about 3:30 AM here. Can you take a look at this and share a suitable time for you?
I am available for the next couple of hours. If we don't get to connect, please book a meeting from here: https://calendly.com/amol-umbark/30min?month=2023-02
@Alejandro Decchi I am able to reproduce the error in one of the Helm installations. Will have more updates tomorrow.
a
@Amol Umbark perfect! It is great to be able to reproduce this random error. I will keep waiting for your update.
a
@Prashant Shahi has resolved the issue. We will be publishing a PR soon. If possible, we will suggest a point fix so you can apply it in your env and carry on.
a
@Amol Umbark can you share the image/tag to deploy in my environment?
a
@Alejandro Decchi the fix is in the Helm chart, so you will have to update the Helm chart and reinstall. @Prashant Shahi can you please post here once the chart is updated?
p
@Alejandro Decchi The fix is merged and out. Follow our docs to upgrade to the latest chart release: https://signoz.io/docs/operate/kubernetes/#upgrade-signoz-cluster Be sure to include `-f override-values.yaml` if you passed custom values during installation.
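For reference, the upgrade amounts to something like this (release name `my-release` and namespace `platform` are the defaults from the docs, yours may differ):
```bash
helm repo update
# Upgrade to the latest signoz/signoz chart, keeping any custom values;
# the chart version can be pinned with --version if needed
helm -n platform upgrade my-release signoz/signoz -f override-values.yaml
```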
a
@Amol Umbark @Prashant Shahi so it is fixed in chart version 0.10.2, which was released 2 hours ago?
p
@Alejandro Decchi yes, it is
a
Perfect, I will try it in Dev.
p
@Alejandro Decchi Okay, do let us know if the issue persists or you face any issues upgrading.
a
thanks @Prashant Shahi
p
happy to help 🙂