Hi <@U01HWQ1RTC2>, we are trying to fetch the mong...
# support
a
Hi @Ankit Nayan, we are trying to fetch MongoDB traces from a Node.js application request, but we are unable to get that info from the Node.js HTTP request trace. Is there any way to get traces for MongoDB? We are using Mongoose to connect to the DB. Is that causing the traces not to be visible in SigNoz?
a
Hi @Anil Kumar Bandrapalli, are you using this package for instrumentation: https://www.npmjs.com/package/@opentelemetry/auto-instrumentations-node ? This should take care of MongoDB tracing.
a
We are using that same package, but we are not able to receive any traces for MongoDB. This is the code snippet:
Copy code
const opentelemetry = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { MongoDBInstrumentation } = require('@opentelemetry/instrumentation-mongodb');

// traceExporter is created elsewhere in our tracing file
const sdk = new opentelemetry.NodeSDK({
  traceExporter,
  instrumentations: [
    getNodeAutoInstrumentations(),
    new MongoDBInstrumentation({ enhancedDatabaseReporting: true }),
  ],
});
a
@Anil Kumar Bandrapalli tracing should be started as the very first thing when your application starts. Try making the line below the first import:
Copy code
import sdk from './tracing';
Also, are you using MongoDB version >=3.3 <4? I found a relevant issue: https://github.com/open-telemetry/opentelemetry-js-contrib/issues/683
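For illustration, a minimal entry-point sketch showing that ordering (the ./app module name is just a placeholder for your application code):
Copy code
// index.js: load tracing before anything else so the instrumentation can
// patch modules (http, mongodb, mongoose, ...) before the app requires them
require('./tracing');
// only after tracing has started, load the rest of the application
require('./app'); // placeholder for your application entry module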
We haven't changed anything in the exporter since we export over OTLP, so it should not be exporting to Jaeger.
I see a Mongoose OTel package, so instrumenting Mongoose should be supported: https://www.npmjs.com/package/opentelemetry-instrumentation-mongoose
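For reference, registering the Mongoose instrumentation in the same SDK setup might look like the sketch below; it reuses the names from the snippet above and assumes the package exports a MongooseInstrumentation class as its README describes:
Copy code
const { MongooseInstrumentation } = require('opentelemetry-instrumentation-mongoose');

const sdk = new opentelemetry.NodeSDK({
  traceExporter,
  instrumentations: [
    getNodeAutoInstrumentations(),
    new MongooseInstrumentation(), // emits spans for Mongoose model/query operations
  ],
});
sdk.start();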
a
oh ok i will try to use that package then
@Ankit Nayan, I am facing one issue. In the previous version we were able to receive Slack alerts; in the new version we are not. The SigNoz UI shows the alert in firing status and after some time the status is set back to OK, but we never receive the alerts. In the alert channels tab, when we click the Test button, we do receive the test alert. I don't know the reason. Could you please help me out on this?
And is there a way to trace whether these alert notifications failed or succeeded?
any update ?
@Ankit Nayan I have finished integrating with Mongoose. We are now able to see MongoDB traces.
@Ankit Nayan can you help me figure out why alerts are not being received from SigNoz? The test channel works fine.
a
@Amol Umbark can you please have a look when you get time
a
hi @Anil Kumar Bandrapalli can you please share the logs of the query service and alertmanager from when the alert is in firing state and then goes to OK.
a
ok
These are the query service logs:
2022-08-09T155617.656Z INFO app/server.go:188 /api/v1/version timeTaken: 180.387µs
2022-08-09T155618.948Z INFO app/server.go:188 /api/v1/version timeTaken: 321.815µs
2022-08-09T155619.817Z INFO app/server.go:188 /api/v1/rules timeTaken: 521.305µs
2022-08-09T155627.657Z INFO app/server.go:188 /api/v1/version timeTaken: 236.715µs
2022-08-09T155628.947Z INFO app/server.go:188 /api/v1/version timeTaken: 201.875µs
2022-08-09T155636.495Z INFO rules/promRuleTask.go:315 promql rule task:4-groupname eval started at:2022-08-09 155636.494261596 +0000 UTC
2022-08-09T155636.495Z INFO rules/promRule.go:307 rule:High Transaction Time Alert slack evaluating promql query: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50.000000
time="2022-08-09T155636Z" level=warning msg="Ignoring hint {StepMs:0 Func:rate StartMs:1660060536494 EndMs:1660060596494} for query [1660060536494,1660060596494,{service_name=\"workflow-service\",operation=\"POST /api/task/complete\",name=\"signoz_latency_bucket\"}]." component=clickhouse
2022-08-09T155637.656Z INFO app/server.go:188 /api/v1/version timeTaken: 186.124µs
2022-08-09T155638.947Z INFO app/server.go:188 /api/v1/version timeTaken: 171.214µs
2022-08-09T155647.656Z INFO app/server.go:188 /api/v1/version timeTaken: 195.136µs
2022-08-09T155648.855Z INFO app/server.go:188 /api/v1/rules timeTaken: 522.895µs
2022-08-09T155648.947Z INFO app/server.go:188 /api/v1/version timeTaken: 272.605µs
2022-08-09T155657.657Z INFO app/server.go:188 /api/v1/version timeTaken: 217.998µs
2022-08-09T155658.947Z INFO app/server.go:188 /api/v1/version timeTaken: 198.955µs
2022-08-09T155707.656Z INFO app/server.go:188 /api/v1/version timeTaken: 200.184µs
2022-08-09T155708.948Z INFO app/server.go:188 /api/v1/version timeTaken: 278.13µs
2022-08-09T155717.656Z INFO app/server.go:188 /api/v1/version timeTaken: 220.669µs
2022-08-09T155718.860Z INFO app/server.go:188 /api/v1/rules timeTaken: 584.285µs
2022-08-09T155718.948Z INFO app/server.go:188 /api/v1/version timeTaken: 159.896µs
2022-08-09T155727.656Z INFO app/server.go:188 /api/v1/version timeTaken: 203.414µs
2022-08-09T155728.947Z INFO app/server.go:188 /api/v1/version timeTaken: 201.721µs
2022-08-09T155736.495Z INFO rules/promRuleTask.go:315 promql rule task:4-groupname eval started at:2022-08-09 155736.494261596 +0000 UTC
2022-08-09T155736.495Z INFO rules/promRule.go:307 rule:High Transaction Time Alert slack evaluating promql query: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50.000000
time="2022-08-09T155736Z" level=warning msg="Ignoring hint {StepMs:0 Func:rate StartMs:1660060596494 EndMs:1660060656494} for query [1660060596494,1660060656494,{service_name=\"workflow-service\",operation=\"POST /api/task/complete\",name=\"signoz_latency_bucket\"}]." component=clickhouse
2022-08-09T155737.656Z INFO app/server.go:188 /api/v1/version timeTaken: 200.241µs
2022-08-09T155738.947Z INFO app/server.go:188 /api/v1/version timeTaken: 196.769µs
2022-08-09T155747.657Z INFO app/server.go:188 /api/v1/version timeTaken: 203.496µs
2022-08-09T155748.862Z INFO app/server.go:188 /api/v1/rules timeTaken: 528.558µs
2022-08-09T155748.948Z INFO app/server.go:188 /api/v1/version timeTaken: 186.908µs
2022-08-09T155757.656Z INFO app/server.go:188 /api/v1/version timeTaken: 197.801µs
2022-08-09T155758.948Z INFO app/server.go:188 /api/v1/version timeTaken: 200.078µs
2022-08-09T155807.656Z INFO app/server.go:188 /api/v1/version timeTaken: 168.084µs
2022-08-09T155808.947Z INFO app/server.go:188 /api/v1/version timeTaken: 173.779µs
2022-08-09T155817.656Z INFO app/server.go:188 /api/v1/version timeTaken: 211.835µs
And I don't see anything in the alertmanager logs.
a
Ok, I don't see any alert firing. Did you notice a firing status on the Triggered Alerts page?
a
yes Amol
a
Is it likely that the condition held for just under one minute? I can see that the formula looks at the last 1 min. There is a send delay (around 5 minutes, I think) in alertmanager, so if the condition resolved within that timeframe you may not receive anything. Can you please try a broader condition and check: maybe increase the timeframe to 5 or 10 minutes and reduce the threshold.
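For illustration, a broadened version of that rule could look like the query below (the labels are the ones already used in this thread; the lower threshold of 20 is only a placeholder to make the condition easier to trigger):
Copy code
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[10m])) by (le)) > 20.000000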
a
sure
a
@Anil Kumar Bandrapalli a 1m interval does not work. Please confirm that by looking at the chart: you probably won't see a chart on the alerts page if you choose 1m as your time range.
a
@Amol Umbark, this is my promql query
Copy code
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[10m])) by (le))
When I create an alert with the above query, in the logs I see that the frequency is set to 1m0s, so the query is being evaluated every minute and the UI shows firing status in the Triggered Alerts tab.
@Ankit Nayan any suggestions ?
Also found this log for a simple CPU utilization rule:
msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", fingerprint="10661184619665832956", fullLabels="{\"__name__\":\"system_cpu_load_average_1m\"}", ruleId="9", ruleSource="https://signoz.accionbreeze.com/alerts/new", severity="warning"})
Even when I set the time range to 5m, it still evaluates every minute and tries to send alerts, but we are not able to receive the alerts in Slack.
a
resend delay occurs when an alert was already sent.
a
How do we resolve this issue?
a
the alert frequency is one minute so the query will run every minute. the time range can vary
a
but how to increase this frequency ?
a
How will that help?
Are you not receiving any alerts at all, or does just one rule have a problem?
a
I have written one simple alert rule, and even for that I didn't receive any alerts in Slack.
a
but test notifications option on the alerts page works right?
a
yes
a
let me get back
a
If you look at the sequence of the logs, it first finds the alerts and starts firing, but after a few seconds we see the "skipping send alert" message. So how do we resolve this?
FYI, this is the sequence of logs:
2022-08-10T130941.186Z DEBUG rules/ruleTask.go:331 msg:%!(EXTRA string=rule task eval started, string= name:, string=9-groupname, string= start time:, time.Time=2022-08-10 130941.184063358 +0000 UTC)
2022-08-10T130941.186Z DEBUG rules/thresholdRule.go:505 ruleid:%!(EXTRA string=9, string= runQueries:, map[string]string=map[A:SELECT fingerprint, labels as fullLabels, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 30 SECOND) as ts, any(value) as value FROM signoz_metrics.samples_v2 INNER JOIN (SELECT labels, fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'system_cpu_load_average_1m' AND labels_object.name IN ['system_cpu_load_average_1m']) as filtered_time_series USING fingerprint WHERE metric_name = 'system_cpu_load_average_1m' AND timestamp_ms >= 1660136681184 AND timestamp_ms <= 1660136981184 GROUP BY fingerprint, labels, ts ORDER BY fingerprint, labels, ts])
2022-08-10T130941.186Z DEBUG rules/thresholdRule.go:523 ruleId: %!(EXTRA string=9, string= result query label:, string=A)
2022-08-10T130941.217Z INFO rules/thresholdRule.go:619 rule:High Transaction Time Alert alerts found: 1
2022-08-10T130941.217Z INFO rules/thresholdRule.go:283 msg:initiating send alerts (if any) rule:High Transaction Time Alert
2022-08-10T130941.217Z DEBUG rules/thresholdRule.go:297 msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", fingerprint="10661184619665832956", fullLabels="{\"__name__\":\"system_cpu_load_average_1m\"}", ruleId="9", ruleSource="https://signoz.accionbreeze.com/alerts/new", severity="warning"})
2022-08-10T130946.629Z INFO app/server.go:188 /api/v1/rules timeTaken: 1.10824ms
2022-08-10T130947.656Z INFO app/server.go:188 /api/v1/version timeTaken: 251.177µs
2022-08-10T130948.947Z INFO app/server.go:188 /api/v1/version timeTaken: 222.816µs
2022-08-10T130957.656Z INFO app/server.go:188 /api/v1/version timeTaken: 202.774µs
2022-08-10T130958.947Z INFO app/server.go:188 /api/v1/version timeTaken: 195.425µs
2022-08-10T131007.656Z INFO app/server.go:188 /api/v1/version timeTaken: 181.432µs
2022-08-10T131008.947Z INFO app/server.go:188 /api/v1/version timeTaken: 194.131µs
2022-08-10T131016.644Z INFO app/server.go:188 /api/v1/rules timeTaken: 1.204984ms
2022-08-10T131017.656Z INFO app/server.go:188 /api/v1/version timeTaken: 188.567µs
2022-08-10T131018.950Z INFO app/server.go:188 /api/v1/version timeTaken: 1.27064ms
2022-08-10T131027.656Z INFO app/server.go:188 /api/v1/version timeTaken: 179.987µs
2022-08-10T131028.948Z INFO app/server.go:188 /api/v1/version timeTaken: 237.169µs
a
@Anil Kumar Bandrapalli i am trying to reproduce the issue. will update you
a
Ok thank you
a
Hi @Anil Kumar Bandrapalli I could not reproduce the issue. But the error that you see ("skipping resend…") occurs when a message has already been sent, so it is strange that you never received any message in Slack. Another possibility is that the call to the alertmanager API failed. To test this, can you disable all rules except one, restart the services (query service), and then monitor the log. As soon as the service starts, the rules will be started and you can see whether the "alerts found" condition appears in the log. If so, you can also see whether the API call to alertmanager failed.
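For reference, assuming the Kubernetes release and namespace names used later in this thread (my-release in the platform namespace), the restart-and-watch step could look roughly like this:
Copy code
# restart the query service so rule evaluation starts fresh
kubectl -n platform rollout restart statefulset/my-release-signoz-query-service
# follow its logs and watch for rule evaluation and alertmanager calls
kubectl -n platform logs -f my-release-signoz-query-service-0 | grep -iE "alerts found|alertmanager|error"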
a
I will give it a try, Amol.
Hi @Amol Umbark / @Ankit Nayan, before your reply I downgraded SigNoz to 9, and we are getting a different issue. This is the error: Received Server Error response for API call to alertmanager(POST http://my-release-signoz-alertmanager:9093/api/v1/testReceiver)\n%!(EXTRA string=404 Not Found)" Do you have any insights into this error?
Even after reverting to version 10 we are getting the same issue
Yes, right now I am on version 10 only, but still getting this error:
Received Server Error response for API call to alertmanager(POST http://my-release-signoz-alertmanager:9093/api/v1/testReceiver)\n%!(EXTRA string=404 Not Found)"
a
got it .. the API host is different
are you setting ALERTMANAGER_API_PREFIX in env var?
Can you try accessing http://my-release-signoz-alertmanager:9093? Does it take you to the alertmanager dashboard?
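If the service is not exposed outside the cluster, a port-forward is one way to check (a sketch; service name and namespace match the kubectl output shared later in this thread):
Copy code
kubectl -n platform port-forward svc/my-release-signoz-alertmanager 9093:9093
# then open http://localhost:9093 in a browser to reach the alertmanager UI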
a
The test alert was fired from this host: https://signoz.accionbreeze.com/api/v1/testChannel
Internally it calls my-release-signoz-alertmanager.
r
But where can we check the value of ALERTMANAGER_API_PREFIX? I already logged into the alertmanager pod but could not find any env variable named ALERTMANAGER_API_PREFIX.
a
So the way this works is:
• the alertmanager API is derived as http://alertmanager:9093/api/
• the docker compose setup assigns the name alertmanager to the alertmanager container
• hence query service can access it within the network
I am wondering why in your case the URL of alertmanager is different. @Prashant Shahi any thoughts?
a
for your information we are deploying via kubernetes
a
yeh i understand that
a
Any update or some kind of solution?
r
@Amol Umbark and @Prashant Shahi I changed the ALERTMANAGER_API_PREFIX env var to http://alertmanager:9093/api/ but we are still getting the same error:
2022-08-12T054009.919Z ERROR alertManager/manager.go:164 Error in getting response of API call to alertmanager(POST http://alertmanager:9093/api/v1/testReceiver) %!(EXTRA *url.Error=Post "http://alertmanager:9093/api/v1/testReceiver": dial tcp: lookup alertmanager on 172.20.0.1053 no such host) go.signoz.io/query-service/integrations/alertManager.(*manager).TestReceiver
p
If you have followed the docs for installation, the appropriate endpoint would be
http://my-release-signoz-alertmanager:9093
Moreover, that is set automatically by the Helm chart.
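To confirm what the Helm chart injected, one option is to inspect the query-service pod spec directly (a sketch; release and namespace names as shown in the kubectl output later in this thread):
Copy code
kubectl -n platform get statefulset my-release-signoz-query-service \
  -o jsonpath='{.spec.template.spec.containers[0].env}'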
r
ok, but @Amol Umbark told us that http://my-release-signoz-alertmanager:9093 is not the correct API and http://alertmanager:9093 is the correct API. We tried both, but the error message is the same.
p
In case of Docker, we don't set any env since the code defaults to alertmanager. In case of K8s, the default env should be correct.
r
am using the kubernetes way of installation
p
Could you restart the alertmanager and query-service pods?
Copy code
kubectl get svc -n platform
Can you share output of the command above?
r
[ec2-user@ip-10-0-4-191 ~]$ k get svc -n platform
NAME                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
chi-signoz-cluster-0-0                     ClusterIP   None             <none>        8123/TCP,9000/TCP,9009/TCP              19h
clickhouse-operator-metrics                ClusterIP   172.20.22.194    <none>        8888/TCP                                19h
my-release-clickhouse                      ClusterIP   172.20.25.33     <none>        8123/TCP,9000/TCP                       19h
my-release-signoz-alertmanager             ClusterIP   172.20.128.93    <none>        9093/TCP                                19h
my-release-signoz-alertmanager-headless    ClusterIP   None             <none>        9093/TCP                                19h
my-release-signoz-frontend                 NodePort    172.20.174.179   <none>        3301:31596/TCP                          19h
my-release-signoz-otel-collector           ClusterIP   172.20.195.124   <none>        14250/TCP,14268/TCP,4317/TCP,4318/TCP   19h
my-release-signoz-otel-collector-metrics   ClusterIP   172.20.51.23     <none>        13133/TCP                               19h
my-release-signoz-query-service            ClusterIP   172.20.82.146    <none>        8080/TCP,8085/TCP                       19h
my-release-zookeeper                       ClusterIP   172.20.226.187   <none>        2181/TCP,2888/TCP,3888/TCP              19h
my-release-zookeeper-headless              ClusterIP   None             <none>        2181/TCP,2888/TCP,3888/TCP              19h
p
Yup.. the alertmanager endpoint is correct
It could be one of two things:
• the alertmanager pod is unhealthy
• frontend nginx or query-service is unable to resolve the correct address of the alertmanager pod
Copy code
kubectl get pods -n platform
r
[ec2-user@ip-10-0-4-191 ~]$ kubectl get pods -n platform
NAME                                                        READY   STATUS    RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running   0          19h
clickhouse-operator-598444b99-qbnbj                         2/2     Running   0          19h
my-release-signoz-alertmanager-0                            1/1     Running   0          38m
my-release-signoz-frontend-584d85596b-648lh                 1/1     Running   0          19h
my-release-signoz-otel-collector-68cd55f8-xgtgl             1/1     Running   0          19h
my-release-signoz-otel-collector-metrics-6789c89544-nrzvr   1/1     Running   0          19h
my-release-signoz-query-service-0                           1/1     Running   0          37m
my-release-zookeeper-0                                      1/1     Running   0          19h
p
Just tested and I was able to reproduce this error
Let me look into it and get back to you
r
ok
a
ok thank you.
p
Only test alert channel endpoint seems to be broken. Configured alerts work as expected.
cc @Amol Umbark
a
ok let me take a look
a
Also @Prashant Shahi, the configured alerts are not working. The UI shows firing and then the status goes back to OK, but we are not receiving the alerts in Slack.
We are getting this message:
msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", ruleId="1",
p
I tested with webhook using https://webhook.site/
a
We set the alert on "system_cpu_load_average_15m".
But we configured it for my own Slack channel. If you need it, I will provide the details as well.
p
got it. let me check with slack alert and get back
a
@Anil Kumar Bandrapalli can you please review the log prior to this message? The message appears only after an alert has already been sent; it also would not occur the first time after you restart the query service. So if you restart, you should see that the alert rule has run and the number of alerts found.
r
2022-08-12T065623.742Z INFO rules/thresholdRule.go:619 rule:High Transaction Time Alert alerts found: 1
2022-08-12T065623.742Z INFO rules/thresholdRule.go:283 msg:initiating send alerts (if any) rule:High Transaction Time Alert
2022-08-12T065623.742Z DEBUG rules/thresholdRule.go:297 msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", ruleId="1", ruleSource="https://signoz.accionbreeze.com/alerts/new", severity="warning"})
2022-08-12T065625.568Z INFO app/server.go:188 /api/v1/version timeTaken: 230.346µs
2022-08-12T065631.007Z INFO app/server.go:188 /api/v1/version timeTaken: 401.041µs
2022-08-12T065635.568Z INFO app/server.go:188 /api/v1/version timeTaken: 201.353µs
2022-08-12T065641.008Z INFO app/server.go:188 /api/v1/version timeTaken: 689.342µs
2022-08-12T065645.568Z INFO app/server.go:188 /api/v1/version timeTaken: 288.93µs
2022-08-12T065651.016Z INFO app/server.go:188 /api/v1/version timeTaken: 354.102µs
2022-08-12T065655.568Z INFO app/server.go:188 /api/v1/version timeTaken: 644.13µs
2022-08-12T065701.014Z INFO app/server.go:188 /api/v1/version timeTaken: 274.136µs
2022-08-12T065705.568Z INFO app/server.go:188 /api/v1/version timeTaken: 335.678µs
2022-08-12T065711.014Z INFO app/server.go:188 /api/v1/version timeTaken: 233.826µs
2022-08-12T065715.567Z INFO app/server.go:188 /api/v1/version timeTaken: 191.212µs
2022-08-12T065721.007Z INFO app/server.go:188 /api/v1/version timeTaken: 206.946µs
2022-08-12T065723.709Z DEBUG rules/ruleTask.go:331 msg:%!(EXTRA string=rule task eval started, string= name:, string=1-groupname, string= start time:, time.Time=2022-08-12 065723.708306749 +0000 UTC)
2022-08-12T065723.709Z DEBUG rules/thresholdRule.go:505 ruleid:%!(EXTRA string=1, string= runQueries:, map[string]string=map[A:SELECT name, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 30 SECOND) as ts, avg(value) as value FROM signoz_metrics.samples_v2 INNER JOIN (SELECT labels_object.name as name, fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'system_cpu_load_average_15m') as filtered_time_series USING fingerprint WHERE metric_name = 'system_cpu_load_average_15m' AND timestamp_ms >= 1660287143708 AND timestamp_ms <= 1660287443708 GROUP BY name,ts ORDER BY name, ts])
a
is this the first alert execution after restart?
a
After a restart we created this alert and shared the log with you. Now I will restart the alertmanager and query service and share the logs.
@Amol Umbark, after the restart I see the same logs, and the alerts are still not received in Slack.
p
Slack alerts work as expected for me in v0.10.2. Could you share a screenshot of the alert?
a
@Anil Kumar Bandrapalli can you please upgrade to v0.10.2? This version has an enable/disable option, so we can try disabling and enabling the rule to capture the exact log.
As I can see, you are on v0.10.0, which doesn't have this functionality for disabling or testing alert notifications.
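For reference, assuming the chart was installed from the standard SigNoz Helm repository as release my-release in the platform namespace, the upgrade would look roughly like this (a sketch, not exact commands for your setup):
Copy code
helm repo update
helm -n platform upgrade my-release signoz/signoz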
a
@Prashant Shahi by ALERT do you mean the alert rule page or the alert channel page?
@Amol Umbark we have only one alert configured, not multiple alerts.
@Prashant Shahi, we were able to receive alerts properly on version 10.0 earlier; only now are we getting this error. The fact is we can't upgrade to version 10.2, as we have already deployed version 10.0 to our client. Could you help us resolve this issue?
please
@Prashant Shahi /@Ankit Nayan /@Amol Umbark, kindly help us out
a
ok let me try to reproduce with 10.0
a
But we are not able to receive alerts in Slack.
a
Do you see any alerts in the Triggered Alerts tab?
Next to Alert Rules you can see the Triggered Alerts tab. Do you see alerts there, or any error when you switch to the tab?
a
yes in firing status only. no errors
a
Please send me screenshots.
If the alert shows there, it means the alert has reached alertmanager.
Can you also send the alertmanager log? Restart it before you grab the log.
The triggered tab looks fine. I will need the alertmanager log; please restart it before grabbing the log.
a
level=info ts=2022-08-12T100254.967Z caller=main.go:237 msg="Starting Alertmanager" version="(version=0.23.0, branch=release/v0.23.0-0.1, revision=6f8c41aa660a379880af00d7b42fd8ed8af854bd)"
level=info ts=2022-08-12T100254.967Z caller=main.go:238 build_context="(go=go1.18, user=ubuntu@ip-172-31-87-228, date=20220503-105046)"
level=info ts=2022-08-12T100254.968Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=10.0.1.164 port=9094
level=info ts=2022-08-12T100254.970Z caller=cluster.go:679 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2022-08-12T100255.177Z caller=coordinator.go:141 component=configuration msg="Loading a new configuration"
level=warn ts=2022-08-12T100255.181Z caller=configLoader.go:61 component=configuration msg="No channels found in query service "
level=info ts=2022-08-12T100255.181Z caller=coordinator.go:156 component=configuration msg="Completed loading of configuration file"
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
level=info ts=2022-08-12T100255.269Z caller=main.go:570 msg=Listening address=:9093
level=info ts=2022-08-12T100255.269Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=info ts=2022-08-12T100256.972Z caller=cluster.go:704 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.001323238s
level=info ts=2022-08-12T100304.978Z caller=cluster.go:696 component=cluster msg="gossip settled; proceeding" elapsed=10.007954437s
a
This shows no receiver is set up:
level=warn ts=2022-08-12T100255.181Z caller=configLoader.go:61 component=configuration msg="No channels found in query service"
Can you please send a screenshot of the channels page (Settings >> Alert Channels)?
a
Without the channels configured, how would the test have worked until now? I have restarted the query service and alertmanager.
a
ok
So alertmanager queries the channels from the query service on port 8085; if it can't reach it, the channels won't be loaded by alertmanager.
Ideally an error appears in the log when alertmanager can't reach the query service's private port.
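One quick way to check for that warning from the cluster side (a sketch, using the pod names from the kubectl output shared above):
Copy code
kubectl -n platform logs my-release-signoz-alertmanager-0 | grep -iE "channel|query service|error"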
a
These are the errors I grepped from the query service:
kubectl logs my-release-signoz-query-service-0 -n platform | grep -i error
2022-08-12T100255.948Z ERROR alertManager/notifier.go:232 alertmanager%!(EXTRA string=http://my-release-signoz-alertmanager:9093/api/v1/alerts, string=count, int=1, string=msg, string=Error calling alert API, string=err, context.deadlineExceededError=context deadline exceeded)
2022-08-12T100316.934Z INFO clickhouseReader/reader.go:689 SELECT serviceName, count(*) as numErrors FROM signoz_traces.signoz_index_v2 WHERE timestamp>='1660298207192000000' AND timestamp<='1660298507192000000' AND kind='2' AND (statusCode>=500 OR statusCode=2) GROUP BY serviceName
2022-08-12T100318.447Z INFO clickhouseReader/reader.go:1039 SELECT COUNT(*) as numTotal FROM signoz_traces.signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = true
2022-08-12T100318.451Z INFO clickhouseReader/reader.go:1050 SELECT COUNT(*) as numTotal FROM signoz_traces.signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = false
2022-08-12T104645.981Z DEBUG clickhouseReader/reader.go:2108 Parsing TTL from: MergeTree PARTITION BY toDate(timestamp) PRIMARY KEY (serviceName, hasError, toStartOfHour(timestamp), name) ORDER BY (serviceName, hasError, toStartOfHour(timestamp), name, timestamp) SETTINGS index_granularity = 8192
This log is from the query service in another environment:
2022-08-12T072809.658Z ERROR alertManager/manager.go:176 Received Server Error response for API call to alertmanager(POST http://my-release-signoz-alertmanager:9093/api/v1/testReceiver)
a
what do you mean by another environment?
a
we have 2 environments dev and QA
a
are they both on same cluster (k8s)?
a
nope
different clusters
a
Both are on 0.10.0?
a
there is no link between them
a
The problem is I can't relate your screenshots with the logs. The channels are set up but the log does not show them:
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
Can you re-create a channel and observe the alertmanager log to see whether the new channel shows up in the above list?
a
ok
i removed the alert channel and re created it
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {Slack alert channel map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {Slack alert channel map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {Slack alert channel map[alertname:{}] false 30s 5m0s 4h0m0s []}
a
can you check if the alert works now ..
a
sure
Now the UI is showing that the alert is firing, but so far I have not seen any alert in my Slack channel.
a
ok.. please keep watching the alertmanager and query service logs.
a
Now I received one alert, but the UI is still showing the alert in firing status only.
a
yes thats fine right.. the alert is still active..so the status will be firing
a
That alert is still in firing status only.
Firing since 08/12/2022 04:39:54 PM.
a
the alert will stay firing if the alert condition is being met .. so nothing wrong with it
a
Now the alert status has changed to OK. If the alert status changes to firing again, I will receive a Slack alert, right?
a
yes
a
@Amol Umbark this is my PromQL query: avg(system_cpu_load_average_5m), threshold 0.5. I am seeing only firing status but have not received any alert in Slack. Firing started at 7:19 PM and I still have not received any alert. Kindly let me know how much time it takes to receive alerts, so that I can understand more clearly.
a
An alert indicates that a rule condition has occurred. The alert will stay in firing state until the condition no longer exists; once the condition clears, a resolved-status alert is created. Let's do a quick call on Tuesday so I can understand the issue.
a
ok now i got whole scenario.
Hi team, we would like to bring 2 final issues to your notice:
1. Alerts are being received inconsistently. For example, the first time the alert rule condition matches, we receive the Slack alert as firing, but once it resolves we do not receive a resolved alert. The second time the condition matches, we do not receive the firing alert in Slack, but once that second alert resolves we do receive the resolved alert.
2. Testing channel notifications throws an error. Prashant mentioned the same thing.
Thank you team for your continuous support.
a
cc @Amol Umbark
a
@Anil Kumar Bandrapalli let me know if we can huddle sometime today after 2pm
a
sure
a
hey @Anil Kumar Bandrapalli can we connect now
a
Sure, give me 5 mins, I will get my workspace ready for the alerts.
can we connect now ?
r
@Amol Umbark and @Prashant Shahi, we are getting this error in the query-service pod logs after installing SigNoz 10.2, although all the pods are in the Running state:
2022-08-17T131318.571Z INFO app/server.go:188 /api/v1/services timeTaken: 3.273272ms
2022-08-17T131319.227Z ERROR clickhouseReader/reader.go:675 Error in processing sql query: code: 60, message: Table signoz_traces.top_level_operations doesn't exist