Hi <@U01HWQ1RTC2>, we are trying to fetch the mong...
# support
a
Hi @Ankit Nayan, we are trying to fetch MongoDB traces from a Node.js application request, but we are unable to get that info from the Node.js HTTP request trace. Is there any way to get traces for MongoDB? We are using Mongoose to connect to the DB. Is that causing the traces not to be visible in SigNoz?
a
Hi @Anil Kumar Bandrapalli, are you using this package for instrumentation: https://www.npmjs.com/package/@opentelemetry/auto-instrumentations-node ? This should take care of MongoDB tracing.
a
We are using that same package, but we are not able to receive any traces for MongoDB. This is the code snippet:
Copy code
const opentelemetry = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { MongoDBInstrumentation } = require('@opentelemetry/instrumentation-mongodb');

// traceExporter is created elsewhere in our tracing file
const sdk = new opentelemetry.NodeSDK({
  traceExporter,
  instrumentations: [
    getNodeAutoInstrumentations(),
    new MongoDBInstrumentation({ enhancedDatabaseReporting: true }),
  ],
});
a
@Anil Kumar Bandrapalli tracing should be started as the very first thing when your application starts. Try making the line below the first import:
Copy code
import sdk from './tracing';
Also, are you using MongoDB version >=3.3 <4? I found a relevant issue: https://github.com/open-telemetry/opentelemetry-js-contrib/issues/683
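For illustration, a minimal entry-point sketch showing that ordering (the ./app module name is just a placeholder for your application code):
Copy code
// index.js: load tracing before anything else so the instrumentation can
// patch modules (http, mongodb, mongoose, ...) before the app requires them
require('./tracing');
// only after tracing has started, load the rest of the application
require('./app'); // placeholder for your application entry module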
We haven't changed anything in the exporter since we export over OTLP, so it should not be exporting to Jaeger.
I see a Mongoose OTel package, so instrumenting Mongoose should be supported: https://www.npmjs.com/package/opentelemetry-instrumentation-mongoose
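For reference, registering the Mongoose instrumentation in the same SDK setup might look like the sketch below; it reuses the names from the snippet above and assumes the package exports a MongooseInstrumentation class as its README describes:
Copy code
const { MongooseInstrumentation } = require('opentelemetry-instrumentation-mongoose');

const sdk = new opentelemetry.NodeSDK({
  traceExporter,
  instrumentations: [
    getNodeAutoInstrumentations(),
    new MongooseInstrumentation(), // emits spans for Mongoose model/query operations
  ],
});
sdk.start();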
a
oh ok i will try to use that package then
@Ankit Nayan, I am facing one issue. In the previous version we were able to receive Slack alerts; in the new version we are not. The SigNoz UI shows the alert in firing status and after some time the status is set back to OK, but we never receive the alerts. In the alert channels tab, when we click the Test button, we do receive the test alert. I don't know the reason. Could you please help me out on this?
And is there a way to trace whether these alert notifications failed or succeeded?
any update ?
@Ankit Nayan I have finished integrating with Mongoose. We are now able to see MongoDB traces.
@Ankit Nayan can you help me figure out why alerts are not being received from SigNoz? The test channel works fine.
a
@Amol Umbark can you please have a look when you get time
a
hi @Anil Kumar Bandrapalli can you please share the logs of the query service and alertmanager from when the alert is in firing state and then goes to OK.
a
ok
These are the query service logs:
2022-08-09T155617.656Z INFO app/server.go:188 /api/v1/version timeTaken: 180.387µs
2022-08-09T155618.948Z INFO app/server.go:188 /api/v1/version timeTaken: 321.815µs
2022-08-09T155619.817Z INFO app/server.go:188 /api/v1/rules timeTaken: 521.305µs
2022-08-09T155627.657Z INFO app/server.go:188 /api/v1/version timeTaken: 236.715µs
2022-08-09T155628.947Z INFO app/server.go:188 /api/v1/version timeTaken: 201.875µs
2022-08-09T155636.495Z INFO rules/promRuleTask.go:315 promql rule task:4-groupname eval started at:2022-08-09 155636.494261596 +0000 UTC
2022-08-09T155636.495Z INFO rules/promRule.go:307 rule:High Transaction Time Alert slack evaluating promql query: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50.000000
time="2022-08-09T155636Z" level=warning msg="Ignoring hint {StepMs:0 Func:rate StartMs:1660060536494 EndMs:1660060596494} for query [1660060536494,1660060596494,{service_name=\"workflow-service\",operation=\"POST /api/task/complete\",name=\"signoz_latency_bucket\"}]." component=clickhouse
2022-08-09T155637.656Z INFO app/server.go:188 /api/v1/version timeTaken: 186.124µs
2022-08-09T155638.947Z INFO app/server.go:188 /api/v1/version timeTaken: 171.214µs
2022-08-09T155647.656Z INFO app/server.go:188 /api/v1/version timeTaken: 195.136µs
2022-08-09T155648.855Z INFO app/server.go:188 /api/v1/rules timeTaken: 522.895µs
2022-08-09T155648.947Z INFO app/server.go:188 /api/v1/version timeTaken: 272.605µs
2022-08-09T155657.657Z INFO app/server.go:188 /api/v1/version timeTaken: 217.998µs
2022-08-09T155658.947Z INFO app/server.go:188 /api/v1/version timeTaken: 198.955µs
2022-08-09T155707.656Z INFO app/server.go:188 /api/v1/version timeTaken: 200.184µs
2022-08-09T155708.948Z INFO app/server.go:188 /api/v1/version timeTaken: 278.13µs
2022-08-09T155717.656Z INFO app/server.go:188 /api/v1/version timeTaken: 220.669µs
2022-08-09T155718.860Z INFO app/server.go:188 /api/v1/rules timeTaken: 584.285µs
2022-08-09T155718.948Z INFO app/server.go:188 /api/v1/version timeTaken: 159.896µs
2022-08-09T155727.656Z INFO app/server.go:188 /api/v1/version timeTaken: 203.414µs
2022-08-09T155728.947Z INFO app/server.go:188 /api/v1/version timeTaken: 201.721µs
2022-08-09T155736.495Z INFO rules/promRuleTask.go:315 promql rule task:4-groupname eval started at:2022-08-09 155736.494261596 +0000 UTC
2022-08-09T155736.495Z INFO rules/promRule.go:307 rule:High Transaction Time Alert slack evaluating promql query: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50.000000
time="2022-08-09T155736Z" level=warning msg="Ignoring hint {StepMs:0 Func:rate StartMs:1660060596494 EndMs:1660060656494} for query [1660060596494,1660060656494,{service_name=\"workflow-service\",operation=\"POST /api/task/complete\",name=\"signoz_latency_bucket\"}]." component=clickhouse
2022-08-09T155737.656Z INFO app/server.go:188 /api/v1/version timeTaken: 200.241µs
2022-08-09T155738.947Z INFO app/server.go:188 /api/v1/version timeTaken: 196.769µs
2022-08-09T155747.657Z INFO app/server.go:188 /api/v1/version timeTaken: 203.496µs
2022-08-09T155748.862Z INFO app/server.go:188 /api/v1/rules timeTaken: 528.558µs
2022-08-09T155748.948Z INFO app/server.go:188 /api/v1/version timeTaken: 186.908µs
2022-08-09T155757.656Z INFO app/server.go:188 /api/v1/version timeTaken: 197.801µs
2022-08-09T155758.948Z INFO app/server.go:188 /api/v1/version timeTaken: 200.078µs
2022-08-09T155807.656Z INFO app/server.go:188 /api/v1/version timeTaken: 168.084µs
2022-08-09T155808.947Z INFO app/server.go:188 /api/v1/version timeTaken: 173.779µs
2022-08-09T155817.656Z INFO app/server.go:188 /api/v1/version timeTaken: 211.835µs
And I don't see anything in the alertmanager logs.
a
Ok, I don't see any alert firing. Did you notice a firing status on the Triggered Alerts page?
a
yes Amol
a
Is it likely that the condition held for just under one minute? I can see that the formula looks at the last 1 min. There is a send delay (around 5 minutes, I think) in alertmanager, so if the condition resolved within that timeframe you may not receive anything. Can you please try a broader condition and check: maybe increase the timeframe to 5 or 10 minutes and reduce the threshold.
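For illustration, a broadened version of that rule could look like the query below (the labels are the ones already used in this thread; the lower threshold of 20 is only a placeholder to make the condition easier to trigger):
Copy code
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[10m])) by (le)) > 20.000000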
a
sure
a
@Anil Kumar Bandrapalli a 1m interval does not work. Please confirm that by looking at the chart: you probably won't see a chart on the alerts page if you choose 1m as your time range.
a
@Amol Umbark, this is my promql query
Copy code
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[10m])) by (le))
When I create an alert with the above query, in the logs I see that the frequency is set to 1m0s, so the query is being evaluated every minute and the UI shows firing status in the Triggered Alerts tab.
@Ankit Nayan any suggestions ?
Also found this log for a simple CPU utilization rule:
msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", fingerprint="10661184619665832956", fullLabels="{\"__name__\":\"system_cpu_load_average_1m\"}", ruleId="9", ruleSource="https://signoz.accionbreeze.com/alerts/new", severity="warning"})
Even when I set the time range to 5m, it still evaluates every minute and tries to send alerts, but we are not able to receive the alerts in Slack.
a
resend delay occurs when an alert was already sent.
a
How do we resolve this issue?
a
the alert frequency is one minute so the query will run every minute. the time range can vary
a
but how to increase this frequency ?
a
How will that help?
Are you not receiving any alerts at all, or does just one rule have a problem?
a
I have written one simple alert rule, and even for that I didn't receive any alerts in Slack.
a
but test notifications option on the alerts page works right?
a
yes
a
let me get back
a
If you look at the sequence of the logs, it first finds the alerts and starts firing, but after a few seconds we see the "skipping send alert" message. So how do we resolve this?
FYI, this is the sequence of logs:
2022-08-10T130941.186Z DEBUG rules/ruleTask.go:331 msg:%!(EXTRA string=rule task eval started, string= name:, string=9-groupname, string= start time:, time.Time=2022-08-10 130941.184063358 +0000 UTC)
2022-08-10T130941.186Z DEBUG rules/thresholdRule.go:505 ruleid:%!(EXTRA string=9, string= runQueries:, map[string]string=map[A:SELECT fingerprint, labels as fullLabels, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 30 SECOND) as ts, any(value) as value FROM signoz_metrics.samples_v2 INNER JOIN (SELECT labels, fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'system_cpu_load_average_1m' AND labels_object.name IN ['system_cpu_load_average_1m']) as filtered_time_series USING fingerprint WHERE metric_name = 'system_cpu_load_average_1m' AND timestamp_ms >= 1660136681184 AND timestamp_ms <= 1660136981184 GROUP BY fingerprint, labels, ts ORDER BY fingerprint, labels, ts])
2022-08-10T130941.186Z DEBUG rules/thresholdRule.go:523 ruleId: %!(EXTRA string=9, string= result query label:, string=A)
2022-08-10T130941.217Z INFO rules/thresholdRule.go:619 rule:High Transaction Time Alert alerts found: 1
2022-08-10T130941.217Z INFO rules/thresholdRule.go:283 msg:initiating send alerts (if any) rule:High Transaction Time Alert
2022-08-10T130941.217Z DEBUG rules/thresholdRule.go:297 msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", fingerprint="10661184619665832956", fullLabels="{\"__name__\":\"system_cpu_load_average_1m\"}", ruleId="9", ruleSource="https://signoz.accionbreeze.com/alerts/new", severity="warning"})
2022-08-10T130946.629Z INFO app/server.go:188 /api/v1/rules timeTaken: 1.10824ms
2022-08-10T130947.656Z INFO app/server.go:188 /api/v1/version timeTaken: 251.177µs
2022-08-10T130948.947Z INFO app/server.go:188 /api/v1/version timeTaken: 222.816µs
2022-08-10T130957.656Z INFO app/server.go:188 /api/v1/version timeTaken: 202.774µs
2022-08-10T130958.947Z INFO app/server.go:188 /api/v1/version timeTaken: 195.425µs
2022-08-10T131007.656Z INFO app/server.go:188 /api/v1/version timeTaken: 181.432µs
2022-08-10T131008.947Z INFO app/server.go:188 /api/v1/version timeTaken: 194.131µs
2022-08-10T131016.644Z INFO app/server.go:188 /api/v1/rules timeTaken: 1.204984ms
2022-08-10T131017.656Z INFO app/server.go:188 /api/v1/version timeTaken: 188.567µs
2022-08-10T131018.950Z INFO app/server.go:188 /api/v1/version timeTaken: 1.27064ms
2022-08-10T131027.656Z INFO app/server.go:188 /api/v1/version timeTaken: 179.987µs
2022-08-10T131028.948Z INFO app/server.go:188 /api/v1/version timeTaken: 237.169µs
a
@Anil Kumar Bandrapalli i am trying to reproduce the issue. will update you
a
Ok thank you
a
Hi @Anil Kumar Bandrapalli I could not reproduce the issue. But the error that you see ("skipping resend…") occurs when a message has already been sent, so it is strange that you never received any message in Slack. Another possibility is that the call to the alertmanager API failed. To test this, can you disable all rules except one, restart the services (query service), and then monitor the log. As soon as the service starts, the rules will be started and you can see whether the "alerts found" condition appears in the log. If so, you can also see whether the API call to alertmanager failed.
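For reference, assuming the Kubernetes release and namespace names used later in this thread (my-release in the platform namespace), the restart-and-watch step could look roughly like this:
Copy code
# restart the query service so rule evaluation starts fresh
kubectl -n platform rollout restart statefulset/my-release-signoz-query-service
# follow its logs and watch for rule evaluation and alertmanager calls
kubectl -n platform logs -f my-release-signoz-query-service-0 | grep -iE "alerts found|alertmanager|error"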
a
I will give it a try, Amol.
Hi @Amol Umbark / @Ankit Nayan, before your reply I downgraded SigNoz to 9, and we are getting a different issue. This is the error: Received Server Error response for API call to alertmanager(POST http://my-release-signoz-alertmanager:9093/api/v1/testReceiver)\n%!(EXTRA string=404 Not Found)" Do you have any insights into this error?
Even after reverting to version 10 we are getting the same issue
Yes, right now I am on version 10 only, but still getting this error:
Received Server Error response for API call to alertmanager(POST http://my-release-signoz-alertmanager:9093/api/v1/testReceiver)\n%!(EXTRA string=404 Not Found)"
a
got it .. the API host is different
are you setting ALERTMANAGER_API_PREFIX in env var?
Can you try accessing http://my-release-signoz-alertmanager:9093? Does it take you to the alertmanager dashboard?
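If the service is not exposed outside the cluster, a port-forward is one way to check (a sketch; service name and namespace match the kubectl output shared later in this thread):
Copy code
kubectl -n platform port-forward svc/my-release-signoz-alertmanager 9093:9093
# then open http://localhost:9093 in a browser to reach the alertmanager UI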
a
The test alert was fired from this host: https://signoz.accionbreeze.com/api/v1/testChannel
Internally it calls my-release-signoz-alertmanager.
r
But where can we check the value of ALERTMANAGER_API_PREFIX? I already logged into the alertmanager pod but could not find any env variable named ALERTMANAGER_API_PREFIX.
a
So the way this works is:
• the alertmanager API is derived as http://alertmanager:9093/api/
• the docker compose setup assigns the name alertmanager to the alertmanager container
• hence query service can access it within the network
I am wondering why in your case the URL of alertmanager is different. @Prashant Shahi any thoughts?
a
for your information we are deploying via kubernetes
a
yeh i understand that
a
Any update or some kind of solution?
r
@Amol Umbark and @Prashant Shahi I changed the ALERTMANAGER_API_PREFIX env var to http://alertmanager:9093/api/ but we are still getting the same error:
2022-08-12T054009.919Z ERROR alertManager/manager.go:164 Error in getting response of API call to alertmanager(POST http://alertmanager:9093/api/v1/testReceiver) %!(EXTRA *url.Error=Post "http://alertmanager:9093/api/v1/testReceiver": dial tcp: lookup alertmanager on 172.20.0.1053 no such host) go.signoz.io/query-service/integrations/alertManager.(*manager).TestReceiver
p
If you have followed the docs for installation, the appropriate endpoint would be
http://my-release-signoz-alertmanager:9093
Moreover, that is set automatically by the Helm chart.
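To confirm what the Helm chart injected, one option is to inspect the query-service pod spec directly (a sketch; release and namespace names as shown in the kubectl output later in this thread):
Copy code
kubectl -n platform get statefulset my-release-signoz-query-service \
  -o jsonpath='{.spec.template.spec.containers[0].env}'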
r
ok, but @Amol Umbark told us that http://my-release-signoz-alertmanager:9093 is not the correct API and http://alertmanager:9093 is the correct API. We tried both, but the error message is the same.
p
In case of Docker, we don't set any env since the code defaults to alertmanager. In case of K8s, the default env should be correct.
r
am using the kubernetes way of installation
p
Could you restart the alertmanager and query-service pods?
Copy code
kubectl get svc -n platform
Can you share output of the command above?
r
[ec2-user@ip-10-0-4-191 ~]$ k get svc -n platform
NAME                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
chi-signoz-cluster-0-0                     ClusterIP   None             <none>        8123/TCP,9000/TCP,9009/TCP              19h
clickhouse-operator-metrics                ClusterIP   172.20.22.194    <none>        8888/TCP                                19h
my-release-clickhouse                      ClusterIP   172.20.25.33     <none>        8123/TCP,9000/TCP                       19h
my-release-signoz-alertmanager             ClusterIP   172.20.128.93    <none>        9093/TCP                                19h
my-release-signoz-alertmanager-headless    ClusterIP   None             <none>        9093/TCP                                19h
my-release-signoz-frontend                 NodePort    172.20.174.179   <none>        3301:31596/TCP                          19h
my-release-signoz-otel-collector           ClusterIP   172.20.195.124   <none>        14250/TCP,14268/TCP,4317/TCP,4318/TCP   19h
my-release-signoz-otel-collector-metrics   ClusterIP   172.20.51.23     <none>        13133/TCP                               19h
my-release-signoz-query-service            ClusterIP   172.20.82.146    <none>        8080/TCP,8085/TCP                       19h
my-release-zookeeper                       ClusterIP   172.20.226.187   <none>        2181/TCP,2888/TCP,3888/TCP              19h
my-release-zookeeper-headless              ClusterIP   None             <none>        2181/TCP,2888/TCP,3888/TCP              19h
p
Yup.. the alertmanager endpoint is correct
It could be one of two things:
• the alertmanager pod is unhealthy
• frontend nginx or query-service is unable to resolve the correct address of the alertmanager pod
Copy code
kubectl get pods -n platform
r
[ec2-user@ip-10-0-4-191 ~]$ kubectl get pods -n platform
NAME                                                        READY   STATUS    RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running   0          19h
clickhouse-operator-598444b99-qbnbj                         2/2     Running   0          19h
my-release-signoz-alertmanager-0                            1/1     Running   0          38m
my-release-signoz-frontend-584d85596b-648lh                 1/1     Running   0          19h
my-release-signoz-otel-collector-68cd55f8-xgtgl             1/1     Running   0          19h
my-release-signoz-otel-collector-metrics-6789c89544-nrzvr   1/1     Running   0          19h
my-release-signoz-query-service-0                           1/1     Running   0          37m
my-release-zookeeper-0                                      1/1     Running   0          19h
p
Just tested and I was able to reproduce this error
Let me look into it and get back to you
r
ok
a
ok thank you.
p
Only test alert channel endpoint seems to be broken. Configured alerts work as expected.
cc @Amol Umbark
a
ok let me take a look
a
Also @Prashant Shahi, the configured alerts are not working. The UI shows firing and then the status goes back to OK, but we are not receiving the alerts in Slack.
We are getting this message:
msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", ruleId="1",
p
I tested with webhook using https://webhook.site/
a
We set the alert on "system_cpu_load_average_15m".
But we configured it for my own Slack channel. If you need it, I will provide the details as well.
p
got it. let me check with slack alert and get back
a
@Anil Kumar Bandrapalli can you please review the log prior to this message? The message appears only after an alert has already been sent; it also would not occur the first time after you restart the query service. So if you restart, you should see that the alert rule has run and the number of alerts found.
r
2022-08-12T065623.742Z INFO rules/thresholdRule.go:619 rule:High Transaction Time Alert alerts found: 1
2022-08-12T065623.742Z INFO rules/thresholdRule.go:283 msg:initiating send alerts (if any) rule:High Transaction Time Alert
2022-08-12T065623.742Z DEBUG rules/thresholdRule.go:297 msg: skipping send alert due to resend delay%!(EXTRA string= rule: , string=High Transaction Time Alert, string= alert:, labels.Labels={alertname="High Transaction Time Alert", ruleId="1", ruleSource="https://signoz.accionbreeze.com/alerts/new", severity="warning"})
2022-08-12T065625.568Z INFO app/server.go:188 /api/v1/version timeTaken: 230.346µs
2022-08-12T065631.007Z INFO app/server.go:188 /api/v1/version timeTaken: 401.041µs
2022-08-12T065635.568Z INFO app/server.go:188 /api/v1/version timeTaken: 201.353µs
2022-08-12T065641.008Z INFO app/server.go:188 /api/v1/version timeTaken: 689.342µs
2022-08-12T065645.568Z INFO app/server.go:188 /api/v1/version timeTaken: 288.93µs
2022-08-12T065651.016Z INFO app/server.go:188 /api/v1/version timeTaken: 354.102µs
2022-08-12T065655.568Z INFO app/server.go:188 /api/v1/version timeTaken: 644.13µs
2022-08-12T065701.014Z INFO app/server.go:188 /api/v1/version timeTaken: 274.136µs
2022-08-12T065705.568Z INFO app/server.go:188 /api/v1/version timeTaken: 335.678µs
2022-08-12T065711.014Z INFO app/server.go:188 /api/v1/version timeTaken: 233.826µs
2022-08-12T065715.567Z INFO app/server.go:188 /api/v1/version timeTaken: 191.212µs
2022-08-12T065721.007Z INFO app/server.go:188 /api/v1/version timeTaken: 206.946µs
2022-08-12T065723.709Z DEBUG rules/ruleTask.go:331 msg:%!(EXTRA string=rule task eval started, string= name:, string=1-groupname, string= start time:, time.Time=2022-08-12 065723.708306749 +0000 UTC)
2022-08-12T065723.709Z DEBUG rules/thresholdRule.go:505 ruleid:%!(EXTRA string=1, string= runQueries:, map[string]string=map[A:SELECT name, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 30 SECOND) as ts, avg(value) as value FROM signoz_metrics.samples_v2 INNER JOIN (SELECT labels_object.name as name, fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'system_cpu_load_average_15m') as filtered_time_series USING fingerprint WHERE metric_name = 'system_cpu_load_average_15m' AND timestamp_ms >= 1660287143708 AND timestamp_ms <= 1660287443708 GROUP BY name,ts ORDER BY name, ts])
a
is this the first alert execution after restart?
a
After a restart we created this alert and shared the log with you. Now I will restart the alertmanager and query service and share the logs.
@Amol Umbark, after the restart I see the same logs, and the alerts are still not received in Slack.
p
Slack alerts work as expected for me in v0.10.2. Could you share a screenshot of the alert?
a
@Anil Kumar Bandrapalli can you please upgrade to v0.10.2? This version has an enable/disable option, so we can try disabling and enabling the rule to capture the exact log.
As I can see, you are on v0.10.0, which doesn't have this functionality for disabling or testing alert notifications.
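For reference, assuming the chart was installed from the standard SigNoz Helm repository as release my-release in the platform namespace, the upgrade would look roughly like this (a sketch, not exact commands for your setup):
Copy code
helm repo update
helm -n platform upgrade my-release signoz/signoz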
a
@Prashant Shahi by ALERT do you mean the alert rule page or the alert channel page?
@Amol Umbark we have only one alert configured, not multiple alerts.
@Prashant Shahi, we were able to receive alerts properly on version 10.0 earlier; only now are we getting this error. The fact is we can't upgrade to version 10.2, as we have already deployed version 10.0 to our client. Could you help us resolve this issue?
please
@Prashant Shahi /@Ankit Nayan /@Amol Umbark, kindly help us out
a
ok let me try to reproduce with 10.0
a
But we are not able to receive alerts in Slack.
a
Do you see any alerts in the Triggered Alerts tab?
Next to Alert Rules you can see the Triggered Alerts tab. Do you see alerts there, or any error when you switch to the tab?
a
yes in firing status only. no errors
a
Please send me screenshots.
If the alert shows there, it means the alert has reached alertmanager.
Can you also send the alertmanager log? Restart it before you grab the log.
The triggered tab looks fine. I will need the alertmanager log; please restart it before grabbing the log.
a
level=info ts=2022-08-12T100254.967Z caller=main.go:237 msg="Starting Alertmanager" version="(version=0.23.0, branch=release/v0.23.0-0.1, revision=6f8c41aa660a379880af00d7b42fd8ed8af854bd)"
level=info ts=2022-08-12T100254.967Z caller=main.go:238 build_context="(go=go1.18, user=ubuntu@ip-172-31-87-228, date=20220503-105046)"
level=info ts=2022-08-12T100254.968Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=10.0.1.164 port=9094
level=info ts=2022-08-12T100254.970Z caller=cluster.go:679 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2022-08-12T100255.177Z caller=coordinator.go:141 component=configuration msg="Loading a new configuration"
level=warn ts=2022-08-12T100255.181Z caller=configLoader.go:61 component=configuration msg="No channels found in query service "
level=info ts=2022-08-12T100255.181Z caller=coordinator.go:156 component=configuration msg="Completed loading of configuration file"
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
level=info ts=2022-08-12T100255.269Z caller=main.go:570 msg=Listening address=:9093
level=info ts=2022-08-12T100255.269Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=info ts=2022-08-12T100256.972Z caller=cluster.go:704 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.001323238s
level=info ts=2022-08-12T100304.978Z caller=cluster.go:696 component=cluster msg="gossip settled; proceeding" elapsed=10.007954437s
a
This shows no receiver is set up:
level=warn ts=2022-08-12T100255.181Z caller=configLoader.go:61 component=configuration msg="No channels found in query service"
Can you please send a screenshot of the channels page (Settings >> Alert Channels)?
a
Without the channels configured, how would the test have worked until now? I have restarted the query service and alertmanager.
a
ok
So alertmanager queries the channels from the query service on port 8085; if it can't reach it, the channels won't be loaded by alertmanager.
Ideally an error appears in the log when alertmanager can't reach the query service's private port.
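One quick way to check for that warning from the cluster side (a sketch, using the pod names from the kubectl output shared above):
Copy code
kubectl -n platform logs my-release-signoz-alertmanager-0 | grep -iE "channel|query service|error"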
a
These are the errors I grepped from the query service:
kubectl logs my-release-signoz-query-service-0 -n platform | grep -i error
2022-08-12T100255.948Z ERROR alertManager/notifier.go:232 alertmanager%!(EXTRA string=http://my-release-signoz-alertmanager:9093/api/v1/alerts, string=count, int=1, string=msg, string=Error calling alert API, string=err, context.deadlineExceededError=context deadline exceeded)
2022-08-12T100316.934Z INFO clickhouseReader/reader.go:689 SELECT serviceName, count(*) as numErrors FROM signoz_traces.signoz_index_v2 WHERE timestamp>='1660298207192000000' AND timestamp<='1660298507192000000' AND kind='2' AND (statusCode>=500 OR statusCode=2) GROUP BY serviceName
2022-08-12T100318.447Z INFO clickhouseReader/reader.go:1039 SELECT COUNT(*) as numTotal FROM signoz_traces.signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = true
2022-08-12T100318.451Z INFO clickhouseReader/reader.go:1050 SELECT COUNT(*) as numTotal FROM signoz_traces.signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = false
2022-08-12T104645.981Z DEBUG clickhouseReader/reader.go:2108 Parsing TTL from: MergeTree PARTITION BY toDate(timestamp) PRIMARY KEY (serviceName, hasError, toStartOfHour(timestamp), name) ORDER BY (serviceName, hasError, toStartOfHour(timestamp), name, timestamp) SETTINGS index_granularity = 8192
This log is from the query service in another environment:
2022-08-12T072809.658Z ERROR alertManager/manager.go:176 Received Server Error response for API call to alertmanager(POST http://my-release-signoz-alertmanager:9093/api/v1/testReceiver)
a
what do you mean by another environment?
a
we have 2 environments dev and QA
a
are they both on same cluster (k8s)?
a
nope
different clusters
a
Both are on 0.10.0?
a
there is no link between them
a
The problem is I can't relate your screenshots with the logs. The channels are set up but the log does not show them:
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
Can you re-create a channel and observe the alertmanager log to see whether the new channel shows up in the above list?
a
ok
i removed the alert channel and re created it
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {Slack alert channel map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {Slack alert channel map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {Slack alert channel map[alertname:{}] false 30s 5m0s 4h0m0s []}
a
can you check if the alert works now ..
a
sure
Now the UI is showing that the alert is firing, but so far I have not seen any alert in my Slack channel.
a
ok.. please keep watching the alertmanager and query service logs.
a
Now I received one alert, but the UI is still showing the alert in firing status only.
a
yes thats fine right.. the alert is still active..so the status will be firing
a
That alert is still in firing status only.
Firing since 08/12/2022 04:39:54 PM.
a
the alert will stay firing if the alert condition is being met .. so nothing wrong with it
a
Now the alert status has changed to OK. If the alert status changes to firing again, I will receive a Slack alert, right?
a
yes
a
@Amol Umbark this is my PromQL query: avg(system_cpu_load_average_5m), threshold 0.5. I am seeing only firing status but have not received any alert in Slack. Firing started at 7:19 PM and I still have not received any alert. Kindly let me know how much time it takes to receive alerts, so that I can understand more clearly.
a
An alert indicates that a rule condition has occurred. The alert will stay in firing state until the condition no longer exists; once the condition clears, a resolved-status alert is created. Let's do a quick call on Tuesday so I can understand the issue.
a
ok now i got whole scenario.
Hi team, we would like to bring 2 final issues to your notice:
1. Alerts are being received inconsistently. For example, the first time the alert rule condition matches, we receive the Slack alert as firing, but once it resolves we do not receive a resolved alert. The second time the condition matches, we do not receive the firing alert in Slack, but once that second alert resolves we do receive the resolved alert.
2. Testing channel notifications throws an error. Prashant mentioned the same thing.
Thank you team for your continuous support.
a
cc @Amol Umbark
a
@Anil Kumar Bandrapalli let me know if we can huddle sometime today after 2pm
a
sure
a
hey @Anil Kumar Bandrapalli can we connect now
a
Sure, give me 5 mins, I will get my workspace ready for the alerts.
can we connect now ?
r
@Amol Umbark and @Prashant Shahi, we are getting this error in the query-service pod logs after installing SigNoz 10.2, although all the pods are in the Running state:
2022-08-17T131318.571Z INFO app/server.go:188 /api/v1/services timeTaken: 3.273272ms
2022-08-17T131319.227Z ERROR clickhouseReader/reader.go:675 Error in processing sql query: code: 60, message: Table signoz_traces.top_level_operations doesn't exist