Hey I've created an alert in which I've set the co...
# support
h
Hey, I've created an alert with the condition set to fire when the value goes below the threshold, but it also fires when the value is higher than the threshold. Screenshots attached for more insight.
The query I'm writing for this is:
```sql
select
    toStartOfInterval(timestamp, INTERVAL 2 MINUTE) AS interval,
    serviceName,
    ((select count() as value from signoz_traces.signoz_index_v2
      where httpCode>='100' AND httpCode<='499')*100.0
     /
     (select count() as value from signoz_traces.signoz_index_v2
      where httpCode>='100' AND httpCode<='599')*1.0)
FROM signoz_traces.distributed_signoz_index_v2
where
    serviceName = 'frontend' AND
    timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
GROUP BY (serviceName, interval);
```
a
Hey @Harshith.R.S, which version are you on? Are you on Docker or k8s?
I will try to reproduce and get back to you.
h
docker setup v1.16.1
a
Hey @Harshith.R.S, can you try sending a test notification from this form? I would suggest picking a very high threshold so that the "below" condition always holds.
h
I've set the threshold to 80; the alert is still firing.
a
Hey @Harshith.R.S, I could not reproduce the error. Can you try the following query? I just changed the interval to 5 minutes instead of 2. Also change "in total" to "at least once".
```sql
select
    toStartOfInterval(timestamp, INTERVAL 5 MINUTE) AS interval,
    serviceName,
    ((select count() as value from signoz_traces.signoz_index_v2
      where httpCode>='100' AND httpCode<='499')*100.0
     /
     (select count() as value from signoz_traces.signoz_index_v2
      where httpCode>='100' AND httpCode<='599')*1.0)
FROM signoz_traces.distributed_signoz_index_v2
where
    serviceName = 'frontend' AND
    timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
GROUP BY (serviceName, interval);
```
Also, do these steps to get more clarity:
1. Get a new webhook URL from webhook.site.
2. Define a channel for the webhook URL.
3. When defining the alert, add the webhook channel to the list of preferred channels.
4. Send a test notification to see the observed value in webhook.site.
Please share the results.
I would also like to see the output of the query from the ClickHouse CLI. More instructions on setting up the CLI here.
h
I don't know why it's not working for me
a
Can we do a quick huddle?
h
image.png
image.png
```
2023-03-03T09:10:36.514Z        ERROR   alertManager/notifier.go:232    alertmanager%!(EXTRA string=http://alertmanager:9093/api/v1/alerts, string=count, int=2, string=msg, string=Error calling alert API, string=err, *errors.errorString=bad response status 400 Bad Request)
go.signoz.io/signoz/pkg/query-service/integrations/alertManager.(*Notifier).sendAll.func1
        /go/src/github.com/signoz/signoz/pkg/query-service/integrations/alertManager/notifier.go:232
2023-03-03T09:10:36.514Z        WARN    alertManager/notifier.go:136    msg: dropped alerts      count:2
go.signoz.io/signoz/pkg/query-service/integrations/alertManager.(*Notifier).Run
```
a
Can you please share the Alertmanager log?
Same as for the query service, you will find a container for Alertmanager.
h
```
level=error ts=2023-03-03T09:33:36.516Z caller=api.go:808 component=api version=v1 msg="API error" err="bad_data: \"divide(multiply(_subquery310, 100.), _subquery311)\" is not a valid label name"
```
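The Alertmanager error above is consistent with the ratio column in the earlier query having no alias, so that ClickHouse's generated column name ends up being used as a label name. A minimal sketch of that check, assuming the classic Prometheus-style label-name pattern applies to this Alertmanager version (`is_valid_label_name` is a hypothetical helper, not part of SigNoz or Alertmanager):

```python
import re

# Classic Prometheus/Alertmanager label-name pattern (assumed to apply here).
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_label_name(name: str) -> bool:
    """Return True if the whole string fits the classic label-name pattern."""
    return LABEL_NAME.fullmatch(name) is not None


# Column name taken from the Alertmanager log line above.
generated = "divide(multiply(_subquery310, 100.), _subquery311)"

for candidate in (generated, "value", "serviceName", "interval"):
    print(candidate, "->", is_valid_label_name(candidate))
```

Under that assumption, giving the ratio expression an explicit alias such as `AS value` would avoid the generated name that the log complains about.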
I completely rewrote the query:
```sql
SELECT
    subq1.interval,
    subq1.value * 100.0 / subq2.value AS success_rate
FROM (
    SELECT
        toStartOfInterval(timestamp, INTERVAL 2 MINUTE) AS interval,
        COUNT() AS value
    FROM signoz_traces.signoz_index_v2
    WHERE
        httpCode >= '100' AND
        httpCode <= '499' AND
        timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
    GROUP BY interval
) AS subq1
JOIN (
    SELECT
        toStartOfInterval(timestamp, INTERVAL 2 MINUTE) AS interval,
        COUNT() AS value
    FROM signoz_traces.signoz_index_v2
    WHERE
        httpCode >= '100' AND
        httpCode <= '599' AND
        timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
    GROUP BY interval
) AS subq2
ON subq1.interval = subq2.interval;
```
After updating, it is still firing the alert. I've attached the necessary screencaps.
It says the value went down to 0, but as you can see in the graph, the value stays at 100 and does not fluctuate.
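One way to cross-check what the rewritten query is meant to return is to redo the arithmetic by hand on a few rows. The sketch below is only an illustration, not SigNoz code: `success_rate_per_interval` and the sample spans are hypothetical, and status codes are treated as integers, whereas the query compares `httpCode` as strings.

```python
from collections import defaultdict


def success_rate_per_interval(spans, interval_s=120):
    """Percent of 1xx-4xx responses among 1xx-5xx responses, per time bucket.

    spans: iterable of (unix_timestamp_seconds, http_status_code) pairs.
    Returns a dict {bucket_start: percent}, ordered by bucket start.
    """
    ok = defaultdict(int)
    total = defaultdict(int)
    for ts, code in spans:
        if not 100 <= code <= 599:
            continue  # outside the range the query counts
        bucket = ts - ts % interval_s  # same idea as toStartOfInterval
        total[bucket] += 1
        if code <= 499:
            ok[bucket] += 1
    return {b: ok[b] * 100.0 / total[b] for b in sorted(total)}


# Made-up sample: one 500 in the first 2-minute bucket, none in the second.
sample = [(0, 200), (30, 500), (130, 200), (150, 404)]
print(success_rate_per_interval(sample))
```

If the real data contains no 5xx codes in the evaluation window, every bucket should come out at 100, which would match the graph described above.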
a
Can you try running the query from the log in the ClickHouse CLI?
In your alert definition, can you change the option "in total" to "at least once"?
h
The alert fired even after switching to "at least once"; trying the CLI right now.
```
Error: No such container: clickhouse-setup_clickhouse_1
```
Getting this. Edit: OK, so the container name given in the docs is wrong; now I'm able to access the CLI. Edit 2: it needs values in place of {{.start_datetime}} and {{.end_datetime}}.
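Since the CLI does not know the `{{.start_datetime}}` and `{{.end_datetime}}` template variables, they have to be replaced with concrete values before pasting the query. A minimal sketch of one way to do that, assuming `toDateTime('YYYY-MM-DD hh:mm:ss')` literals are acceptable for the `timestamp` comparison and that the times are given in the server's time zone; `QUERY_TEMPLATE` is a shortened stand-in query and `render_for_cli` is a hypothetical helper:

```python
from datetime import datetime, timedelta

# Shortened stand-in for the alert query; only the two placeholders matter here.
QUERY_TEMPLATE = """
SELECT
    toStartOfInterval(timestamp, INTERVAL 2 MINUTE) AS interval,
    count() AS value
FROM signoz_traces.signoz_index_v2
WHERE timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
GROUP BY interval
ORDER BY interval
"""


def render_for_cli(template: str, start: datetime, end: datetime) -> str:
    """Replace the two template variables with toDateTime('...') literals."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        template
        .replace("{{.start_datetime}}", f"toDateTime('{start.strftime(fmt)}')")
        .replace("{{.end_datetime}}", f"toDateTime('{end.strftime(fmt)}')")
    )


# Example window: the 30 minutes before an arbitrary point in time.
end = datetime(2023, 3, 3, 9, 30, 0)
start = end - timedelta(minutes=30)
rendered = render_for_cli(QUERY_TEMPLATE, start, end)
print(rendered)
```

The printed query can then be pasted into the ClickHouse client; the window is arbitrary and should be adjusted to a period when the alert fired.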
a
The query must have a `value` alias.
h
It does.
a
Also, the interval must be aliased as ‘ts’.
h
I'm getting the graph without that, no?
a
Yeah, `interval` is a fine alias.
Did you get results in the ClickHouse CLI?
h
Sorry, I've logged off for now; I'll be able to send you the results tomorrow.