Alert script :- alert: High Latency of operation ...
# support
a
Alert script:
alert: High Latency of operation ProcessStart
expr: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le)) > 100
for: 0m
labels:
  severity: critical
annotations:
  summary: High Latency of operation ProcessStart in Workflow Service
  description: "Latency is > 200 VALUE = {{ $value }} LABELS = {{ $labels }}"
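(Note for readers: a rough Python sketch of how a histogram_quantile-style estimate is derived from cumulative `le` buckets, to show what the threshold in the rule above is compared against. The bucket bounds and counts are made up for illustration, the function name `estimate_quantile` is hypothetical, and this is a simplified approximation of the PromQL function, not SigNoz's or Prometheus's actual code.)

```python
import math

def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    ending with a +Inf bucket (the le="+Inf" series).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        # No observations in the window: nothing to plot or alert on.
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        # Quantile falls in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower, prev_count = buckets[index - 1] if index > 0 else (0.0, 0.0)
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)

# Hypothetical cumulative counts for one operation (bounds in ms).
sample = [(50, 90), (100, 98), (250, 100), (math.inf, 100)]
p99 = estimate_quantile(0.99, sample)
fires = p99 > 100  # the comparison the alert expression performs
```

With these made-up buckets the 99th percentile lands inside the 100 to 250 ms bucket, so the `> 100` condition would hold. If no series match the selector at all, there is nothing to estimate and the alert cannot fire.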
a
try changing the interval to 2m or 5m in expr:
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[5m])) by (le)) > 100
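(Note for readers: one common reason for widening the range from [1m] is that rate() needs at least two samples inside the window. The sketch below assumes a perfectly regular 60-second scrape interval, which is an assumption and not something stated in this thread; the function names are hypothetical.)

```python
def guaranteed_samples(window_seconds, scrape_interval_seconds):
    """Minimum number of regularly spaced samples that always fall inside a range window."""
    return window_seconds // scrape_interval_seconds

def window_supports_rate(window_seconds, scrape_interval_seconds):
    """rate() needs at least two samples in the window to compute a per-second increase."""
    return guaranteed_samples(window_seconds, scrape_interval_seconds) >= 2

ASSUMED_SCRAPE_INTERVAL = 60  # seconds; assumption for illustration only

one_minute_ok = window_supports_rate(60, ASSUMED_SCRAPE_INTERVAL)
two_minutes_ok = window_supports_rate(120, ASSUMED_SCRAPE_INTERVAL)
five_minutes_ok = window_supports_rate(300, ASSUMED_SCRAPE_INTERVAL)
```

Under that assumption a [1m] window may hold only a single sample, while [2m] and [5m] always hold at least two.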
a
ok
I will try this and let you know
👍 1
it's not working
any suggestions?
a
this should be working
which version of signoz are you using?
a
v0.8.1
actually my scenario is: whenever an HTTP request takes too long, I need to trigger an email
a
ahhh...I think you need to upgrade to v0.9.2 and follow the migration docs to do that
I remember an issue with alerts being sent to channels in earlier versions
though we have a new release coming in a day or two that will make setting alerts seamless using charts and builders..a sneak peek
p
glad there is a threshold limit line now 😅 which I had just mentioned in yesterday's query builder session. Kudos 🚀
r
@Ankit Nayan we are getting the error below while migrating SigNoz from 0.8.1 to 0.9
[ec2-user@ip-10-0-4-191 ~]$ kubectl -n platform run -i -t signoz-migrate-clickhouse --image=signoz/migrate:0.9-clickhouse \
-- -host=my-release-clickhouse -port=9000 -userName=admin -password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
Writing samples to DB
2022/07/15 05:44:09 Error while writing samples to DB code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
Session ended, resume using 'kubectl attach signoz-migrate-clickhouse-56767c457-sqpl2 -c signoz-migrate-clickhouse -i -t' command when the pod is running
[ec2-user@ip-10-0-4-191 ~]$
[ec2-user@ip-10-0-4-191 ~]$ kubectl logs signoz-migrate-clickhouse-56767c457-kfmdj -n platform -f
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9 signoz_metrics
Total Rows: 63262424
There are total 63262424 samples rows, starting migration...
Total Rows: 2555
There are total 2555 time series rows, starting migration...
Writing samples to DB
2022/07/15 05:58:08 Error while writing samples to DB code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
[ec2-user@ip-10-0-4-191 ~]$
Can anyone help me with this?
a
Hi @Ankit Nayan, we upgraded to 0.9.2 but we still can't trigger any alerts.
It is the same issue.
a
@Amol Umbark possible to look into this?
a
yep on it
👍 1
a
@Amol Umbark FYI, my requirement is very simple: when the request time of one specific API crosses 100ms, I need to send an email.
Also, where can we get the full list of metrics like signoz_latency_bucket? @Ankit Nayan you are referring to the new release, right? When will it be released?
a
@Anil Kumar Bandrapalli are you facing the issue with this particular alert 'High Latency of operation ProcessStart' or with all alerts? Can you please share the logs of alert manager and query service? Do you see any alerts under triggered alerts when the condition is met? If you do, then we should focus on getting the channel setup right. I am assuming your channel is working correctly (?). If you are not sure, please go to Settings >> Channels, pick a channel to edit and click Test, then see if you receive a test message. Also, please try setting up a simple alert (maybe system_cpu_load_average_15m > 0.15) and check that the alert setup works.
a
@Amol Umbark I have tested my channel via the Test button and I am able to receive the mail. I will set up the simple alert that you mentioned.
@Amol Umbark I am able to receive the alert for the sample one you mentioned.
Can you kindly look into what is wrong with the alert below?
alert: High Latency of operation ProcessStart
expr: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le)) > 50
for: 1m
labels:
  severity: critical
annotations:
  summary: High Latency of operation ProcessStart in Workflow Service
  description: "Latency is > 200 VALUE = {{ $value }} LABELS = {{ $labels }}"
This is the alert manager log:
level=info ts=2022-07-15T09:21:48.402Z caller=main.go:237 msg="Starting Alertmanager" version="(version=0.23.0, branch=release/v0.23.0-0.1, revision=6f8c41aa660a379880af00d7b42fd8ed8af854bd)"
level=info ts=2022-07-15T09:21:48.403Z caller=main.go:238 build_context="(go=go1.18, user=ubuntu@ip-172-31-87-228, date=20220503-105046)"
level=info ts=2022-07-15T09:21:48.405Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=10.0.1.11 port=9094
level=info ts=2022-07-15T09:21:48.407Z caller=cluster.go:679 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2022-07-15T09:21:48.702Z caller=coordinator.go:141 component=configuration msg="Loading a new configuration"
level=warn ts=2022-07-15T09:21:48.718Z caller=configLoader.go:61 component=configuration msg="No channels found in query service "
level=info ts=2022-07-15T09:21:48.718Z caller=coordinator.go:156 component=configuration msg="Completed loading of configuration file"
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
level=info ts=2022-07-15T09:21:48.725Z caller=main.go:570 msg=Listening address=:9093
level=info ts=2022-07-15T09:21:48.726Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=info ts=2022-07-15T09:21:50.408Z caller=cluster.go:704 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000953591s
level=info ts=2022-07-15T09:21:58.413Z caller=cluster.go:696 component=cluster msg="gossip settled; proceeding" elapsed=10.006777612s
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
a
@Anil Kumar Bandrapalli are you able to plot this query in any sample dashboard panel? Does the chart show anything?
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le))
a
yes, PromQL is showing no data.
a
now change it to [2m]. do you see a chart now?
a
nope, empty dashboard
a
so you do not have the data to set an alert on
a
let me try it this way. I will run a test for 15 mins and then check whether any data is being populated
👍 1
a
are you using docker installation on 1 VM or k8s? you should exec -it into your clickhouse container and connect to the db by running clickhouse client inside the container
a
we are running k8s i will do that
a
then run
use signoz_metrics;
and
select * from time_series_v2 where metric_name='signoz_latency_bucket';
and try to search for rows which have workflow-service and /api/task/complete
unless you see a chart plotting with the above query and a 2m time range, your alert won't work
a
when I logged into the clickhouse container and executed this command: curl -fO "https://packages.clickhouse.com/tgz/stable/clickhouse-client-22.6.3.35-amd64.tgz"
it showed permission denied
when I run the clickhouse client command, it shows the error "command not found"
a
there must be a client already in the container.. try clickhouse client --host localhost --port 9000
a
@Prashant Shahi how can a user connect to clickhouse db in k8s?
p
Follow the commands below to connect to clickhouse pod:
kubectl -n platform exec -i --tty pod/chi-signoz-cluster-0-0-0 -- bash
Followed by:
clickhouse-client
a
Hi @Ankit Nayan it is working. In the query I changed the operation value to POST /api/task/complete
and now it is firing alerts
a
Cool 👍 the name needs to be an exact match
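(Note for readers: a small Python sketch of why the original selector returned no data. In PromQL, `=` compares the full label value and `=~` is a fully anchored regex match. The helper name `label_matches` is hypothetical, and Python's `re` is only a stand-in here for the RE2 syntax PromQL uses.)

```python
import re

def label_matches(value, matcher, op="="):
    """Mimic PromQL label matching for '=' (exact) and '=~' (anchored regex)."""
    if op == "=":
        return value == matcher
    if op == "=~":
        return re.fullmatch(matcher, value) is not None
    raise ValueError("unsupported operator: " + op)

# Label value as it was found stored in ClickHouse, per this thread.
stored_operation = "POST /api/task/complete"

path_only = label_matches(stored_operation, "/api/task/complete")
with_method = label_matches(stored_operation, "POST /api/task/complete")
regex_suffix = label_matches(stored_operation, ".*/api/task/complete", op="=~")
```

The path-only matcher does not match because the stored value includes the HTTP method. A regex matcher with a leading `.*` would also select it, at the cost of possibly matching other methods on the same path.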
a
yes, I only found that out when looking into the db. Thanks a lot for helping me resolve this issue. I am excited to see the new version that you mentioned
a
releasing this hour..would be great if you can try when you get time
a
sure
a
@Anil Kumar Bandrapalli https://github.com/SigNoz/signoz/releases/tag/v0.10.0 migration docs - https://signoz.io/docs/operate/migration/upgrade-0.10/ Let me know if you face any issues in the new alerts UI
r
@Ankit Nayan and @Prashant Shahi I am getting the error below while upgrading SigNoz from 0.9.2 to 0.10
[ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
NAME                                                        READY   STATUS             RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d20h
clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d20h
my-release-signoz-alertmanager-0                            1/1     Running            0          2d20h
my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d20h
my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d20h
my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d20h
my-release-signoz-query-service-0                           1/1     Running            0          2d20h
my-release-zookeeper-0                                      1/1     Running            0          2d20h
signoz-migrate-846b558f6-s6bdg                              0/1     CrashLoopBackOff   7          13m
[ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-s6bdg -n platform
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
No TTL found, skipping TTL migration
No data found in clickhouse
[ec2-user@ip-10-0-4-191 ~]$
p
My guess is that your migration script has already been run once.. You can delete the pod.
@Vishal Sharma if the migration script has already been run, we should have exited with a 0 status code
v
@Prashant Shahi I see that there was no data in exceptions table so data was not found. @Rahul Tiwari Do you use exceptions feature? https://signoz.io/docs/userguide/exceptions/#viewing-exceptions
r
i have attached the screen shot.
v
Then it’s fine, the migration script ran successfully as you are not using exceptions feature.
r
@Vishal Sharma and @Prashant Shahi the signoz-migrate pod is going into CrashLoopBackOff state, with the error below.
[ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
NAME                                                        READY   STATUS             RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d21h
clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d21h
my-release-signoz-alertmanager-0                            1/1     Running            0          2d21h
my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d21h
my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d21h
my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d21h
my-release-signoz-query-service-0                           1/1     Running            0          2d21h
my-release-zookeeper-0                                      1/1     Running            0          2d21h
signoz-migrate-846b558f6-s6bdg                              0/1     CrashLoopBackOff   16         61m
[ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-s6bdg -n platform
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
No TTL found, skipping TTL migration
No data found in clickhouse
[ec2-user@ip-10-0-4-191 ~]$
v
@Rahul Tiwari You can delete migration pods with this command:
kubectl -n platform delete pod signoz-migrate
r
@Vishal Sharma I tried deleting it, but it gives the same error again.
[ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
NAME                                                        READY   STATUS             RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d21h
clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d21h
my-release-signoz-alertmanager-0                            1/1     Running            0          2d21h
my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d21h
my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d21h
my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d21h
my-release-signoz-query-service-0                           1/1     Running            0          2d21h
my-release-zookeeper-0                                      1/1     Running            0          2d21h
signoz-migrate-846b558f6-p6rtb                              0/1     CrashLoopBackOff   3          81s
[ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-p6rtb -n platform
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
No TTL found, skipping TTL migration
No data found in clickhouse
[ec2-user@ip-10-0-4-191 ~]$
@Vishal Sharma and @Prashant Shahi I have completely uninstalled SigNoz ver. 9.1 and installed 10.0. Thank you for your support
👍 1
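(Note for readers: the migrate pod reappearing under a new suffix after deletion suggests it was owned by a Deployment, so deleting the Deployment rather than the pod would likely have stopped the restart loop. The Python sketch below derives the probable owner name from the pod name and builds the matching kubectl arguments. The `<deployment>-<replicaset-hash>-<pod-suffix>` pattern is a naming heuristic, not a guarantee, and both helper names are hypothetical.)

```python
def owning_deployment(pod_name):
    """Guess the Deployment name from '<deployment>-<replicaset-hash>-<pod-suffix>'."""
    parts = pod_name.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("pod name does not look Deployment-managed: " + pod_name)
    return parts[0]

def delete_deployment_command(pod_name, namespace):
    """Build (but do not run) the kubectl arguments that remove the owning Deployment."""
    return ["kubectl", "-n", namespace, "delete", "deployment", owning_deployment(pod_name)]

# Pod name taken from the output above; running the command is left to the operator.
command = delete_deployment_command("signoz-migrate-846b558f6-p6rtb", "platform")
```

The heuristic does not apply to StatefulSet pods such as my-release-signoz-query-service-0, so it is worth confirming the owner before deleting anything.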
a
@Ankit Nayan, in the PromQL tab we entered this query: "histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50". When saving, we got the error "at least one metric condition is required". Previously the same query used to work. Could you please help us solve this issue? Also, we tried to create a query using the query builder, but how can the function "histogram_quantile" be added to a query in the query builder?
a
@Anil Kumar Bandrapalli when saving the rule you need to keep the promql tab active. On saving you will also notice a message which says the query will be saved with the promql expression instead of the query builder. Can you please try this?
a
yes i did the same thing but still getting same error
a
I will try to reproduce this, but meanwhile can you please create a new alert rule and proceed.
the issue could be a result of switching from promql to query builder
a
sure
a
also try to input just the metric query in the promql expression so the graph can be plotted. once your graph looks good, then add the threshold in the second step
a
for the fresh alert it shows the error "metric name is missing in A"
but I am in the PromQL tab only
a
that's unexpected. let me review and get back
👍 1
but are you able to plot the graph for promql query
a
nope.
we are not able to save it
with query builder we are able to see the graph
a
to see graph there is no need to save
can you pls share a screenshot of your alert
a
sorry the graph is showing
a
ok great let me get back on the save issue
a
But I am still not able to save that alert
a
got it
can you try selecting a metric name in query builder but keep the promql tab active right before you save
select a random metric ..should not matter
a
ok
i am able to save the alert
a
cool i will resolve the issue of metric name error
a
OK
a
@Anil Kumar Bandrapalli
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le))
try changing the [1m] to [5m]. Does the chart plot now?
a
@Ankit Nayan it is working fine now. We are able to receive alerts. I have one more question: do we have integration with the Camunda platform?
a
never heard of camunda...what do you want to do by the integration, I am curious!
a
we would like to integrate signoz into camunda platform to see the metrics and set the alerts
a
does Camunda support a webhook receiver? You can use the webhook channel in SigNoz to send any alert to any webhook integration platform like Zapier
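(Note for readers: a minimal Python sketch of what a receiving service might do with a webhook alert. It assumes the payload follows the Alertmanager webhook JSON shape with an `alerts` list carrying `status`, `labels` and `annotations`; whether the SigNoz webhook channel sends exactly this shape is an assumption to verify, and `summarize_alerts` is a hypothetical helper.)

```python
import json

def summarize_alerts(body):
    """Turn an Alertmanager-style webhook body into one readable line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append("[{}] {}: {}".format(
            alert.get("status", "unknown"),
            labels.get("alertname", "unnamed"),
            annotations.get("summary", ""),
        ))
    return lines

# Example body built from the alert rule discussed earlier in this thread.
example_body = json.dumps({
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "High Latency of operation ProcessStart", "severity": "critical"},
        "annotations": {"summary": "High Latency of operation ProcessStart in Workflow Service"},
    }],
})
summary_lines = summarize_alerts(example_body)
```

A receiving platform could forward these lines to whatever workflow step should react to the alert.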
a
you can get more info from this link: https://camunda.com/ We are actually working on workflows, like the process flow in Jira
apart from alerts, can we integrate this and get metrics, like how we are able to see p99, top endpoints and so on?
Hi @Ankit Nayan, we were able to integrate SigNoz with a Tomcat Java application which uses MySQL as its DB. Now we are able to see the DB calls and traces as well. But we are seeing a question mark (?) in the db.statement. Can we get the exact value that is being passed to that query?
a
I am afraid I have not seen anybody using it like that. @Srikanth Chekuri do you have any idea if this can be enabled somewhere?
s
@Anil Kumar Bandrapalli The question marks will remain in the statement. There should be an optional flag to capture the params, but the Java instrumentation doesn't support it yet: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/400.
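(Note for readers: an analogous illustration in Python of why a captured statement can contain `?`. With parameterized queries the SQL text and the bound values travel separately, so anything that records only the statement text sees the placeholder. This uses Python's built-in sqlite3 rather than JDBC or the OpenTelemetry Java agent, and the `task` table is hypothetical.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO task (id, status) VALUES (?, ?)", (1, "complete"))

# The statement text and its parameters are passed separately to the driver.
statement = "SELECT status FROM task WHERE id = ?"
params = (1,)
row = conn.execute(statement, params).fetchone()

# A tracer that captures only the statement text would record the placeholder,
# not the bound value.
recorded_statement = statement
conn.close()
```

Capturing the bound values would need the instrumentation to record the parameters as well, which is what the linked issue discusses.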