Anil Kumar Bandrapalli

07/14/2022, 10:40 AM
Alert script:
- alert: High Latency of operation ProcessStart
  expr: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le)) > 100
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: High Latency of operation ProcessStart in Workflow Service
    description: "Latency is > 200 VALUE = {{ $value }} LABELS = {{ $labels }}"

Ankit Nayan

07/14/2022, 11:16 AM
try changing the interval to 2m or 5m in expr:
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[5m])) by (le)) > 100

Anil Kumar Bandrapalli

07/14/2022, 11:22 AM
ok
I will try this and let you know
👍 1
It's not working.
Any suggestions?

Ankit Nayan

07/14/2022, 1:15 PM
this should be working
which version of signoz are you using?

Anil Kumar Bandrapalli

07/14/2022, 1:15 PM
v0.8.1
Actually, my scenario is this: whenever an HTTP request takes too long, I need to trigger an email.

Ankit Nayan

07/14/2022, 1:22 PM
ahhh...I think you need to upgrade to v0.9.2 and follow the migration docs to do that
I remember an issue with alerts being sent to channels in earlier versions
though we have a new release coming in a day or two that will make setting alerts seamless using charts and builders..a sneak peek

Priyansh

07/14/2022, 2:32 PM
Glad there is a threshold limit line now 😅 which I had just mentioned in yesterday's query builder session. Kudos 🚀

Rahul Tiwari

07/15/2022, 5:59 AM
@Ankit Nayan we are getting the error below while migrating SigNoz from 0.8.1 to 0.9:
[ec2-user@ip-10-0-4-191 ~]$ kubectl -n platform run -i -t signoz-migrate-clickhouse --image=signoz/migrate:0.9-clickhouse \
  -- -host=my-release-clickhouse -port=9000 -userName=admin -password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
Writing samples to DB
2022/07/15 05:44:09 Error while writing samples to DB code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
Session ended, resume using 'kubectl attach signoz-migrate-clickhouse-56767c457-sqpl2 -c signoz-migrate-clickhouse -i -t' command when the pod is running
[ec2-user@ip-10-0-4-191 ~]$
[ec2-user@ip-10-0-4-191 ~]$ kubectl logs signoz-migrate-clickhouse-56767c457-kfmdj -n platform -f
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9 signoz_metrics
Total Rows: 63262424
There are total 63262424 samples rows, starting migration...
Total Rows: 2555
There are total 2555 time series rows, starting migration...
Writing samples to DB
2022/07/15 05:58:08 Error while writing samples to DB code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
[ec2-user@ip-10-0-4-191 ~]$
Can anyone help me with this?

Anil Kumar Bandrapalli

07/15/2022, 10:43 AM
Hi @Ankit Nayan, we upgraded to 0.9.2 but we still can't trigger any alerts.
It's the same issue.

Ankit Nayan

07/15/2022, 11:44 AM
@Amol Umbark possible to look into this?

Amol Umbark

07/15/2022, 11:44 AM
yep on it
👍 1

Anil Kumar Bandrapalli

07/15/2022, 11:48 AM
@Amol Umbark FYI, my requirement is very simple: when the request time of one specific API crosses 100 ms, I need to send an email.
Also, where can we get the full list of metrics like signoz_latency_bucket? @Ankit Nayan you are referring to the new release, right? When will it be released?

Amol Umbark

07/15/2022, 11:52 AM
@Anil Kumar Bandrapalli are you facing the issue with this particular alert ('High Latency of operation ProcessStart') or with all alerts? Can you please share the logs of the alert manager and the query service? Do you see any alerts under triggered alerts when the condition is met? If you do, then we should focus on getting the channel setup right. I am assuming your channel is working correctly (?). If you are not sure, please go to Settings > Channels, pick a channel to edit, and click Test. See if you receive a test message. Also, please try setting up a simple alert (maybe system_cpu_load_average_15m > 0.15) and test that the alert setup works.

Anil Kumar Bandrapalli

07/15/2022, 12:27 PM
@Amol Umbark I have tested my channel via the Test button, and I am able to receive the mail. I will set up the simple alert that you mentioned.
@Amol Umbark I am able to receive the alert for the sample one that you mentioned.
Can you kindly look into what went wrong with the alert below?
alert: High Latency of operation ProcessStart
expr: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le)) > 50
for: 1m
labels:
  severity: critical
annotations:
  summary: High Latency of operation ProcessStart in Workflow Service
  description: "Latency is > 200 VALUE = {{ $value }} LABELS = {{ $labels }}"
This is the alert manager log:
level=info ts=2022-07-15T09:21:48.402Z caller=main.go:237 msg="Starting Alertmanager" version="(version=0.23.0, branch=release/v0.23.0-0.1, revision=6f8c41aa660a379880af00d7b42fd8ed8af854bd)"
level=info ts=2022-07-15T09:21:48.403Z caller=main.go:238 build_context="(go=go1.18, user=ubuntu@ip-172-31-87-228, date=20220503-10:50:46)"
level=info ts=2022-07-15T09:21:48.405Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=10.0.1.11 port=9094
level=info ts=2022-07-15T09:21:48.407Z caller=cluster.go:679 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2022-07-15T09:21:48.702Z caller=coordinator.go:141 component=configuration msg="Loading a new configuration"
level=warn ts=2022-07-15T09:21:48.718Z caller=configLoader.go:61 component=configuration msg="No channels found in query service "
level=info ts=2022-07-15T09:21:48.718Z caller=coordinator.go:156 component=configuration msg="Completed loading of configuration file"
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
level=info ts=2022-07-15T09:21:48.725Z caller=main.go:570 msg=Listening address=:9093
level=info ts=2022-07-15T09:21:48.726Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=info ts=2022-07-15T09:21:50.408Z caller=cluster.go:704 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000953591s
level=info ts=2022-07-15T09:21:58.413Z caller=cluster.go:696 component=cluster msg="gossip settled; proceeding" elapsed=10.006777612s
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}

Ankit Nayan

07/15/2022, 12:56 PM
@Anil Kumar Bandrapalli are you able to plot this query in any sample dashboard panel? Does the chart show anything?
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le))

Anil Kumar Bandrapalli

07/15/2022, 1:00 PM
Yes, I tried it; PromQL is showing no data.

Ankit Nayan

07/15/2022, 1:01 PM
now change it to [2m]. do you see a chart now?

Anil Kumar Bandrapalli

07/15/2022, 1:06 PM
Nope, the dashboard is empty.

Ankit Nayan

07/15/2022, 1:09 PM
so you do not have the data to set an alert on
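
A rough way to see why an empty chart means the alert can never fire: the alert compares the output of histogram_quantile against a threshold, and with no matching series there is nothing to compare. The Python sketch below is a simplified rendering of the bucket interpolation idea, not SigNoz or Prometheus code. The helper name, the None return for "no data", and the bucket counts are all assumptions made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified illustration only. Returns None when there is no data,
    which corresponds to the "empty chart" case in the thread.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets:
        return None
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the last finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


def alert_fires(q, buckets, threshold):
    """An alert condition can only be true when the query returns a value."""
    value = histogram_quantile(q, buckets)
    return value is not None and value > threshold


# Hypothetical cumulative latency buckets (upper bound in ms, request count).
with_data = [(50, 90), (100, 98), (250, 100), (math.inf, 100)]
no_data = []

print(alert_fires(0.99, with_data, threshold=100))
print(alert_fires(0.99, no_data, threshold=100))
```

With the made-up buckets, the 99th request falls halfway through the 100 to 250 ms bucket, so the estimate is above the 100 ms threshold; with no buckets at all there is no value, so the condition is never met.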

Anil Kumar Bandrapalli

07/15/2022, 1:10 PM
Let me try it this way: I will run a test for 15 mins and then check whether any data is populating.
👍 1

Ankit Nayan

07/15/2022, 1:11 PM
are you using docker installation on 1 VM or k8s? you should exec -it into your clickhouse container and connect to the db by running clickhouse client inside the container

Anil Kumar Bandrapalli

07/15/2022, 1:12 PM
We are running k8s. I will do that.

Ankit Nayan

07/15/2022, 1:13 PM
then run
use signoz_metrics;
and
select * from time_series_v2 where metric_name='signoz_latency_bucket';
and try to search for rows which have workflow-service and /api/task/complete
unless you see a chart with the above query plotting with a 2m time range, your alert won't work

Anil Kumar Bandrapalli

07/15/2022, 1:40 PM
When I logged into the clickhouse container and executed this command, it showed permission denied:
curl -fO "https://packages.clickhouse.com/tgz/stable/clickhouse-client-22.6.3.35-amd64.tgz"
When I run the clickhouse client command, it shows the error "command not found".

Amol Umbark

07/15/2022, 1:42 PM
there must be a client already in the container.. try
clickhouse client --host localhost --port 9000

Ankit Nayan

07/15/2022, 2:14 PM
@Prashant Shahi how can a user connect to clickhouse db in k8s?

Prashant Shahi

07/15/2022, 3:16 PM
Follow the commands below to connect to clickhouse pod:
kubectl -n platform exec -i --tty pod/chi-signoz-cluster-0-0-0 -- bash
Followed by:
clickhouse-client

Anil Kumar Bandrapalli

07/15/2022, 4:09 PM
Hi @Ankit Nayan, it is working. In the query I changed the operation value to POST /api/task/complete,
and now it is firing alerts.

Ankit Nayan

07/15/2022, 4:32 PM
Cool 👍 the name needs to be an exact match
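
To illustrate the exact-match point: in standard PromQL, a `label="value"` matcher is plain string equality, while `=~` is a fully anchored regular expression. The Python sketch below mimics that behaviour with a hypothetical `selector_matches` helper; it assumes SigNoz follows the standard PromQL matcher semantics, and the stored operation value is the one reported in the thread.

```python
import re


def selector_matches(series_labels, matchers):
    """Check a series against (label, op, value) matchers.

    "=" is plain string equality; "=~" is a fully anchored regex match.
    Hypothetical helper for illustration, not part of any library.
    """
    for label, op, value in matchers:
        actual = series_labels.get(label, "")
        if op == "=":
            if actual != value:
                return False
        elif op == "=~":
            if re.fullmatch(value, actual) is None:
                return False
        else:
            raise ValueError(f"unsupported matcher operator: {op}")
    return True


# Label values as they were found in the database, per the thread.
series = {"service_name": "workflow-service", "operation": "POST /api/task/complete"}

# The original selector omitted the HTTP method, so it selects nothing.
print(selector_matches(series, [("operation", "=", "/api/task/complete")]))
# The corrected selector repeats the stored value exactly.
print(selector_matches(series, [("operation", "=", "POST /api/task/complete")]))
# A regex matcher would also select it, at the cost of being less strict.
print(selector_matches(series, [("operation", "=~", ".*/api/task/complete")]))
```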

Anil Kumar Bandrapalli

07/15/2022, 4:38 PM
Yes, I only found that out when looking into the DB. Thanks a lot for helping me resolve this issue. I am excited to see the new version that you mentioned.

Ankit Nayan

07/15/2022, 4:51 PM
releasing this hour..would be great if you can try when you get time

Anil Kumar Bandrapalli

07/15/2022, 5:08 PM
sure

Ankit Nayan

07/16/2022, 10:56 AM
@Anil Kumar Bandrapalli https://github.com/SigNoz/signoz/releases/tag/v0.10.0 migration docs - https://signoz.io/docs/operate/migration/upgrade-0.10/ Let me know if you face any issues in the new alerts UI

Rahul Tiwari

07/18/2022, 6:10 AM
@Ankit Nayan and @Prashant Shahi I am getting the error below while upgrading SigNoz from 0.9.2 to 0.10:
[ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
NAME                                                        READY   STATUS             RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d20h
clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d20h
my-release-signoz-alertmanager-0                            1/1     Running            0          2d20h
my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d20h
my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d20h
my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d20h
my-release-signoz-query-service-0                           1/1     Running            0          2d20h
my-release-zookeeper-0                                      1/1     Running            0          2d20h
signoz-migrate-846b558f6-s6bdg                              0/1     CrashLoopBackOff   7          13m
[ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-s6bdg -n platform
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
No TTL found, skipping TTL migration
No data found in clickhouse
[ec2-user@ip-10-0-4-191 ~]$

Prashant Shahi

07/18/2022, 6:27 AM
My guess is that your migration script had already been run once. You can delete the pod.
@Vishal Sharma if the migration script had already been run, we should have exited with a 0 status code

Vishal Sharma

07/18/2022, 6:33 AM
@Prashant Shahi I see that there was no data in the exceptions table, so no data was found. @Rahul Tiwari Do you use the exceptions feature? https://signoz.io/docs/userguide/exceptions/#viewing-exceptions

Rahul Tiwari

07/18/2022, 6:53 AM
i have attached the screen shot.

Vishal Sharma

07/18/2022, 6:54 AM
Then it’s fine, the migration script ran successfully as you are not using exceptions feature.

Rahul Tiwari

07/18/2022, 6:57 AM
@Vishal Sharma and @Prashant Shahi the signoz-migrate pod is going into the CrashLoopBackOff state, with the error below.
[ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
NAME                                                        READY   STATUS             RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d21h
clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d21h
my-release-signoz-alertmanager-0                            1/1     Running            0          2d21h
my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d21h
my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d21h
my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d21h
my-release-signoz-query-service-0                           1/1     Running            0          2d21h
my-release-zookeeper-0                                      1/1     Running            0          2d21h
signoz-migrate-846b558f6-s6bdg                              0/1     CrashLoopBackOff   16         61m
[ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-s6bdg -n platform
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
No TTL found, skipping TTL migration
No data found in clickhouse
[ec2-user@ip-10-0-4-191 ~]$

Vishal Sharma

07/18/2022, 6:58 AM
@Rahul Tiwari You can delete migration pods with this command:
kubectl -n platform delete pod signoz-migrate

Rahul Tiwari

07/18/2022, 7:01 AM
@Vishal Sharma I tried deleting it, but the new pod gives the same error again.
[ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
NAME                                                        READY   STATUS             RESTARTS   AGE
chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d21h
clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d21h
my-release-signoz-alertmanager-0                            1/1     Running            0          2d21h
my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d21h
my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d21h
my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d21h
my-release-signoz-query-service-0                           1/1     Running            0          2d21h
my-release-zookeeper-0                                      1/1     Running            0          2d21h
signoz-migrate-846b558f6-p6rtb                              0/1     CrashLoopBackOff   3          81s
[ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-p6rtb -n platform
my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
No TTL found, skipping TTL migration
No data found in clickhouse
[ec2-user@ip-10-0-4-191 ~]$
@Vishal Sharma and @Prashant Shahi I have completely uninstalled SigNoz ver. 9.1 and installed 10.0. Thank you for your support.
👍 1

Anil Kumar Bandrapalli

07/18/2022, 1:08 PM
@Ankit Nayan, in the PromQL tab we entered this query:
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50
When saving, we got the error "at least one metric condition is required". Previously the same query used to work. Could you please help us solve this issue? We also tried to create the query using the query builder, but how can the histogram_quantile function be added to a query there?

Amol Umbark

07/18/2022, 1:11 PM
@Anil Kumar Bandrapalli when saving the rule you need to keep the PromQL tab active. On saving, you should also notice a message saying that the query will be saved with the PromQL expression instead of the query builder. Can you please try this?

Anil Kumar Bandrapalli

07/18/2022, 1:13 PM
Yes, I did the same thing but I am still getting the same error.

Amol Umbark

07/18/2022, 1:14 PM
I will try to reproduce this, but meanwhile can you please create a new alert rule and proceed?
The issue could be a result of switching from PromQL to the query builder.

Anil Kumar Bandrapalli

07/18/2022, 1:14 PM
sure

Amol Umbark

07/18/2022, 1:15 PM
Also, try to input just the metric query in the PromQL expression so the graph can be plotted. Once your graph looks good, then add the threshold in the second step.

Anil Kumar Bandrapalli

07/18/2022, 1:15 PM
For the fresh alert it shows the error "metric name is missing in A",
but I am in the PromQL tab.

Amol Umbark

07/18/2022, 1:17 PM
that's unexpected. let me review and get back
👍 1
but are you able to plot the graph for promql query

Anil Kumar Bandrapalli

07/18/2022, 1:23 PM
Nope.
We are not able to save it.
With the query builder we are able to see the graph.

Amol Umbark

07/18/2022, 1:25 PM
to see graph there is no need to save
can you pls share a screenshot of your alert

Anil Kumar Bandrapalli

07/18/2022, 1:27 PM
Sorry, the graph is showing.

Amol Umbark

07/18/2022, 1:27 PM
ok great let me get back on the save issue

Anil Kumar Bandrapalli

07/18/2022, 1:27 PM
But we are still not able to save that alert.

Amol Umbark

07/18/2022, 1:28 PM
got it
can you try selecting a metric name in query builder but keep the promql tab active right before you save
select a random metric ..should not matter

Anil Kumar Bandrapalli

07/18/2022, 1:32 PM
ok
i am able to save the alert

Amol Umbark

07/18/2022, 1:37 PM
cool i will resolve the issue of metric name error

Anil Kumar Bandrapalli

07/18/2022, 1:37 PM
OK

Ankit Nayan

07/18/2022, 3:54 PM
@Anil Kumar Bandrapalli
histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le))
try changing the [1m] to [5m]. Does the chart plot now?
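
One common reason a wider window helps: in standard PromQL, rate() needs at least two samples inside the range to produce a value, so a window no longer than the collection interval often comes up empty. The Python sketch below illustrates that counting argument only. The 60-second sample spacing and the helper names are assumptions for illustration; the thread does not state the actual collection interval.

```python
def samples_in_window(sample_times, eval_time, window_seconds):
    """Return the sample timestamps that fall inside (eval_time - window, eval_time]."""
    start = eval_time - window_seconds
    return [t for t in sample_times if start < t <= eval_time]


def can_compute_rate(sample_times, eval_time, window_seconds):
    """A rate needs at least two samples inside the window."""
    return len(samples_in_window(sample_times, eval_time, window_seconds)) >= 2


# Hypothetical sample timestamps in seconds, one sample every 60 s.
scrapes = [0, 60, 120, 180, 240, 300]

# [1m] window evaluated at t=310: only the sample at t=300 is inside.
print(can_compute_rate(scrapes, eval_time=310, window_seconds=60))
# [5m] window evaluated at t=310: five samples are inside.
print(can_compute_rate(scrapes, eval_time=310, window_seconds=300))
```

The trade-off is that a wider window smooths the rate, so short latency spikes take longer to show up in the alert expression.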

Anil Kumar Bandrapalli

07/18/2022, 5:48 PM
@Ankit Nayan it is working fine now. We are able to receive alerts. I have one more question: do you have an integration with the Camunda platform?

Ankit Nayan

07/18/2022, 6:14 PM
never heard of camunda...what do you want to do by the integration, I am curious!

Anil Kumar Bandrapalli

07/18/2022, 6:16 PM
We would like to integrate SigNoz into the Camunda platform to see the metrics and set alerts.

Ankit Nayan

07/18/2022, 6:19 PM
Does Camunda support a webhook receiver? You can use the webhook channel in SigNoz to send any alert to any webhook integration platform, like Zapier.
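
As a rough idea of what a webhook receiver would have to handle, the Python sketch below parses an alert payload and produces one line per alert. It assumes the SigNoz webhook channel sends a body shaped like the upstream Alertmanager webhook format (a top-level "alerts" list whose entries carry "status", "labels", and "annotations"); that assumption is not confirmed in the thread, and `summarize_alerts` and the sample body are hypothetical.

```python
import json


def summarize_alerts(body):
    """Turn a webhook request body (JSON text) into one line per alert.

    Assumes an Alertmanager-style payload; fields that are missing fall
    back to placeholders instead of raising.
    """
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', 'unnamed')}: "
            f"{annotations.get('summary', '')}"
        )
    return lines


# Made-up example body, reusing the alert name and summary from the thread.
sample_body = json.dumps({
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "High Latency of operation ProcessStart",
                "severity": "critical",
            },
            "annotations": {
                "summary": "High Latency of operation ProcessStart in Workflow Service",
            },
        }
    ],
})

for line in summarize_alerts(sample_body):
    print(line)
```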

Anil Kumar Bandrapalli

07/18/2022, 6:20 PM
You can get more info at https://camunda.com/ (we are actually working on workflows, similar to process flows in Jira).
Apart from alerts, can we integrate this and get the metrics, the way we are able to see p99, top endpoints, and so on?
Hi @Ankit Nayan, we were able to integrate SigNoz with a Tomcat Java application that uses MySQL as its DB. We can now see the DB calls and traces as well, but we are seeing a question mark (?) in db.statement. Can we get the exact values that are being passed to the query?

Ankit Nayan

07/25/2022, 9:20 AM
I am afraid I have not seen anybody using it like that. @Srikanth Chekuri do you have any idea if this can be enabled somewhere?

Srikanth Chekuri

07/25/2022, 10:16 AM
@Anil Kumar Bandrapalli The question marks will remain in the statement. There should be an optional flag to capture the params, but the Java instrumentation doesn't support it yet: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/400