Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    Alert script:
    - alert: High Latency of operation ProcessStart
      expr: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le)) > 100
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: High Latency of operation ProcessStart in Workflow Service
        description: "Latency is > 200 VALUE = {{ $value }} LABELS = {{ $labels }}"
    Ankit Nayan

    Ankit Nayan

    2 months ago
    try changing the interval to 2m or 5m in expr:
    histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[5m])) by (le)) > 100
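    One reason a wider range can help: rate() needs at least two samples inside the window to return a value. The sketch below counts how many scrapes land in a window. The 60-second scrape interval is a hypothetical value for illustration (the thread does not state the actual interval), and the window is treated as left-open, which is a simplification since boundary handling differs between Prometheus versions.

```python
import math


def samples_in_window(scrape_interval_s, window_s, eval_time_s, first_scrape_s=0.0):
    """Count scrape timestamps t with eval_time - window < t <= eval_time.

    Scrapes are assumed to happen at first_scrape_s, first_scrape_s + interval, ...
    """
    if eval_time_s < first_scrape_s:
        return 0
    # Index of the last scrape at or before the evaluation time.
    last = math.floor((eval_time_s - first_scrape_s) / scrape_interval_s)
    window_start = eval_time_s - window_s
    if window_start < first_scrape_s:
        before = -1  # no scrape falls at or before the window start
    else:
        # Index of the last scrape at or before the window start (excluded).
        before = math.floor((window_start - first_scrape_s) / scrape_interval_s)
    return last - before


def rate_has_data(scrape_interval_s, window_s, eval_time_s):
    """rate() can only produce a value when the window holds two or more samples."""
    return samples_in_window(scrape_interval_s, window_s, eval_time_s) >= 2


# Hypothetical 60s scrape interval, evaluated at t=630s:
print(rate_has_data(60, 60, 630))   # [1m] window
print(rate_has_data(60, 300, 630))  # [5m] window
```

    Under these assumptions the [1m] window holds a single sample, so the expression returns nothing, while [5m] holds several.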
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    ok
    I will try this and let you know
    it's not working
    any suggestions?
    Ankit Nayan

    Ankit Nayan

    2 months ago
    this should be working
    which version of signoz are you using?
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    v0.8.1
    actually my scenario is: whenever an HTTP request takes too long, I need to trigger an email
    Ankit Nayan

    Ankit Nayan

    2 months ago
    ahhh...I think you need to upgrade to v0.9.2 and follow the migration docs to do that
    I remember an issue with alerts being sent to channels in earlier versions
    though we have a new release coming in a day or two that will make setting alerts seamless using charts and builders..a sneak peek
    Priyansh

    Priyansh

    2 months ago
    glad there is a threshold limit line now 😅 which I had just mentioned in yesterday's query builder session. Kudos 🚀
    Rahul Tiwari

    Rahul Tiwari

    2 months ago
    @Ankit Nayan we are getting below error while migrating signoz from 0.8.1 to 0.9
    [ec2-user@ip-10-0-4-191 ~]$ kubectl -n platform run -i -t signoz-migrate-clickhouse --image=signoz/migrate:0.9-clickhouse \
    -- -host=my-release-clickhouse -port=9000 -userName=admin -password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9
    kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
    If you don't see a command prompt, try pressing enter.
    Writing samples to DB
    2022/07/15 05:44:09 Error while writing samples to DB code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
    Session ended, resume using 'kubectl attach signoz-migrate-clickhouse-56767c457-sqpl2 -c signoz-migrate-clickhouse -i -t' command when the pod is running
    [ec2-user@ip-10-0-4-191 ~]$
    [ec2-user@ip-10-0-4-191 ~]$ kubectl logs signoz-migrate-clickhouse-56767c457-kfmdj -n platform -f
    my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9 signoz_metrics
    Total Rows: 63262424
    There are total 63262424 samples rows, starting migration...
    Total Rows: 2555
    There are total 2555 time series rows, starting migration...
    Writing samples to DB
    2022/07/15 05:58:08 Error while writing samples to DB code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
    [ec2-user@ip-10-0-4-191 ~]$
    Can anyone help me on this.
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    Hi @Ankit Nayan, we upgraded to v0.9.2 but we still can't trigger any alerts.
    it's the same issue
    Ankit Nayan

    Ankit Nayan

    2 months ago
    @Amol Umbark possible to look into this?
    Amol Umbark

    Amol Umbark

    2 months ago
    yep on it
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    @Amol Umbark FYI, my requirement is very simple: when the request time of one specific API crosses 100ms, I need to send an email
    Also, where can we get the full list of metrics like signoz_latency_bucket? @Ankit Nayan you are referring to the new release, right? When will it be released?
    Amol Umbark

    Amol Umbark

    2 months ago
    @Anil Kumar Bandrapalli are you facing an issue with this particular alert 'High Latency of operation ProcessStart' or with all alerts? Can you please share the logs of alert manager and query service? Do you see any alerts under triggered alerts when the condition is met? If you do, then we should focus on getting the channel setup right. I am assuming your channel is working correctly (?). If you are not sure, please go to Settings >> Channels, pick a channel to edit and click Test. See if you receive a test message. Also, please try setting up a simple alert (maybe system_cpu_load_average_15m > 0.15) and check that the alert setup works.
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    @Amol Umbark i have tested my channel via test button. i am able to receive the mail. i will set up the simple alert that you mentioned
    @Amol Umbark i am able to receive the alert for the sample one which you mentioned
    can you kindly look into what is wrong with the alert below?
    alert: High Latency of operation ProcessStart
    expr: histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le)) > 50
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: High Latency of operation ProcessStart in Workflow Service
      description: "Latency is > 200 VALUE = {{ $value }} LABELS = {{ $labels }}"
    This is the alert manager log:
    level=info ts=2022-07-15T09:21:48.402Z caller=main.go:237 msg="Starting Alertmanager" version="(version=0.23.0, branch=release/v0.23.0-0.1, revision=6f8c41aa660a379880af00d7b42fd8ed8af854bd)"
    level=info ts=2022-07-15T09:21:48.403Z caller=main.go:238 build_context="(go=go1.18, user=ubuntu@ip-172-31-87-228, date=20220503-10:50:46)"
    level=info ts=2022-07-15T09:21:48.405Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=10.0.1.11 port=9094
    level=info ts=2022-07-15T09:21:48.407Z caller=cluster.go:679 component=cluster msg="Waiting for gossip to settle..." interval=2s
    level=info ts=2022-07-15T09:21:48.702Z caller=coordinator.go:141 component=configuration msg="Loading a new configuration"
    level=warn ts=2022-07-15T09:21:48.718Z caller=configLoader.go:61 component=configuration msg="No channels found in query service "
    level=info ts=2022-07-15T09:21:48.718Z caller=coordinator.go:156 component=configuration msg="Completed loading of configuration file"
    RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
    level=info ts=2022-07-15T09:21:48.725Z caller=main.go:570 msg=Listening address=:9093
    level=info ts=2022-07-15T09:21:48.726Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
    level=info ts=2022-07-15T09:21:50.408Z caller=cluster.go:704 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000953591s
    level=info ts=2022-07-15T09:21:58.413Z caller=cluster.go:696 component=cluster msg="gossip settled; proceeding" elapsed=10.006777612s
    RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {default-receiver map[alertname:{}] false 30s 5m0s 4h0m0s []}
    RouteOpts: {High Transaction Time Alert map[alertname:{}] false 30s 5m0s 4h0m0s []}
    Ankit Nayan

    Ankit Nayan

    2 months ago
    @Anil Kumar Bandrapalli are you able to plot this query in any sample dashboard panel? Does the chart show anything?
    histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="/api/task/complete"}[1m])) by (le))
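    For readers unsure what this expression computes: histogram_quantile estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket where the target rank falls. Below is a simplified Python sketch of that idea. It is an illustration only, not SigNoz's or Prometheus's actual code, it skips some edge cases the real function handles, and the bucket values in the usage lines are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The list must contain a +Inf bucket, as classic Prometheus histograms do.
    """
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so nothing to plot or alert on
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    if count == prev:
        return upper
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev) / (count - prev)


# Made-up buckets: 50 requests <= 100, 100 requests <= 200.
example = [(100, 50), (200, 100), (math.inf, 100)]
print(histogram_quantile(0.75, example))  # 150.0
```

    The alert then compares this estimate against the threshold (> 100 or > 50 in the rules above). When no series match the selector there are no buckets at all, so the chart stays empty and the alert cannot fire.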
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    I tried it, but the PromQL query is showing no data.
    Ankit Nayan

    Ankit Nayan

    2 months ago
    now change it to [2m]. do you see a chart now?
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    nope, the dashboard is empty
    Ankit Nayan

    Ankit Nayan

    2 months ago
    so you do not have the data to set an alert on
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    let me try it this way: I will run a test for 15 mins and then check whether any data is populating
    Ankit Nayan

    Ankit Nayan

    2 months ago
    are you using docker installation on 1 VM or k8s? you should exec -it into your clickhouse container and connect to the db by running clickhouse client inside the container
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    we are running k8s i will do that
    Ankit Nayan

    Ankit Nayan

    2 months ago
    then run
    use signoz_metrics;
    and
    select * from time_series_v2 where metric_name='signoz_latency_bucket';
    and try to search for rows which have workflow-service and /api/task/complete
    unless you see a chart plotting the above query with a 2m time range, your alert won't work
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    when I logged into the clickhouse container and executed this command:
    curl -fO "https://packages.clickhouse.com/tgz/stable/clickhouse-client-22.6.3.35-amd64.tgz"
    it showed permission denied
    when I run the clickhouse client command it shows the error: command not found
    Amol Umbark

    Amol Umbark

    2 months ago
    there must be a client already in the container.. try
    clickhouse client --host localhost --port 9000
    Ankit Nayan

    Ankit Nayan

    2 months ago
    @Prashant Shahi how can a user connect to clickhouse db in k8s?
    Prashant Shahi

    Prashant Shahi

    2 months ago
    Follow the commands below to connect to clickhouse pod:
    kubectl -n platform exec -i --tty pod/chi-signoz-cluster-0-0-0 -- bash
    Followed by:
    clickhouse-client
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    Hi @Ankit Nayan it is working. In the query I modified the operation value to POST /api/task/complete
    and now it is firing alerts
    Ankit Nayan

    Ankit Nayan

    2 months ago
    Cool 👍 the name needs to be an exact match
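    To illustrate why the original selector returned nothing: the = matcher in PromQL compares the whole label value, and the =~ regex matcher is anchored at both ends. A small Python sketch of both behaviours follows. Python's re module is only standing in for PromQL's RE2 engine here, which is close enough for simple patterns like these, and the helper names are made up for the example.

```python
import re


def matches_equal(label_value, wanted):
    """Behaves like label="wanted": the entire value must be identical."""
    return label_value == wanted


def matches_regex(label_value, pattern):
    """Behaves like label=~"pattern": the pattern must match the whole value."""
    return re.fullmatch(pattern, label_value) is not None


stored = "POST /api/task/complete"  # value found in time_series_v2

print(matches_equal(stored, "/api/task/complete"))       # False
print(matches_equal(stored, "POST /api/task/complete"))  # True
print(matches_regex(stored, ".*/api/task/complete"))     # True
```

    So operation="/api/task/complete" selects no series, while operation="POST /api/task/complete" does. A regex matcher such as operation=~".*/api/task/complete" would also select it, at the cost of matching every HTTP method for that path.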
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    yes, I got to know that only when looking into the db. thanks a lot for helping me resolve this issue. I am excited to see the new version that you mentioned
    Ankit Nayan

    Ankit Nayan

    2 months ago
    releasing this hour..would be great if you can try when you get time
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    sure
    Ankit Nayan

    Ankit Nayan

    2 months ago
    @Anil Kumar Bandrapalli https://github.com/SigNoz/signoz/releases/tag/v0.10.0 migration docs - https://signoz.io/docs/operate/migration/upgrade-0.10/ Let me know if you face any issues in the new alerts UI
    Rahul Tiwari

    Rahul Tiwari

    2 months ago
    @Ankit Nayan and @Prashant Shahi I am getting the below error while upgrading signoz 0.9.2 to 0.10
    [ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
    NAME                                                        READY   STATUS             RESTARTS   AGE
    chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d20h
    clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d20h
    my-release-signoz-alertmanager-0                            1/1     Running            0          2d20h
    my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d20h
    my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d20h
    my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d20h
    my-release-signoz-query-service-0                           1/1     Running            0          2d20h
    my-release-zookeeper-0                                      1/1     Running            0          2d20h
    signoz-migrate-846b558f6-s6bdg                              0/1     CrashLoopBackOff   7          13m
    [ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-s6bdg -n platform
    my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
    No TTL found, skipping TTL migration
    No data found in clickhouse
    [ec2-user@ip-10-0-4-191 ~]$
    Prashant Shahi

    Prashant Shahi

    2 months ago
    My guess is that your migration script was already run once.. You can delete the pod.
    @Vishal Sharma if the migration script was already run, we should have exited with a 0 status code
    Vishal Sharma

    Vishal Sharma

    2 months ago
    @Prashant Shahi I see that there was no data in the exceptions table, so no data was found. @Rahul Tiwari Do you use the exceptions feature? https://signoz.io/docs/userguide/exceptions/#viewing-exceptions
    Rahul Tiwari

    Rahul Tiwari

    2 months ago
    i have attached the screen shot.
    Vishal Sharma

    Vishal Sharma

    2 months ago
    Then it’s fine, the migration script ran successfully as you are not using exceptions feature.
    Rahul Tiwari

    Rahul Tiwari

    2 months ago
    @Vishal Sharma and @Prashant Shahi the signoz-migrate pod is going into crashloopbackoff state, with below error.
    [ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
    NAME                                                        READY   STATUS             RESTARTS   AGE
    chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d21h
    clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d21h
    my-release-signoz-alertmanager-0                            1/1     Running            0          2d21h
    my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d21h
    my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d21h
    my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d21h
    my-release-signoz-query-service-0                           1/1     Running            0          2d21h
    my-release-zookeeper-0                                      1/1     Running            0          2d21h
    signoz-migrate-846b558f6-s6bdg                              0/1     CrashLoopBackOff   16         61m
    [ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-s6bdg -n platform
    my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
    No TTL found, skipping TTL migration
    No data found in clickhouse
    [ec2-user@ip-10-0-4-191 ~]$
    Vishal Sharma

    Vishal Sharma

    2 months ago
    @Rahul Tiwari You can delete migration pods with this command:
    kubectl -n platform delete pod signoz-migrate
    Rahul Tiwari

    Rahul Tiwari

    2 months ago
    @Vishal Sharma i tried deleting it but it is again giving the same error.
    [ec2-user@ip-10-0-4-191 ~]$ k get pods -n platform
    NAME                                                        READY   STATUS             RESTARTS   AGE
    chi-signoz-cluster-0-0-0                                    1/1     Running            0          2d21h
    clickhouse-operator-787f8989cd-kr52v                        2/2     Running            0          2d21h
    my-release-signoz-alertmanager-0                            1/1     Running            0          2d21h
    my-release-signoz-frontend-68b56fc4b8-zg6hl                 1/1     Running            0          2d21h
    my-release-signoz-otel-collector-57d668b84c-zcbr5           1/1     Running            0          2d21h
    my-release-signoz-otel-collector-metrics-59556558b5-7gks2   1/1     Running            0          2d21h
    my-release-signoz-query-service-0                           1/1     Running            0          2d21h
    my-release-zookeeper-0                                      1/1     Running            0          2d21h
    signoz-migrate-846b558f6-p6rtb                              0/1     CrashLoopBackOff   3          81s
    [ec2-user@ip-10-0-4-191 ~]$ k logs signoz-migrate-846b558f6-p6rtb -n platform
    my-release-clickhouse 9000 admin 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
    No TTL found, skipping TTL migration
    No data found in clickhouse
    [ec2-user@ip-10-0-4-191 ~]$
    @Vishal Sharma and @Prashant Shahi I have completely uninstalled signoz ver.9.1 and installed 10.0. Thank you for your support
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    @Ankit Nayan , in the PromQL we have given this query "histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le)) > 50". When saving we got error "at least one metric condition is required". Previously same query used to work. Could you please help to solve this issue ? Also we tried to create a query using query builder but how this function "histogram_quantile" can be added to that query in the query builder?
    Amol Umbark

    Amol Umbark

    2 months ago
    @Anil Kumar Bandrapalli when saving the rule you need to keep the PromQL tab active. On saving you will also notice a message which says the query will be saved with the PromQL expression instead of the query builder. Can you please do this?
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    yes i did the same thing but still getting same error
    Amol Umbark

    Amol Umbark

    2 months ago
    I will try to reproduce this but meanwhile can you please create a new alert rule and proceed.
    the issue could be result of switching from promql to query builder
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    sure
    Amol Umbark

    Amol Umbark

    2 months ago
    also try to input just the metric query in the PromQL expression so the graph can be plotted. once your graph looks good, then add the threshold in the second step
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    for the fresh alert showing error "metric name is missing in A"
    but i am in PromQL tab only
    Amol Umbark

    Amol Umbark

    2 months ago
    that's unexpected. let me review and get back
    but are you able to plot the graph for promql query
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    nope.
    we aren't able to save it
    with query builder we are able to see the graph
    Amol Umbark

    Amol Umbark

    2 months ago
    to see graph there is no need to save
    can you pls share a screenshot of your alert
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    sorry the graph is showing
    Amol Umbark

    Amol Umbark

    2 months ago
    ok great let me get back on the save issue
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    But I am not able to save that alert
    Amol Umbark

    Amol Umbark

    2 months ago
    got it
    can you try selecting a metric name in query builder but keep the promql tab active right before you save
    select a random metric ..should not matter
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    ok
    i am able to save the alert
    Amol Umbark

    Amol Umbark

    2 months ago
    cool i will resolve the issue of metric name error
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    OK
    Ankit Nayan

    Ankit Nayan

    2 months ago
    @Anil Kumar Bandrapalli
    histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="workflow-service", operation="POST /api/task/complete"}[1m])) by (le))
    try changing the [1m] to [5m]. Does the chart plot now?
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    @Ankit Nayan it is working fine now. we are able to receive alerts . I have one more question. Do we have integration with camunda platform ?
    Ankit Nayan

    Ankit Nayan

    2 months ago
    never heard of camunda...what do you want to do by the integration, I am curious!
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    we would like to integrate signoz into camunda platform to see the metrics and set the alerts
    Ankit Nayan

    Ankit Nayan

    2 months ago
    does camunda support webhook receiver..you can use webhook channel at signoz to send any alert to any webhook integration platform like zapier
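    As a rough idea of what the receiving side of a webhook channel could look like, here is a minimal Python sketch. It assumes the payload follows the Alertmanager-style webhook JSON (an "alerts" list whose items carry "status", "labels" and "annotations"); the thread does not confirm the exact format SigNoz sends, so treat the field names, the handler and the port as illustrative assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload):
    """Turn an Alertmanager-style payload (assumed format) into one line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown")
        name = labels.get("alertname", "?")
        summary = annotations.get("summary", "")
        lines.append(f"{status}: {name} - {summary}")
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver: prints each alert and answers 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) or b"{}"
        for line in summarize_alerts(json.loads(body)):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Start the receiver; the port is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

    Calling serve() and pointing a webhook channel at that address would print one line per incoming alert. Forwarding those lines into another platform is left to whatever API that platform offers.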
    Anil Kumar Bandrapalli

    Anil Kumar Bandrapalli

    2 months ago
    you can get more info from this link https://camunda.com/ we are actually working on workflows, like in process flow in jira
    apart from alerts can we integrate this and get the metrics like how we are able to see p99,top endpoints like that
    Hi @Ankit Nayan, we are able to integrate signoz with a tomcat java application which is using mysql as DB. Now we are able to see the DB calls and traces as well. But we are seeing a question mark (?) in the db.statement. Can we get the exact value that is being passed to that query?
    Ankit Nayan

    Ankit Nayan

    2 months ago
    I am afraid I have not seen anybody using it like that. @Srikanth Chekuri do you have any idea if this can be enabled somewhere?
    Srikanth Chekuri

    Srikanth Chekuri

    2 months ago
    @Anil Kumar Bandrapalli The question marks will remain in the statement. There should be an optional flag to capture the params, but the java instrumentation doesn't support it yet: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/400