Ritek Saxena

    2 months ago
    Hi Team, I'm currently trying to set up alerts in SigNoz and, as written in the documentation, I have to set up an expression to evaluate for the error. My concern is how I would know which expression variables are available to me. Is there any guide on how to write these expressions to evaluate whether something is an error or not? I couldn't find one on SigNoz's official page. Thanks.
    Pranay

    2 months ago
    @Ritek Saxena We follow the PromQL format for writing expressions. You can learn more about it here: https://prometheus.io/docs/prometheus/latest/querying/basics/
    Ritek Saxena

    2 months ago
    Hi, thank you so much for this, although I have one more doubt. The documentation uses a variable named "system_cpu_load_average", and there must be other predefined variables like it. It would be really helpful if I could get a list of them.
    Pranay

    2 months ago
    Ritek Saxena

    2 months ago
    Yes, I have, but the problem is that I can only see traces, not metrics, for my service. When I tried to send metrics as well by adding a receiver in the config YAML file as mentioned in the docs, it didn't work.
    Hi Pranay, attached below are two screenshots to explain the problem better. The first one shows how I set up the alert; the second one shows the traces after hitting a request. As you can see, many of the spans cross the 10ms latency mark, and the alert has been set up for that, but I still receive nothing. PS: I have successfully connected an alert channel and tested it as well. It would be really helpful if you could help me set up this alert.
    Pranay

    2 months ago
    @Amol Umbark @Ankit Nayan Do you have more insights on this?
    Ankit Nayan

    2 months ago
    @Ritek Saxena this is an incorrect PromQL query to detect latency.
    @Ritek Saxena you can use this to set alerts on a percentile of latencies:
    histogram_quantile(0.99, sum(rate(signoz_latency_bucket{service_name="customer"}[1m])) by (le)) > 10
    The above alert would fire if the p99 latency of the customer service is >10ms.
    You can change it to the query below for p50:
    histogram_quantile(0.5, sum(rate(signoz_latency_bucket{service_name="customer"}[1m])) by (le)) > 10
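For readers unfamiliar with how histogram_quantile turns bucket counters into a latency value, the following is a minimal Python sketch of the linear interpolation that Prometheus documents for classic histograms. It is not SigNoz's or Prometheus's actual code; the bucket boundaries and counts are made-up illustration values, and the function name merely mirrors the PromQL function.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, which must
    include a final (math.inf, total_count) bucket, like the `le` labels
    of a *_bucket metric. This sketch assumes 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket and a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window

    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The quantile lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical per-bucket rates for one service, upper bounds in milliseconds.
example_buckets = [(5, 10), (10, 60), (25, 90), (50, 99), (math.inf, 100)]
p50 = histogram_quantile(0.5, example_buckets)
p99 = histogram_quantile(0.99, example_buckets)
print(p50, p50 > 10)
print(p99, p99 > 10)
```

With these hypothetical buckets, p50 works out to 9 ms and stays under the threshold, while p99 works out to 50 ms, so only the p99 rule with `> 10` would fire.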
    Ritek Saxena

    2 months ago
    Thank you so much for your reply. The query I used was given as the default in the SigNoz platform; I just changed the service name and reduced the value to 10.
    I'll try the commands you've just provided, though I was wondering if there's any document listing all the commands to setup an alert.
    Again, thank you very much for your response and time.
    I just tried the command and it didn't work; the reason might be that I don't have metrics for my service, I was just using traces. When I tried to set up metrics, that didn't work either. smh 😞
    Ankit Nayan

    2 months ago
    "I'll try the commands you've just provided, though I was wondering if there's any document listing all the commands to setup an alert."
    @Pranay @Ashu we should include APM-related alerts in the docs.
    "might be that I don't have metrics for my service I was just using traces"
    Yeah, it won't work if the service does not appear on the application list page.
    @Ritek Saxena Which language, framework, and SigNoz tutorial/blog did you use to set up auto-instrumentation? The framework you use is probably not supported by OTel.
    Ritek Saxena

    2 months ago
    @Ankit Nayan I was using manual instrumentation actually. I have tried auto-instrumentation as well; in that case the service does appear, but the alerts still don't fire for some reason. With manual instrumentation, even the service doesn't appear.
    You can see in the screenshot attached below that the p99 latency is mostly beyond the 10ms mark, but the alert still doesn't fire.
    I really like using SigNoz and this helpful community as well. It's just this alert issue that has been a headache recently. I really appreciate the time and effort you are all putting in to help others.
    UPDATE: when I kept hitting the API several times, the alert fired, so there is a chance that I just missed it when it fired for a single API call. However, I am still not getting any notification on my alert channel. What could be the issue?
    Ankit Nayan

    2 months ago
    @Ritek Saxena there is an option to test your channel when you try to edit it. Does it work?
    Ritek Saxena

    2 months ago
    Yes the testing works fine.
    Ankit Nayan

    2 months ago
    "I was using manual instrumentation actually."
    You must have missed adding the span kind server to the manually created spans. Otherwise, it won't be counted as a server and won't show up on the application list page.
    Ritek Saxena

    2 months ago
    Right, I haven't set anything like this, I'll give it a try. Thanks 😄
    Ankit Nayan

    2 months ago
    "Yes the testing works fine."
    If an alert shows firing here, it should be received at the channel too. @Amol Umbark any idea why this is happening?
    Ritek Saxena

    2 months ago
    Setting the span kind to server worked well, and now I can see the metrics for my manually instrumented application. Thank you so much @Ankit Nayan.
    Although notifications are still not there even though the alert is firing
    Ankit Nayan

    2 months ago
    "Although notifications are still not there even though the alert is firing"
    Let us check and confirm this.
    BTW, which version of SigNoz are you using?
    Ritek Saxena

    2 months ago
    Sure, the version I'm using is 0.8.80
    Oops, my bad, it's 0.8.0.
    Ankit Nayan

    2 months ago
    Please upgrade to 0.8.1; there was an issue with alert delivery which got fixed in https://github.com/SigNoz/signoz/pull/1238
    Ritek Saxena

    2 months ago
    Oh, all right, I will upgrade and let you know if the problem gets fixed. Thanks again.
    Hey all, after upgrading to 0.9.0 the alert notifications worked properly, thanks. But once the alert got resolved, it never sent a notification again: no matter how many times the latency crossed the threshold, the alert never showed the Firing status.
    I got the notification just once and none after that. Please help. 😞
    Pranay

    2 months ago
    @Amol Umbark do you have more insights on this?
    Ritek Saxena

    2 months ago
    Hi Team, I am really stuck trying to fix these issues with the alert feature. Please help me; I have to deploy the code ASAP and it's taking a long time. 😭 I am attaching screenshots for reference as well. The first one shows the alert configuration and the second one shows the p99 latency, which even crosses the 500 ms mark, but the alert still doesn't fire.
    Ankit Nayan

    2 months ago
    @Ritek Saxena change 1m to 2m or 5m in the alert. It means the alert rule will be evaluated over the past 2m or 5m of data.
    Also try plotting the query in the PromQL section without the threshold, and check whether you see a chart, to verify the alert rule.
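One plausible reason a wider window helps (an assumption; the thread does not state the scrape interval): PromQL's rate() needs at least two samples inside the range window, so a [1m] window can come back empty when samples arrive roughly once a minute. The simplified Python sketch below ignores counter resets and extrapolation, which the real rate() handles, and uses invented sample data.

```python
def simple_rate(samples, now, window):
    """Per-second increase of a counter over the window ending at `now`.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples fall inside the window.
    Simplified on purpose: no counter-reset handling, no extrapolation.
    """
    in_window = [(t, v) for t, v in samples if now - window < t <= now]
    if len(in_window) < 2:
        return None  # not enough data points to compute a rate
    (t_first, v_first), (t_last, v_last) = in_window[0], in_window[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical counter scraped every 60 seconds.
samples = [(0, 0), (60, 30), (120, 90), (180, 120)]

print(simple_rate(samples, now=200, window=60))   # None: only one sample in range
print(simple_rate(samples, now=200, window=120))  # 0.5: two samples in range
```

Under these assumptions the 1m window yields no value at all, so a threshold comparison on it can never be true, while the 2m window produces a rate that the alert can evaluate.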
    Ritek Saxena

    2 months ago
    Hi, I tried changing the time window to other values and it worked fine. Thank you so much for your reply. Although it would be great if there was a documentation on these queries. Thanks again.
    Ankit Nayan

    2 months ago
    "Although it would be great if there was a documentation on these queries."
    @Ashu
    Ritek Saxena

    2 months ago
    Hi team, I was really happy with the alerts and everything working well until now. I have a demonstration at 12:00 IST and the alerts have stopped firing (yet again). I tried to plot the PromQL query but it doesn't give me anything. 😭 Please help, I don't have much time and have to give an end-to-end demo.
    Ankit Nayan

    2 months ago
    @Ritek Saxena did you use the migration script to upgrade to v0.9.0? I would suggest upgrading to v0.9.2 along with applying the migrations.
    Ashu

    2 months ago
    @Priyansh Have forwarded to Andrei to see how it can be documented.