# support
a
Hi team, we need to set up alerts for p95 and p99 latencies breaching a certain threshold. Can you tell us which ClickHouse query we can use to extract the SigNoz metrics? The query should be something like “Compute p95 and p99 latency for a particular service for the last minute”.
s
You can go to Alerts, and in the query builder pick the `HIST_QUANTILE_XX` aggregation with the metric name `signoz_latency_bucket`, and choose `service_name`, `le` in the group by clause. It should show you the pXX latency, on which you can set the threshold to alert on. Let us know if you need any additional help.
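If you do want a raw ClickHouse query rather than the query builder, a minimal sketch over raw spans could look like the one below. It assumes the `signoz_traces.signoz_index_v2` table with `serviceName`, `durationNano`, and `timestamp` columns; table and column names vary across SigNoz versions, and `'your-service'` is a placeholder.

```sql
-- Sketch: p95 and p99 latency (in ms) for one service over the last minute,
-- computed from raw spans rather than the signoz_latency_bucket metric.
SELECT
    quantile(0.95)(durationNano) / 1e6 AS p95_ms,
    quantile(0.99)(durationNano) / 1e6 AS p99_ms
FROM signoz_traces.signoz_index_v2
WHERE serviceName = 'your-service'
  AND timestamp >= now() - INTERVAL 1 MINUTE
```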
a
We are seeing a mismatch between the graph plotted by the metric and what we see on the latency graph in the overview metrics dashboard. The graphs are for the same timeframe and the same services. Can you tell what we are missing? Is the granularity different, or is the metric showing something other than latency?
p
@Arnab Dutta you can compare the key operation dashboards in promql and query builder: https://github.com/SigNoz/dashboards/tree/main/key-operations
Attachments: Screenshot from 2023-02-17 13-45-13.png, Screenshot from 2023-02-17 13-45-22.png, Screenshot from 2023-02-17 13-45-29.png
s
@Prashant Shahi they were looking for service latency, not the key operations. @Arnab Dutta that looks like a reversed graph of the other one to me. Let me check; there shouldn’t be much difference.
@Arnab Dutta, the difference here is that in the first chart, the SigNoz query fetches the latency based on the service entry spans. Because a trace can have multiple spans within the same service, we only want to look at the duration of the service entry span for an accurate latency. However, your alert builder query is based on all the spans in the service, which leads to this mismatch. Please select the top-level endpoints for the operation attribute in the where clause and let us know if you still notice the difference.
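In a raw ClickHouse sketch like the one above, that filter is just an extra condition on the operation name. The endpoint names below are placeholders; use your service’s actual top-level operations.

```sql
-- Sketch: restrict the latency calculation to service entry spans by
-- filtering on top-level operation names (placeholders below).
SELECT
    quantile(0.95)(durationNano) / 1e6 AS p95_ms,
    quantile(0.99)(durationNano) / 1e6 AS p99_ms
FROM signoz_traces.signoz_index_v2
WHERE serviceName = 'your-service'
  AND name IN ('GET /order', 'POST /checkout')  -- hypothetical top-level endpoints
  AND timestamp >= now() - INTERVAL 1 MINUTE
```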
a
What do you mean by service entry spans? Can you please explain?
s
There can be many spans for the same service within a trace. For example, `/order` may internally call a database or an external service, or compute something, and each of these may start a span, but there is going to be one span which is the parent of all the spans within the service for that trace, and it represents the actual duration of the whole request within the service. We tried to explain it here: https://signoz.io/docs/userguide/metrics/#open-the-services-section
> In a distributed trace, a request goes through several entities performing various kinds of work. There is an entry point span for each service that took part in the trace journey. This can be thought of as a sub-root span for the service. This sub-root span can have many child spans which could be doing work in parallel, sequentially, or a combination of both. From an outside perspective, this sub-root span’s work is an operation done by the service, and how much time it took to complete this operation is the duration metric. For a web server, this is an API endpoint returning some data, and the request time is the duration metric. For a messaging consumer service, this is a consume trigger until it is done with the message received. For a mobile client application, this could be a button click to submit a form and the time taken to fulfill the request.
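In data terms, a service entry span is simply a span whose parent is either missing or belongs to a different service. As a rough illustration only (the self-join assumes the `signoz_traces.signoz_index_v2` schema with `spanID`, `parentSpanID`, and `traceID` columns, and is not meant as an efficient production query):

```sql
-- Sketch: list service entry spans, i.e. spans whose parent is absent
-- or lives in another service. Schema names are assumptions.
SELECT child.traceID, child.spanID, child.name, child.durationNano
FROM signoz_traces.signoz_index_v2 AS child
LEFT JOIN signoz_traces.signoz_index_v2 AS parent
    ON parent.traceID = child.traceID
   AND parent.spanID = child.parentSpanID
WHERE child.serviceName = 'your-service'
  AND (child.parentSpanID = '' OR parent.serviceName != child.serviceName)
```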
a
Got it. Yeah, we actually want the latencies based on the parent span / entry-level span. For that, do we have to add those selectively in the where clause? There could be a lot of them.
s
Yeah, I see the pain point here. OpenTelemetry also sends service-level metrics such as request count and duration out of the box. Did you enable them? In that case, you don’t have to worry about filtering on the top-level spans.
a
let me check and get back