# support
a
Hello, we have set up SigNoz in our Kubernetes cluster, and we are using OpenTelemetry packages in our Node.js app to export traces and metrics. The traces are now showing up in the SigNoz dashboard, but I am not sure whether they are reliable. For example, here we can see that `mongodb.find` took 8.67 seconds, and I can also see that it is a find query that queries by id. So we checked the MongoDB Atlas profiler to find out what could make this query so slow. The strange thing is that no query in the Mongo profiler takes more than 5 seconds, and that particular query is not even present in the slow queries. Can anyone explain this gap?
s
What you see in Atlas is the server’s perceived execution time. On the client side it will be more, because other things are involved, such as the TLS connect, DNS lookup, etc. This is not just Atlas; it is valid for any cloud service (AWS comes to mind, where people often ask why the client shows higher numbers, since the underlying operations are hidden from regular users). There is also a small contribution from the instrumentation itself, because it has to trace the whole execution. When these things are added up, they account for the numbers you see in SigNoz.
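If you want to see whether that connection-level work is actually contributing, one option is to register the contrib DNS and net instrumentations next to the MongoDB one, so lookups and socket/TLS connects show up as their own spans in the same trace. A minimal sketch, assuming the app bootstraps tracing with `@opentelemetry/sdk-node`; the service name and collector URL below are placeholders, not your actual setup:

```js
// tracing.js — sketch: make DNS lookups and socket connects visible as spans
// alongside the mongodb.find span, so you can see where the client time goes.
'use strict';

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { MongoDBInstrumentation } = require('@opentelemetry/instrumentation-mongodb');
const { DnsInstrumentation } = require('@opentelemetry/instrumentation-dns');
const { NetInstrumentation } = require('@opentelemetry/instrumentation-net');

const sdk = new NodeSDK({
  serviceName: 'orders-api', // hypothetical service name
  traceExporter: new OTLPTraceExporter({
    // SigNoz OTel collector endpoint inside the cluster (placeholder)
    url: 'http://signoz-otel-collector:4318/v1/traces',
  }),
  instrumentations: [
    new MongoDBInstrumentation(),
    new DnsInstrumentation(), // dns.lookup spans
    new NetInstrumentation(), // tcp/tls connect spans
  ],
});

sdk.start();
```

If no DNS or connect spans show up near the slow `mongodb.find` span, connection setup is not where the time is going and the gap is elsewhere on the client side.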
> also that particular query is not even present in slow queries
I am not sure which part you are referring to here; if it’s about Atlas, I don’t know how it decides which queries show up as slow queries.
a
Okay, I was referring to the particular query that the SigNoz trace showed as slow; it is not present in Atlas’s slow queries. I get your point: several other things happen before and after the actual query that the Atlas server executes, and that is what is taking the time.
s
Yes. I don’t know if Atlas does this, but AWS explicitly calls it out wherever possible to make clear that server metrics and client-observed durations will vary a little.
a
Okay, then we might have to look into it. But it is not varying by a little: on the Atlas server side the query executes in milliseconds, yet we see a trace showing it took 8.68 seconds.
s
It shouldn’t be the case that the server side takes milliseconds while the client takes 8 seconds, unless the server time is only the execution time and the client is pulling back a large amount of data. You can expect some additional overhead from the DNS query + TLS + instrumentation, but milliseconds on one side and 8+ seconds on the other is usually not a correct observation.
a
I doubt that a DNS query + TLS handshake is taking place, as we already established the connection when the app started, and we are now using that live connection to send queries to the Atlas server. Also, the data we are retrieving is very small, around 2 KB, and we are querying by the primary key _id, so something is being missed that could explain the delay. I think this would be a better question for the OpenTelemetry community, since we are using their package that creates the spans, and we need to know exactly what is captured by the span. I have checked all the metrics on the Atlas server and nothing explains the delay.
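One way to cross-check what the span covers without waiting on the OpenTelemetry community is to time the exact same call in the app and record it on a manual parent span: if the wall-clock number the process sees also comes out around 8 seconds, the instrumentation is at least reporting what the application actually experienced. A minimal sketch, assuming the standard `mongodb` driver and `@opentelemetry/api`; the span name `app.findById`, the attribute name, and the function arguments are made up for illustration:

```js
const { trace } = require('@opentelemetry/api');
const { ObjectId } = require('mongodb');

const tracer = trace.getTracer('manual-timing');

async function findById(collection, id) {
  return tracer.startActiveSpan('app.findById', async (span) => {
    const start = process.hrtime.bigint();
    try {
      // The auto-instrumented mongodb.find span should show up as a child of this span.
      return await collection.findOne({ _id: new ObjectId(id) });
    } finally {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      span.setAttribute('app.elapsed_ms', elapsedMs); // client wall-clock time
      console.log(`findById took ${elapsedMs.toFixed(1)} ms (client wall clock)`);
      span.end();
    }
  });
}
```

Comparing `app.elapsed_ms` with the `mongodb.find` span duration in SigNoz tells you whether the 8 seconds is real application-observed latency or an instrumentation artifact.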
s
The minimum you could do is add a log line with the query’s elapsed time in the app and check those. Your own graph shows some queries touching 5 seconds. If you can’t provide a reproducible example, the OTel community can’t do anything (even if it probably is an issue); when you claim the instrumentation is wrong, you need to provide at least a few simple steps to show it. Connections don’t live forever; they get re-established and reused constantly. Are you saying your startup connection will be there for the entire life of the application process? What was the P99 latency observed on these routes earlier (before SigNoz)? There is a lot of guessing going on here, so I would rather do some homework and assess the situation.
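For that log line, one option is the Node.js driver’s command monitoring events, where `commandSucceeded` carries the round-trip duration the driver itself observed for each operation. A minimal sketch, assuming the official `mongodb` driver; the connection string is a placeholder:

```js
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb+srv://user:pass@cluster.example.net/app', {
  monitorCommands: true, // required for command monitoring events
});

client.on('commandStarted', (event) => {
  if (event.commandName === 'find') {
    console.log(`find started, requestId=${event.requestId}`);
  }
});

client.on('commandSucceeded', (event) => {
  if (event.commandName === 'find') {
    // event.duration is the round-trip time in milliseconds as seen by the driver
    console.log(`find finished in ${event.duration} ms, requestId=${event.requestId}`);
  }
});
```

If these log lines show milliseconds while the SigNoz span shows 8 seconds, the time is being spent inside the application process around the driver call rather than on the wire or on the server.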