# support
a
Hi, I am using SigNoz in k8s, and we are able to see traces fine. But a few of the traces are missing spans. Can you tell me what could be the reason for the missing spans?
s
How many services are you running? Are you seeing spans go missing randomly, or only for specific services?
a
We are running a lot of services, actually, but only 4 services are instrumented for tracing, and some of the traces are randomly missing spans. E.g. -
s
How many services are you running?
a
We have 4 namespaces in Kubernetes. Each namespace is running multiple replicas of around 5-10 services, so in total the pod count would be around 200-300. But for monitoring we are tracing a single service across the namespaces. For each namespace the service name is appended with the namespace, so in SigNoz we are able to see 4 services. Each service is not a unique pod, but a combination of 5-10 replicas.
So, the service name is Profile-Service appended with the namespace name
Each namespace is running multiple replicas of Profile-Service. Does this help?
s
This technically means only one service is sending data? How are you instrumenting your services?
a
Correct, this is how we have started instrumentation. So Profile-Service is our main API. Is there something wrong with this approach? The other services are not currently supported by OTel since they are running Node v8. We are in the process of upgrading them, and then we are planning to add instrumentation.
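For context, a minimal sketch of what that single-service setup looks like (package names are the standard OpenTelemetry JS ones; the service-name pattern, the POD_NAMESPACE variable, and the collector endpoint are placeholders for illustration, not the exact production values):

```js
// tracing.js — minimal sketch of single-service Node instrumentation.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  // Service name is "Profile-Service" with the namespace appended,
  // e.g. Profile-Service-staging. POD_NAMESPACE is assumed to be injected
  // via the Kubernetes downward API (placeholder, not the real setup).
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]:
      `Profile-Service-${process.env.POD_NAMESPACE || 'default'}`,
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces', // placeholder collector endpoint
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```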
s
node v8 is no longer supported. did you instrument all of the services or only one service?
a
only one service, which is on node v18
s
If you are instrumenting only one service, it’s odd some of the spans are missing from the same service. I was assuming you are also instrumenting the other services, which are propagating the context but not sending data.
a
no, that’s not happening. So, any other reason this could happen?
s
I am not sure what might be the case here. I don’t have anything off the top of my head.
a
Is there a limit to the number of spans a trace can have, or is there a hard cap on payload size?
One more thing I noticed is that some entire traces are missing. Is there any sampling of requests?
s
Not by default. Is there any way you can share a simple reproducible example of this behaviour?
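(For context: the JS SDK's default sampler is parent-based always-on, so nothing is sampled away unless a sampler is configured explicitly. A rough sketch of what opting into sampling would look like, with an arbitrary 10% ratio:)

```js
// Sketch only: sampling has to be opted into explicitly in the SDK config.
// Nothing like this exists in a default setup.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  // Keep ~10% of root traces; child spans follow the parent's decision.
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) }),
  // ...exporter and instrumentations as before
});
```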
a
I cannot reproduce this. But we are printing the response times of user-facing APIs, and for one call the response time was quite high, i.e. 35 seconds. When I tried to find the trace for that API, it was missing. I sorted the traces for that API call in decreasing order of duration, and the maximum was 20 seconds.
Could this be related to the exporter that we are using? We are currently using this package:
@opentelemetry/exporter-trace-otlp-http
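For reference, a typical wiring of that exporter behind a BatchSpanProcessor looks roughly like this; the endpoint is a placeholder and the limits shown are the usual SDK defaults, not our exact settings:

```js
// Rough sketch only: the OTLP HTTP exporter is normally driven by a
// BatchSpanProcessor, which buffers spans before export. The values below
// are the standard defaults, shown to illustrate which knobs exist.
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const provider = new NodeTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }), // placeholder endpoint
    {
      maxQueueSize: 2048,       // buffered spans beyond this are dropped
      maxExportBatchSize: 512,  // spans sent per export request
      scheduledDelayMillis: 5000,
      exportTimeoutMillis: 30000,
    },
  ),
);
provider.register();
```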
s
No, the exporter wouldn’t randomly drop the spans. If there was an issue, you wouldn’t be seeing any data.
a
Hmm.. ok, let me try to get more details
I checked the otel-collector logs and see this error a lot. I think this has something to do with the missing spans. Can you please tell me what this means?
```
2023-02-18T08:56:35.303Z	warn	zapgrpc/zapgrpc.go:191	[transport] transport: http2Server.HandleStreams failed to read frame: read tcp 192.168.41.238:4317->192.168.35.69:55016: read: connection timed out	{"grpc_log": true}
```