# signoz-cloud
a
How can I dig deeper into ingestion of metrics? I can see we are reaching 1 million data points in a few hours (although we have only enabled 1 service for now). I'm not able to pinpoint what is causing so many metrics to be emitted. Is there any doc that I could use for this?
h
What's your tech stack? I went through this recently for our test env, which is a mix of NodeJS and Python running on K8s and was generating 33M samples a day while idle. I started from Metrics per hour and looked at metric names:
Things that helped me:
1. Turning `collectionInterval` from `30s` to `2m` for all infra collectors (host, kubelet, etc.). This brought their usage down to 25%.
2. A Python app was using OTel auto-instrumentation, which includes `system_metrics`, so pods were reporting redundant "host metrics". Fixed by setting the pod env var `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=system_metrics` to disable that instrumentation.
3. Changing the infra chart config to disable a lot of metrics. Our pods use external DBs and have no meaningful local storage, so many of the filesystem metrics weren't useful. Many of my services are low-volume, so `replicaset` desired/available metrics were a constant `1` and not useful. Ditto for `pod_state` and a bunch of other k8s metrics that aren't applicable to simpler architectures.
4. Updating the OTel collector to drop a heavyweight HTTP histogram metric, since trace spans already captured the same info.

There's a rough config sketch for 1, 3, and 4 below. The SigNoz team provided additional tuning help for me in this thread. I was able to get from 33M/day down to 2M/day with those changes.
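In case it helps, here's roughly what 1, 3, and 4 look like as OpenTelemetry Collector config. This is only a sketch, not the exact chart values I used: the receiver/processor keys follow the upstream hostmetrics, kubeletstats, k8s_cluster, and filter component docs, and the specific metric names (`k8s.pod.filesystem.*`, `k8s.replicaset.*`, `http.server.duration`) and exporter endpoint are illustrative, so check your k8s-infra chart's values.yaml for how these are exposed in your version:

```yaml
receivers:
  hostmetrics:
    collection_interval: 2m        # was 30s
    scrapers:
      cpu: {}
      memory: {}
      network: {}
      # filesystem/disk scrapers left out: pods use external DBs, no useful local storage
  kubeletstats:
    collection_interval: 2m
    metrics:
      # turn off individual metrics you never look at
      k8s.pod.filesystem.usage:
        enabled: false
      k8s.pod.filesystem.capacity:
        enabled: false
  k8s_cluster:
    collection_interval: 2m
    metrics:
      # a constant "1" for single-replica services, not worth paying for
      k8s.replicaset.desired:
        enabled: false
      k8s.replicaset.available:
        enabled: false

processors:
  # drop the heavy HTTP histogram; trace spans already capture the same info
  filter/drop-http-histogram:
    metrics:
      metric:
        - 'name == "http.server.duration"'

exporters:
  otlp:
    endpoint: ingest.<region>.signoz.cloud:443   # your SigNoz Cloud endpoint
    # auth headers omitted - add your ingestion key here

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, kubeletstats, k8s_cluster]
      processors: [filter/drop-http-histogram]
      exporters: [otlp]
```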
🔥 3
a
@Chitransh Gupta this thread should be converted to docs for all cloud and community users
🙌 1
h
Also ended up dropping all logs containing `kube-probe` (snippet below). Didn't care enough to sample them, but that also reduced my idle cluster volume significantly. All of this is documented, but I agree it'd be very useful to have a targeted guide during setup. I can imagine cash-strapped startups might rule out the product entirely once they find out that monitoring one pod will cost them $6/mo.
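For the kube-probe logs it was just a filter processor condition along these lines (sketch only; the processor name and regex are mine, and you'd add it to your logs pipeline's processors list):

```yaml
processors:
  # drop kubelet liveness/readiness probe access logs before they're exported
  filter/drop-kube-probe:
    logs:
      log_record:
        - 'IsMatch(body, ".*kube-probe.*")'
```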
🙌 1
c
Created an issue for this @Ankit Nayan - https://github.com/SigNoz/signoz-web/issues/1680 Thanks for a very detailed answer @Hien Le!
a
This is super useful @Hien Le. Yes, while it is detailed on the metrics-explorer page, it would be helpful to add a section on "analyzing metrics ingestion in detail". I'm on NodeJS with an ECS cluster of ~10 instances being monitored (to start with), which resulted in >10M metrics a day. I'm updating my collector interval and digging deeper into Metrics explorer now. Thank you!
h
`collectionInterval` is the easiest upfront reduction. Given my health/readiness probe intervals, my pods take about 2 minutes to become ready, so a 30s collection interval seems like overkill. I've been meaning to write up more notes specifically for our workflow, which uses the auto-instrument operator and signoz-k8s-infra. Yeah, everything is very well documented, but it's scattered across various small articles, which can make it hard for folks to find their specific flow. They actually have an Ingestion Analysis Dashboard you can manually install as well.
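Side note on the operator workflow: the `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS` env var from earlier can also be set once on the operator's `Instrumentation` resource instead of on each Deployment. Rough sketch, with placeholder name/namespace:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: python-instrumentation   # placeholder name
  namespace: my-apps             # placeholder namespace
spec:
  python:
    env:
      # stop the Python agent's own host/system metrics; k8s-infra already reports these
      - name: OTEL_PYTHON_DISABLED_INSTRUMENTATIONS
        value: system_metrics
```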
If you take `2*60*60*24*30 = 5,184,000` samples/mo, and at least 12 metrics enabled by default per pod with `signoz-k8s-infra`, that's $6.22/mo for one pod, so I can see how it adds up very quickly. I think there are actually more metrics enabled by default, so out-of-the-box SigNoz might be a surprise for most folks.