# signoz-cloud
h
Coming from self-hosted Signoz, and surprised, after moving one environment to a Signoz Cloud trial account, by how much of the Metrics cost is being wasted. Support has advised looking at Drop Metrics, and that seems applicable to some k8s node-agent metrics we don't care about (`k8s_replicaset_available` / `k8s_replicaset_desired`), but my heaviest metrics are `http_client_duration_bucket` / `http_server_duration_bucket`, coming from the NodeJS auto-instrumentation. There have been 3M samples over the last 24 hours, but much of that is off-hours when our application is doing very little. Is this a case of generating samples very frequently even though the value is usually 0? If so, how could we reduce the samples? It doesn't seem that a Batch Processor in the Metrics Pipeline would help.
s
> Is this a case of generating samples very frequently even though the value is usually 0?

Yes.

> if so how could we reduce the samples?

Please change the temporality to `delta` by setting the env var `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta`.
h
Thank you! So I'd set that on the Instrumentation resource so it applies to injected pods? Would the same env var need to be set in the `signoz-k8s-infra` chart to reduce the node-agent metrics volume?
s
> So I'd set that on Instrumentation so it applies to injected pods?

It should be part of the application env vars.

> Would the same env var need to be set in the signoz-k8s-infra chart to reduce the node-agent metrics volume?

No, the agent's default metrics are k8s resource metrics. Unlike the application metrics, the agent metrics won't see any reduction from this change.
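For illustration, a minimal sketch of setting this through the OpenTelemetry Operator's `Instrumentation` resource so it reaches every injected pod; the resource name is hypothetical, and the variable could equally be set directly in each Deployment's container `env`:

```yaml
# Hedged sketch: delta temporality for all auto-instrumented pods.
# The metadata.name below is illustrative; adjust to your own Instrumentation object.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  env:
    # Propagated into every container the operator injects, regardless of language.
    - name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
      value: delta
```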
h
Since the original screenshot, I made a few changes at the end of last week:

1. added a 2nd environment to Signoz (should've doubled the volume)
2. added `temporality=delta`
3. increased the agent collection interval from 30s to 60s (should've halved the volume)

The k8s metrics held stable, which makes sense given 1 and 3. `http_server_duration_buckets` is still ~1.6M; I'm guessing this could be due to regular kube probes sending metrics? `http_client_duration_buckets` went from 3M to 1.5M even with an extra environment, so it seems `temporality=delta` is helping, but it is not drastic? Is it safe to filter those two metrics? They seem redundant with traces.

When trying to determine which metrics to drop to reduce costs, should I sort by Samples or Time Series in the Metrics Explorer?
s
> When trying to determine which metrics to drop to reduce costs, should I sort by Samples or Time Series in the Metrics Explorer?

You should sort by Samples.

> `http_client_duration_buckets` went from 3M to 1.5M even with an extra environment, so it seems `temporality=delta` is helping, but it is not drastic?

The change completely cuts out the samples produced during off-hours. However, the samples during regular hours stay roughly the same if the same attribute combinations keep recurring (as opposed to, say, users who visit the app once and never come back). So the gains you are seeing come from the off-hours.
h
Our cluster is running an older (0.11.4) `signoz-k8s-infra` chart and consuming a lot of the Cloud ingest daily budget, e.g. 1.4M+ `system.disk.operations` samples and 1.4M+ `k8s.replicaset.desired` samples. Do we need a newer chart like 0.13.0 to select metrics? Should the names be pulled from the Cloud Metrics dashboard? The documented conventions look different.
s
Updating the chart version won't change the number of samples created for the metrics `system.disk.operations` or `k8s.replicaset.desired`, because they come from the `hostmetricsreceiver` and `kubeletstatsreceiver`, neither of which has a bug such as producing duplicate samples.
h
Ah, but the newer chart will allow me to disable those metrics completely by name if I'm not interested in them?
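For illustration, roughly what the rendered collector configuration looks like when individual receiver metrics are disabled by name. This is the upstream OpenTelemetry Collector receiver syntax; the chart exposes it through its own values, so the exact `values.yaml` keys may differ:

```yaml
# Hedged sketch: turning off a selected hostmetrics metric by name and lowering
# the scrape frequency. Receiver and metric names follow the upstream
# OpenTelemetry Collector docs; chart values keys may differ per chart version.
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      disk:
        metrics:
          system.disk.operations:
            enabled: false   # drop this metric entirely
      cpu: {}
      memory: {}
```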
Upgraded to the 0.13.0 chart and used these values to disable:

1. `kubeletstatsreceiver` metrics that are mostly unchanging for me
2. `hostmetricsreceiver` completely, since many node metrics seem available in the pod / container metrics

but I'm not seeing a drop in Samples for the `system.disk.*` or `system.cpu.*` metrics over the last 15 minutes. I'd also increased `collectionInterval` from `60s` to `4m`, and that still didn't seem to reduce samples. The Flux HelmRelease fragment is attached along with the resulting ConfigMaps (from `kubectl describe`), and they appear correct (no `hostmetricsreceiver`, and the DaemonSet `signoz-k8s-infra-otel-agent` and Deployment `signoz-k8s-infra-otel-deployment` have been restarted to ensure the pods are created with the latest version of the maps). Any other debugging suggestions?
Our dev environment is spending $3.5/day on metrics, so any simple reduction will work. Would going back to defaults but with `collectionInterval: 5m` be okay?
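For illustration, a sketch of what that could look like in the chart's values, assuming the k8s-infra chart's `presets` keys; verify the exact key names against the chart version in use:

```yaml
# Hedged sketch: default metric sets, but scraped less often.
# Preset key names are assumed from the signoz-k8s-infra chart's values and may
# differ between chart versions.
presets:
  hostMetrics:
    enabled: true
    collectionInterval: 5m
  kubeletMetrics:
    enabled: true
    collectionInterval: 5m
```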
Used Metrics Explorer to plot `system.disk.io` with `SUM BY k8s.pod.name` and discovered that what I thought were `hostmetrics` (node) metrics were actually being sent by Python containers. Had to edit `instrumentation.yaml`:

```yaml
python:
    env:
      - name: OTEL_PYTHON_DISABLED_INSTRUMENTATIONS
        value: system_metrics
```
Still unable to determine why this filter isn't working on the sidecars:

```yaml
processors:
  filter/drop_http_duration_buckets:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - http.server.duration.bucket
          - http.client.duration.bucket
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [filter/drop_http_duration_buckets, attributes/upsert, resource/upsert, batch]
      exporters: [debug, otlp/local, otlp/cloud]
```

Any chance this is related to the underscore / period name normalization?
Found an explanation that the buckets are generated from `http.server.request.duration` and require a transformer to drop. This is the biggest consumer of my ingestion costs. Will dropping `http.server.request.duration` break the Signoz Services view?
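For reference, a minimal sketch of dropping the whole histograms at the collector, assuming the server metric is `http.server.request.duration` as found above and the client histogram follows the analogous `http.client.request.duration` naming. The `_bucket` / `_sum` / `_count` series are derived from the histogram at export time, which is why matching on the `.bucket` suffix in the earlier filter never hits; the processor name below is illustrative:

```yaml
# Hedged sketch: drop the HTTP duration histograms entirely using the filter
# processor's OTTL conditions. Wire the processor into the metrics pipeline as
# with the earlier filter/drop_http_duration_buckets attempt.
processors:
  filter/drop_http_duration:
    metrics:
      metric:
        - 'name == "http.server.request.duration"'
        - 'name == "http.client.request.duration"'
```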
v
No, it shouldn't break the Services view.
h
Thanks, the Services view's latency is based on Traces then?
v
That's correct.