# support
d
Issue: Dropped logs
```
❯ kl -n platform pod/apm-signoz-otel-collector-75cb5fccb6-wnpql

{"level":"warn","ts":"2025-09-10T08:10:13.215Z","caller":"signozspanmetricsprocessor/processor.go:1187","msg":"Too many operations to track, using overflow operation name","resource":{"service.instance.id":"c0553d2a-1063-47f1-9841-733f3254bb97","service.name":"/signoz-otel-collector","service.version":"dev"},"otelcol.component.id":"signozspanmetrics/delta","otelcol.component.kind":"processor","otelcol.pipeline.id":"traces","otelcol.signal":"traces","maxNumberOfOperationsToTrackPerService":2048,"serviceName":"frontend-nudge"}
{"level":"warn","ts":"2025-09-10T08:10:13.216Z","caller":"signozspanmetricsprocessor/processor.go:1187","msg":"Too many operations to track, using overflow operation name","resource":{"service.instance.id":"c0553d2a-1063-47f1-9841-733f3254bb97","service.name":"/signoz-otel-collector","service.version":"dev"},"otelcol.component.id":"signozspanmetrics/delta","otelcol.component.kind":"processor","otelcol.pipeline.id":"traces","otelcol.signal":"traces","maxNumberOfOperationsToTrackPerService":2048,"serviceName":"frontend-nudge"}
{"level":"warn","ts":"2025-09-10T08:10:13.216Z","caller":"signozspanmetricsprocessor/processor.go:1187","msg":"Too many operations to track, using overflow operation name","resource":{"service.instance.id":"c0553d2a-1063-47f1-9841-733f3254bb97","service.name":"/signoz-otel-collector","service.version":"dev"},"otelcol.component.id":"signozspanmetrics/delta","otelcol.component.kind":"processor","otelcol.pipeline.id":"traces","otelcol.signal":"traces","maxNumberOfOperationsToTrackPerService":2048,"serviceName":"frontend-nudge"}
{"level":"warn","ts":"2025-09-10T08:10:13.216Z","caller":"signozspanmetricsprocessor/processor.go:1187","msg":"Too many operations to track, using overflow operation name","resource":{"service.instance.id":"c0553d2a-1063-47f1-9841-733f3254bb97","service.name":"/signoz-otel-collector","service.version":"dev"},"otelcol.component.id":"signozspanmetrics/delta","otelcol.component.kind":"processor","otelcol.pipeline.id":"traces","otelcol.signal":"traces","maxNumberOfOperationsToTrackPerService":2048,"serviceName":"frontend-nudge"}
```
I found this "Too many operations to track, using overflow operation name" issue discussed here: https://community-chat.signoz.io/t/27155515/hello-i-ve-noticed-numerous-warnings-in-my-collector-logs-st. But I don't think we have very high cardinality at the moment; we already refactored that. Traces are coming in fine, but logs are not: whenever we search logs by trace_id, there is a high probability that the log is not there, even though the trace is.
```
❯ kl -n platform pod/apm-signoz-otel-collector-75cb5fccb6-cxwl8 | grep error
Defaulted container "collector" out of: collector, apm-signoz-otel-collector-migrate-init (init)
{"level":"error","ts":"2025-09-06T14:34:47.739Z","caller":"service@v0.128.0/service.go:189","msg":"error found during service initialization","resource":{"service.instance.id":"05e0b57f-d8f6-4905-a7b0-b7db0afa1ea9","service.name":"/signoz-otel-collector","service.version":"dev"},"error":"failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": failed to create clickhouse client: dial tcp 172.20.229.184:9000: connect: connection refused","stacktrace":"<http://go.opentelemetry.io/collector/service.New.func1|go.opentelemetry.io/collector/service.New.func1>\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:189\ngo.opentelemetry.io/collector/service.New\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:220\ngo.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:197\ngo.opentelemetry.io/collector/otelcol.(*Collector).Run\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:312\ngithub.com/SigNoz/signoz-otel-collector/signozcol.(*WrappedCollector).Run.func1\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/signozcol/collector.go:103"}
{"level":"error","timestamp":"2025-09-06T14:34:47.940Z","caller":"opamp/server_client.go:269","msg":"failed to apply config","component":"opamp-server-client","error":"failed to reload config: /var/tmp/collector-config.yaml: collector failed to restart: failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": failed to create clickhouse client: dial tcp 172.20.229.184:9000: connect: connection refused","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler|github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:269\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onMessageFuncHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:253\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:160\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/wsreceiver.go:94"}
{"level":"error","ts":"2025-09-06T14:34:47.947Z","caller":"service@v0.128.0/service.go:189","msg":"error found during service initialization","resource":{"service.instance.id":"6daf8a90-57c9-4d70-9f7b-1298a822c8ef","service.name":"/signoz-otel-collector","service.version":"dev"},"error":"failed to build pipelines: failed to create \"clickhouselogsexporter\" exporter for data type \"logs\": cannot configure clickhouse logs exporter: dial tcp 172.20.229.184:9000: connect: connection refused","stacktrace":"<http://go.opentelemetry.io/collector/service.New.func1|go.opentelemetry.io/collector/service.New.func1>\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:189\ngo.opentelemetry.io/collector/service.New\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:220\ngo.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:197\ngo.opentelemetry.io/collector/otelcol.(*Collector).Run\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:312\ngithub.com/SigNoz/signoz-otel-collector/signozcol.(*WrappedCollector).Run.func1\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/signozcol/collector.go:103"}
{"level":"error","timestamp":"2025-09-06T14:34:48.148Z","caller":"opamp/server_client.go:269","msg":"failed to apply config","component":"opamp-server-client","error":"failed to reload config: /var/tmp/collector-config.yaml: collector failed to restart: failed to build pipelines: failed to create \"clickhouselogsexporter\" exporter for data type \"logs\": cannot configure clickhouse logs exporter: dial tcp 172.20.229.184:9000: connect: connection refused","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler|github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:269\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onMessageFuncHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:253\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:160\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/wsreceiver.go:94"}
```
Another issue that I can see in the otel collector logs: failed to build pipelines: failed to create "clickhousetraces" exporter for data type "traces": failed to create clickhouse client: dial tcp 172.20.229.184:9000: connect: connection refused. We are running 2 pods, and the two pods show two different issues.
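As an aside for anyone debugging the same connection-refused error, a minimal connectivity check might look like the sketch below; the namespace is taken from the thread, but the grep targets and debug image are assumptions and will differ per install.

```sh
# Which Service owns the ClusterIP the exporter is dialing?
kubectl -n platform get svc -o wide | grep 172.20.229.184

# Does that Service have ready endpoints (i.e. is ClickHouse actually up)?
kubectl -n platform get endpoints | grep -i clickhouse

# Can the ClickHouse native port be reached from inside the cluster?
# (netshoot is just an example debug image with netcat installed.)
kubectl -n platform run ch-conn-test --rm -it --restart=Never --image=nicolaka/netshoot -- \
  nc -zv 172.20.229.184 9000
```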
@Srikanth Chekuri, is it possible to help here?
s
Hi @Dhruv garg, it looks like a network issue. When did you start seeing this?
d
The log-drop issue has been going on for some time now, but I only checked the logs today.
And what about the "too many operations" one?
s
As it says, the service `frontend-nudge` has too many span names. Is that addressed?
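The poster addressed this by refactoring the instrumentation, but for reference, span names can also be normalized inside the collector with the transform processor. The processor name, regex, and pipeline wiring below are illustrative assumptions, not the actual config from this install.

```yaml
processors:
  transform/normalize-span-names:
    trace_statements:
      - context: span
        statements:
          # Collapse numeric path segments (e.g. /users/123 -> /users/{id}) so a
          # single service does not emit thousands of distinct operation names.
          - replace_pattern(name, "/[0-9]+", "/{id}")

service:
  pipelines:
    traces:
      # Place the normalization before signozspanmetrics/delta (other
      # components elided); the exact pipeline will differ per install.
      processors: [transform/normalize-span-names, signozspanmetrics/delta, batch]
```

Normalizing before `signozspanmetrics/delta` keeps the per-service operation count under the 2048 limit mentioned in the warning.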
d
Yeah, we fixed that almost 2-3 weeks ago. Can it still show the same issue because of old data?
And if it is a network issue, why is it happening with just one pod instead of both pods?
The networking issue is actually resolved; I just verified based on the dates that that log is from 6th Sept. My mistake.
But I am still not able to figure out why logs are getting dropped.
@Srikanth Chekuri, any suggestions for debugging?
s
Hi @Dhruv garg, so the collector logs don't have anything that says logs are getting dropped, but the logs don't show up in the UI?
d
Yeah, some of the logs are not showing up, even though the trace exists.
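Not from the thread, but one way to confirm whether the collector itself is failing to export those log records is to scrape its internal Prometheus metrics. The default internal-telemetry port is 8888, and exact metric names can vary slightly between collector versions, so treat this as a sketch.

```sh
# Port-forward to one collector pod (pod name taken from the logs above).
kubectl -n platform port-forward pod/apm-signoz-otel-collector-75cb5fccb6-wnpql 8888:8888 &

# Non-zero failed/refused counters show where log records are being lost.
curl -s localhost:8888/metrics \
  | grep -E 'otelcol_exporter_(sent|send_failed)_log_records|otelcol_exporter_queue_size|otelcol_(receiver|processor)_refused_log_records'
```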
@Srikanth Chekuri, the memory usage at the otel collector level has been too high. Is there any way to reduce this?
s
How much memory usage do you notice?
d
Hey, I was able to fix the issue. There were multiple problems when the scale increased suddenly: autoscaling was not enabled for the otel collectors, the application sending data was dropping data because its queues were getting filled, and the ClickHouse PVC was running low on disk space as well. It's working fine now; the current resource usage for the otel collectors looks like this:
```
apm-signoz-otel-collector-6b5bbcf774-58s6w         6m           61Mi            
apm-signoz-otel-collector-6b5bbcf774-5f9nh         280m         140Mi           
apm-signoz-otel-collector-6b5bbcf774-7t7l5         282m         174Mi           
apm-signoz-otel-collector-6b5bbcf774-d8c2h         71m          93Mi            
apm-signoz-otel-collector-6b5bbcf774-n56mv         210m         123Mi
```
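On the point about the sending application dropping data because its queues filled up: if that sender is itself an OTel collector (an assumption; an SDK exporter has analogous but differently named settings), the standard exporterhelper queue and retry knobs are the usual place to add headroom. A minimal sketch, with an assumed endpoint and illustrative numbers:

```yaml
exporters:
  otlp:
    # Assumed endpoint; replace with the real SigNoz collector address.
    endpoint: apm-signoz-otel-collector.platform.svc.cluster.local:4317
    tls:
      insecure: true          # assumption: plaintext in-cluster traffic
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000       # larger in-memory buffer before records are dropped
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s
```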
s
Ok, good to know it's fixed now.
d
```
otelCollector:
  replicaCount: 2
  resources:
    requests:
      cpu: 200m
      memory: 400Mi
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 85
    targetMemoryUtilizationPercentage: 75
```
```
config:
    processors:
      # ref: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md
      batch:
        timeout: 500ms
        send_batch_size: 50000
        send_batch_max_size: 200000
```
Is there anything you would recommend changing in these settings?
s
So the number of collectors increases the number of writes to ClickHouse, and each additional write means extra work for ClickHouse because each insert creates a new part. You should try to scale the collectors vertically first and only then horizontally. Your batch size looks fine to me; I would change the timeout to 1s instead of 500ms. You can give the collectors a bit more in resource requests and reduce the max replicas.
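Translating those suggestions into the same values format shared above, a possible revision might look like the sketch below; the specific numbers are illustrative assumptions, not figures from the thread.

```yaml
otelCollector:
  resources:
    requests:
      cpu: 500m              # scale vertically first: larger requests per replica
      memory: 1Gi
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 6           # fewer, larger replicas mean fewer ClickHouse inserts
    targetCPUUtilizationPercentage: 85
    targetMemoryUtilizationPercentage: 75

config:
    processors:
      batch:
        timeout: 1s          # batch a little longer, as suggested
        send_batch_size: 50000
        send_batch_max_size: 200000
```

Fewer, larger replicas keep the insert rate into ClickHouse down while the HPA still absorbs spikes.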
d
Makes sense. Thanks for the suggestion.