Tyler Wells
06/12/2024, 1:26 PM
signoz-otel-collector keeps restarting with OOMKilled (exit code 137). There's only ~175k spans, and 17k metrics, but it's using a ton of memory and then crashing.
I see this in the logs:
{"level":"info","timestamp":"2024-06-12T13:23:22.493Z","caller":"signozcol/collector.go:121","msg":"Collector service is running"}
{"level":"info","timestamp":"2024-06-12T13:23:22.493Z","logger":"agent-config-manager","caller":"opamp/config_manager.go:168","msg":"Config has not changed"}
{"level":"info","timestamp":"2024-06-12T13:23:23.279Z","caller":"service/service.go:73","msg":"Client started successfully"}
{"level":"info","timestamp":"2024-06-12T13:23:23.279Z","caller":"opamp/client.go:49","msg":"Ensuring collector is running","component":"opamp-server-client"}
2024-06-12T13:24:22.389Z warn clickhousemetricsexporter/exporter.go:272 Dropped cumulative histogram metric {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite", "name": "signoz_latency"}
2024-06-12T13:24:22.484Z warn clickhousemetricsexporter/exporter.go:279 Dropped exponential histogram metric with no data points {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite", "name": "signoz_latency"}
2024-06-12T13:25:18.135Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "5.882953348s"}
2024-06-12T13:25:24.996Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "7.161709269s"}
2024-06-12T13:25:26.504Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "6.523426302s"}
2024-06-12T13:25:26.536Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "4.419607822s"}
2024-06-12T13:25:26.753Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "6.233919422s"}
2024-06-12T13:25:26.763Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "2.67037973s"}
2024-06-12T13:25:26.769Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "5.126252319s"}
2024-06-12T13:25:26.958Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "4.857335267s"}
2024-06-12T13:25:28.494Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "StatementSend:context deadline exceeded", "interval": "4.344819049s"}
any help would be much appreciated.

Tyler Wells
06/12/2024, 1:28 PM
docker.io/signoz/signoz-otel-collector:0.88.21
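(Context on the failure mode in the logs above: "StatementSend: context deadline exceeded" means the ClickHouse inserts are timing out, and the exporterhelper then holds the failed batches in memory while it retries, which can grow unbounded. A hedged sketch of bounding that buffering on the logs exporter — these are standard exporterhelper settings, but the values shown are illustrative assumptions, not tuned recommendations:)

```yaml
# Sketch only: cap how much the logs exporter buffers while ClickHouse
# inserts are failing, so retries cannot accumulate without limit.
exporters:
  clickhouselogsexporter:
    timeout: 10s
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s   # stop retrying a batch after 5 minutes (illustrative)
    sending_queue:
      enabled: true
      queue_size: 100          # max batches held in memory awaiting export (illustrative)
```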
Srikanth Chekuri
06/12/2024, 1:50 PM

Tyler Wells
06/12/2024, 2:17 PM
otelCollector:
  service:
    type: NodePort
  nodeSelector:
    kubernetes.io/arch: amd64
  resources:
    requests:
      cpu: 100m
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 4Gi
  ports:
    jaeger-thrift:
      enabled: false
    jaeger-grpc:
      enabled: false
    logsheroku:
      enabled: false
Tyler Wells
06/12/2024, 2:18 PM
values.yaml

Tyler Wells
06/12/2024, 2:18 PM
v0.44.0

Tyler Wells
06/12/2024, 2:21 PM

Tyler Wells
06/12/2024, 2:23 PM
otel-collector-config.yaml
I can provide it.

Srikanth Chekuri
06/12/2024, 2:23 PM

Tyler Wells
06/12/2024, 2:24 PM
exporters:
  clickhouselogsexporter:
    dsn: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_LOG_DATABASE}
    timeout: 10s
  clickhousemetricswrite:
    endpoint: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_DATABASE}
    resource_to_telemetry_conversion:
      enabled: true
    timeout: 15s
  clickhousetraces:
    datasource: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_TRACE_DATABASE}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  prometheus:
    endpoint: 0.0.0.0:8889
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: localhost:1777
  zpages:
    endpoint: localhost:55679
processors:
  batch:
    send_batch_size: 50000
    timeout: 1s
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.deployment.name
        - k8s.node.name
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection
  memory_limiter: null
  resourcedetection:
    detectors:
      - env
      - system
    system:
      hostname_sources:
        - dns
        - os
    timeout: 2s
  signozspanmetrics/cumulative:
    dimensions:
      - default: default
        name: service.namespace
      - default: default
        name: deployment.environment
      - name: signoz.collector.id
    dimensions_cache_size: 100000
    latency_histogram_buckets:
      - 100us
      - 1ms
      - 2ms
      - 6ms
      - 10ms
      - 50ms
      - 100ms
      - 250ms
      - 500ms
      - 1000ms
      - 1400ms
      - 2000ms
      - 5s
      - 10s
      - 20s
      - 40s
      - 60s
    metrics_exporter: clickhousemetricswrite
  signozspanmetrics/delta:
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    dimensions:
      - default: default
        name: service.namespace
      - default: default
        name: deployment.environment
      - name: signoz.collector.id
    dimensions_cache_size: 100000
    latency_histogram_buckets:
      - 100us
      - 1ms
      - 2ms
      - 6ms
      - 10ms
      - 50ms
      - 100ms
      - 250ms
      - 500ms
      - 1000ms
      - 1400ms
      - 2000ms
      - 5s
      - 10s
      - 20s
      - 40s
      - 60s
    metrics_exporter: clickhousemetricswrite
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      load: {}
      memory: {}
      network: {}
  httplogreceiver/heroku:
    endpoint: 0.0.0.0:8081
    source: heroku
  httplogreceiver/json:
    endpoint: 0.0.0.0:8082
    source: json
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 16
      http:
        endpoint: 0.0.0.0:4318
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: localhost:12345
service:
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    logs:
      exporters:
        - clickhouselogsexporter
      processors:
        - batch
      receivers:
        - otlp
        - httplogreceiver/heroku
        - httplogreceiver/json
    metrics:
      exporters:
        - clickhousemetricswrite
      processors:
        - batch
      receivers:
        - otlp
    metrics/internal:
      exporters:
        - clickhousemetricswrite
      processors:
        - resourcedetection
        - k8sattributes
        - batch
      receivers:
        - hostmetrics
    traces:
      exporters:
        - clickhousetraces
      processors:
        - signozspanmetrics/cumulative
        - signozspanmetrics/delta
        - batch
      receivers:
        - otlp
        - jaeger
  telemetry:
    metrics:
      address: 0.0.0.0:8888
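(Two things stand out in the config above for an OOMKilled collector: `memory_limiter: null` disables the memory limiter entirely, and `send_batch_size: 50000` keeps very large batches in flight. A hedged sketch of re-enabling the limiter under the pod's 4Gi limit and shrinking the batches — the specific numbers are illustrative assumptions, not tuned values:)

```yaml
# Sketch only: re-enable the memory_limiter and size it below the
# container's 4Gi limit; values here are illustrative, not tuned.
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 3000        # soft cap, kept below the 4Gi container limit
    spike_limit_mib: 600   # headroom for short allocation spikes
  batch:
    send_batch_size: 10000 # smaller batches keep less data in flight
    timeout: 1s
service:
  pipelines:
    traces:
      processors:
        - memory_limiter   # the limiter should be the first processor in each pipeline
        - signozspanmetrics/cumulative
        - signozspanmetrics/delta
        - batch
```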
Srikanth Chekuri
06/12/2024, 2:28 PM
otelCollector:
  config:
    exporters:
      clickhousetraces:
        timeout: 15s

> There's only ~175k spans, and 17k metrics
Where are you getting these numbers from?
Tyler Wells
06/12/2024, 2:28 PM

Tyler Wells
06/12/2024, 2:30 PM
select count(*) from signoz_traces.distributed_signoz_spans;
I can also see the spans being reported in the UI.

Tyler Wells
06/12/2024, 2:30 PM

Tyler Wells
06/12/2024, 2:39 PM

Tyler Wells
06/12/2024, 2:48 PM
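(For the "17k metrics" figure, a similar ClickHouse count can be run against the metrics database. Hedged sketch: the table name below, `distributed_samples_v2` in the `signoz_metrics` database, is an assumption — the samples table name varies across SigNoz versions, so check `show tables from signoz_metrics` first:)

```sql
-- Assumed table name; verify against your SigNoz version's schema.
SELECT count(*) FROM signoz_metrics.distributed_samples_v2;
```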