# support
m
Hello, I'm encountering some problems with the collector. This log shows up all the time:
{"level":"info","ts":1747055709.4789784,"caller":"internal/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","error":"code: 252, message: Too many parts (3000 with average size of 37.03 KiB) in table 'signoz_metrics.time_series_v4_1week (2f866987-d1a9-4cf0-913b-da5e18e20eb3)'. signoz_metrics.time_series_v4_1week_mv_separate_attrs (6e0f9ac9-4dd8-46c2-beee-723d73c61f9f): while pushing to view signoz_metrics.time_series_v4_1day_mv_separate_attrs (104ae564-bc3c-4a90-ae25-6810e53ebd27): while pushing to view signoz_metrics.time_series_v4_6hrs_mv (d833f2eb-7b78-4010-a623-13ea2ec935a4)","interval":"7.497468219s"}
I have already increased the batch size in the batch processor and also increased the collector and ClickHouse resources. Any help is welcome. Thanks.
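For context, the "batch size" mentioned above is configured on the collector's batch processor, which controls how large and how frequent the inserts sent toward ClickHouse are. A minimal illustrative snippet of those knobs (placeholder values, not a recommendation for this setup):

```yaml
processors:
  batch:
    # Flush once this many items (spans, metric points, log records) have accumulated.
    send_batch_size: 50000
    # Hard upper bound for a single batch; larger batches are split.
    send_batch_max_size: 60000
    # Flush whatever has accumulated after this long, even if the batch is not full.
    timeout: 10s
```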
n
Hey @Matheus Henrique
• Could you please share the otel-collector config you're using?
• How are you running SigNoz?
• What does your volume of signals look like for ingestion?
m
Hello,
• I'm running SigNoz on a K8s cluster on ARM. There are 3 worker nodes with 10 vCPU and 32 GB of RAM.
• Regarding the ingestion volume, I was able to obtain this from the metrics exposed by the collector:
# HELP otelcol_exporter_sent_log_records Number of log record successfully sent to destination.
# TYPE otelcol_exporter_sent_log_records counter
otelcol_exporter_sent_log_records{exporter="clickhouselogsexporter",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 6.533264e+06
otelcol_exporter_sent_log_records{exporter="metadataexporter",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 6.533264e+06
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter
otelcol_exporter_sent_metric_points{exporter="clickhousemetricswrite",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 806159
otelcol_exporter_sent_metric_points{exporter="metadataexporter",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 876189
otelcol_exporter_sent_metric_points{exporter="signozclickhousemetrics",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 1.027244e+06
# HELP otelcol_exporter_sent_spans Number of spans successfully sent to destination.
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="clickhousetraces",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 1.8093586e+07
otelcol_exporter_sent_spans{exporter="metadataexporter",service_instance_id="6f1894a6-c010-482d-b842-2edde927c7e1",service_name="/signoz-otel-collector",service_version="dev"} 1.8093586e+07
• I added a second replica and the memory_limiter processor, but it still didn't work. Each collector can use up to 3500m of CPU and up to 10Gi of memory.
config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 16
          http:
            endpoint: 0.0.0.0:4318
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
            # Uncomment to enable the thrift_compact receiver.
            # You will also have to enable it in `otelCollector.ports`.
            # thrift_compact:
            #   endpoint: 0.0.0.0:6831
      httplogreceiver/heroku:
        # endpoint specifies the network interface and port which will receive data
        endpoint: 0.0.0.0:8081
        source: heroku
      httplogreceiver/json:
        # endpoint specifies the network interface and port which will receive data
        endpoint: 0.0.0.0:8082
        source: json
    processors:
      # Memory Limiter processor.
      # If not set, will be overridden with values based on k8s resource limits.
      # ref: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
      memory_limiter:
        check_interval: 1s
        limit_mib: 9500
        spike_limit_mib: 2000
      # Batch processor config.
      # ref: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md
      batch:
        send_batch_size: 250000
        timeout: 30s
        send_batch_max_size: 300000
      signozspanmetrics/delta:
        metrics_exporter: clickhousemetricswrite, signozclickhousemetrics
        latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
        dimensions_cache_size: 100000
        dimensions:
          - name: service.namespace
            default: default
          - name: deployment.environment
            default: default
          - name: signoz.collector.id
        aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: localhost:55679
      pprof:
        endpoint: localhost:1777
    exporters:
      clickhousetraces:
        datasource: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_TRACE_DATABASE}
        low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
        use_new_schema: true
        timeout: 30s
      clickhousemetricswrite:
        endpoint: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}
        timeout: 30s
        resource_to_telemetry_conversion:
          enabled: true
        disable_v2: true
      signozclickhousemetrics:
        dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}
        timeout: 50s
      clickhouselogsexporter:
        dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_LOG_DATABASE}
        timeout: 10s
        use_new_schema: true
      metadataexporter:
        dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/signoz_metadata
        timeout: 10s
        tenant_id: ${env:TENANT_ID}
        cache:
          provider: in_memory
    service:
      telemetry:
        logs:
          encoding: json
        metrics:
          address: 0.0.0.0:8888
      extensions: [health_check, zpages, pprof]
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: [signozspanmetrics/delta, batch, memory_limiter]
          exporters: [clickhousetraces, metadataexporter]
        metrics:
          receivers: [otlp]
          processors: [batch, memory_limiter]
          exporters: [clickhousemetricswrite, metadataexporter, signozclickhousemetrics]
        logs:
          receivers: [otlp, httplogreceiver/heroku, httplogreceiver/json]
          processors: [batch, memory_limiter]
          exporters: [clickhouselogsexporter, metadataexporter]
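One thing worth noting about the pipelines above: memory_limiter is placed after batch, while the memory_limiter README referenced in the config comments recommends making it the first processor in each pipeline so it can apply back-pressure before data is buffered. As a sketch only (same components as above, only the order changed; not by itself a fix for the "too many parts" error), the metrics pipeline would look like:

```yaml
service:
  pipelines:
    metrics:
      receivers: [otlp]
      # memory_limiter first, then batching
      processors: [memory_limiter, batch]
      exporters: [clickhousemetricswrite, metadataexporter, signozclickhousemetrics]
```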
n
Can you share the Helm values? I want to see what your ClickHouse config looks like.
m
I just removed some information: the S3 details and the ingress hosts.