# support
m
Hello, what are the possible causes of having so many parts in the time_series_v4_1week table?
s
Look at the ClickHouse logs. Do you see any errors?
m
{"date_time":"1750283204.747538","thread_name":"TCPServerConnection ([#33])","thread_id":"868","level":"Error","query_id":"1866fcd2-5e75-4803-b5a5-6a648a62564c","logger_name":"TCPHandler","message":"Code: 252. DB:Exception Too many parts (3501 with average size of 140.25 KiB) in table 'signoz_metrics.time_series_v4_1week (2f866987-d1a9-4cf0-913b-da5e18e20eb3)'. Merges are processing significantly slower than inserts:
s
The time_series_v4_1week table can't have more parts than the v4 table. Is there any additional context on what happened here?
m
I don't think so. Metric ingestion is affected, but I don't see any damage to other tables. I have k8s-infra for the SigNoz cluster and another one, some other application instrumentation, and secondary storage with S3, but nothing out of the ordinary. I don't understand why this table has such an increase in parts; I'd appreciate any help.
s
Did something change around the 11th or 12th that caused the partition to have so many parts?
m
No new instrumentation this month; I've been seeing the error message for a while now, but now the gap in the metrics has become a problem. Is there anything I can do to fix or avoid so many parts in this table?
s
What is your collector config?
Please share your collector config
m
config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 16
          http:
            endpoint: 0.0.0.0:4318
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
            # Uncomment to enable the thrift_compact receiver.
            # You will also have to enable it in `otelCollector.ports`.
            # thrift_compact:
            #   endpoint: 0.0.0.0:6831
      httplogreceiver/heroku:
        # endpoint specifies the network interface and port which will receive data
        endpoint: 0.0.0.0:8081
        source: heroku
      httplogreceiver/json:
        # endpoint specifies the network interface and port which will receive data
        endpoint: 0.0.0.0:8082
        source: json
    processors:
      # Memory Limiter processor.
      # If not set, will be overridden with values based on k8s resource limits.
      # ref: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
      memory_limiter:
        check_interval: 1s
        limit_mib: 9500
        spike_limit_mib: 2000
      # Batch processor config.
      # ref: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md
      batch:
        send_batch_size: 200000
        timeout: 30s
        send_batch_max_size: 250000
      signozspanmetrics/delta:
        metrics_exporter: clickhousemetricswrite, signozclickhousemetrics
        latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
        dimensions_cache_size: 100000
        dimensions:
          - name: service.namespace
            default: default
          - name: deployment.environment
            default: default
          - name: signoz.collector.id
        aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: localhost:55679
      pprof:
        endpoint: localhost:1777
    exporters:
      clickhousetraces:
        datasource: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_TRACE_DATABASE}
        low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
        use_new_schema: true
        timeout: 30s
      clickhousemetricswrite:
        endpoint: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}
        timeout: 30s
        resource_to_telemetry_conversion:
          enabled: true
        disable_v2: true
      signozclickhousemetrics:
        dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}
        timeout: 50s
      clickhouselogsexporter:
        dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_LOG_DATABASE}
        timeout: 10s
        use_new_schema: true
      metadataexporter:
        dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/signoz_metadata
        timeout: 10s
        tenant_id: ${env:TENANT_ID}
        cache:
          provider: in_memory
    service:
      telemetry:
        logs:
          encoding: json
        metrics:
          address: 0.0.0.0:8888
      extensions: [health_check, zpages, pprof]
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: [signozspanmetrics/delta, batch, memory_limiter]
          exporters: [clickhousetraces, metadataexporter]
        metrics:
          receivers: [otlp]
          processors: [batch, memory_limiter]
          exporters: [clickhousemetricswrite, metadataexporter, signozclickhousemetrics]
        logs:
          receivers: [otlp, httplogreceiver/heroku, httplogreceiver/json]
          processors: [batch, memory_limiter]
          exporters: [clickhouselogsexporter, metadataexporter]
@Srikanth Chekuri, any suggestions?
s
Hi @Matheus Henrique, sorry, I missed the last message. The config looks fine to me. Is this the config you have been running all along, or did it get any updates before the 12th?
m
Hello, this is the current configuration.
Hello, I identified k8s-infra as the main cause of so many writes in ClickHouse. For now I will not use it. I believe the default configuration of the OpenTelemetry agent in k8s-infra should be tuned as needed, and the same goes for other external agents collecting host metrics and the like.
s
Hi @Matheus Henrique, the k8s-infra agent defaults are fine; many users run them without any issue. The parts are created based on the config of the main SigNoz installation: each insert creates one ClickHouse part, and these parts get merged in the background into bigger parts. If the number of parts is too high, it means the rate of writes is outpacing the background merge pace. The main SigNoz installation config looks fine, but it's not clear what happened on 12th Jun that created so many parts. I don't think there is anything to change in k8s-infra.
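To check whether inserts really are outpacing merges, the cumulative counters in system.events and the currently running merges in system.merges can be compared. A rough Python sketch under the same connection assumptions as the earlier snippet; counter names can vary slightly between ClickHouse versions:

# Compare cumulative insert/merge counters and look at in-flight merges.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123,
                                        username="default", password="")

# Cumulative counters since server start: rows written by INSERTs vs rows
# read by background merges.
events = client.query("""
    SELECT event, value
    FROM system.events
    WHERE event IN ('InsertedRows', 'MergedRows', 'Merge')
""")
for event, value in events.result_rows:
    print(f"{event}: {value}")

# Merges currently in flight for the metrics database.
merges = client.query("""
    SELECT table, elapsed, progress, num_parts
    FROM system.merges
    WHERE database = 'signoz_metrics'
""")
for table, elapsed, progress, num_parts in merges.result_rows:
    print(f"{table}: merging {num_parts} parts, {progress:.0%} done, {elapsed:.1f}s elapsed")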
m
Thank you @Srikanth Chekuri. Currently the parts look like this, without using k8s-infra. What I have observed is that when I add it back, the growth in parts accelerates. Any suggestions on where else I can investigate?
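One place to investigate is system.part_log, if it is enabled on the ClickHouse server: it records a row for every part created, which makes it easy to see which tables gain parts fastest and when. A hedged Python sketch, with the same connection assumptions as the earlier snippets:

# New parts created per hour and per table over the last day,
# requires the part_log system table to be enabled.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123,
                                        username="default", password="")

result = client.query("""
    SELECT
        toStartOfHour(event_time) AS hour,
        table,
        count() AS new_parts
    FROM system.part_log
    WHERE database = 'signoz_metrics'
      AND event_type = 'NewPart'
      AND event_time > now() - INTERVAL 1 DAY
    GROUP BY hour, table
    ORDER BY hour, new_parts DESC
""")

for hour, table, new_parts in result.result_rows:
    print(f"{hour} {table}: {new_parts} new parts")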
s
Can you share what resources are given to ClickHouse?
m
From values.yaml
resources:
    requests:
      cpu: 3000m
      memory: 4Gi
    limits:
      cpu: 8000m
      memory: 20Gi
s
This looks decent.
1. Can you share the scale of your metrics collection, i.e. how many samples and time series are being collected? (You can use the metrics explorer.)
2. What are the average and maximum resource usage on the node where ClickHouse runs?
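For the first question, a rough sense of scale can also be pulled straight from ClickHouse as an alternative to the metrics explorer. A small Python sketch with the same connection assumptions as the earlier snippets; the metric_name and fingerprint column names are assumptions based on the usual SigNoz schema, so drop them if they differ in your version:

# Rough scale check on the table from the original error message.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123,
                                        username="default", password="")

result = client.query("""
    SELECT
        count() AS rows,
        uniqExact(metric_name) AS metrics,          -- assumed column name
        uniqExact(fingerprint) AS unique_series     -- assumed column name
    FROM signoz_metrics.time_series_v4_1week
""")
rows, metrics, unique_series = result.result_rows[0]
print(f"rows={rows} metrics={metrics} unique_series={unique_series}")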