Parth Ingole
09/18/2025, 5:51 PM
- time_series_v4_6hrs_mv_separate_attrs
- time_series_v4_1day_mv_separate_attrs
- time_series_v4_1week_mv_separate_attrs
These are defined in metrics_migrations.go but not used by any collector components.
# Questions
1. Are these materialized views still needed by SigNoz frontend/query service?
2. Were they removed from the latest migration binary for performance reasons?
3. Can we safely drop them to improve merge performance?
4. Should they be removed from schema migrations too?
# Impact
Each insert cascades through multiple materialized views, multiplying merge operations and causing queue buildup.
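To see which materialized views would receive this cascade, a query along these lines can list them (a sketch; assumes the default signoz_metrics database name):

SELECT name, create_table_query
FROM system.tables
WHERE database = 'signoz_metrics'
  AND engine = 'MaterializedView'

The create_table_query column includes each view's SELECT ... FROM clause, so the views reading from samples_v4 or time_series_v4 are the ones multiplying merge work.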
--------------------------------------------------------------------------------------
Title: Queries arriving at the same time are not being batched into the same async insert buffer.
Here’s a sample from `system.asynchronous_insert_log`:
• Two inserts into the same table at the same second (14:43:52)
• Both queries are identical (INSERT INTO signoz_metrics.distributed_samples_v4 … FORMAT Native)
Row 1:
──────
n: 1
q: 1
s: 5
flush: 14:43:52
prev: -
rows: 85096
data: 2.62 MiB
sample_query: INSERT INTO signoz_metrics.distributed_samples_v4 (env, temporality, metric_name, fingerprint, unix_milli, value, flags) FORMAT Native
sample_settings: {'min_insert_block_size_rows':'1000000','min_insert_block_size_bytes':'20971520','min_insert_block_size_rows_for_materialized_views':'1000000','min_insert_block_size_bytes_for_materialized_views':'20971520','load_balancing':'random','max_execution_time':'759','timeout_before_checking_execution_speed':'0','max_memory_usage':'10000000000','async_insert':'1','wait_for_async_insert':'1','wait_for_async_insert_timeout':'720','async_insert_max_data_size':'204857600','async_insert_busy_timeout_min_ms':'120000','async_insert_busy_timeout_max_ms':'600000'}
query_ids: ['3feab59c-9ce2-4092-96fb-d8083ede1dd0']
formats: ['Native']
users: ['default']
client_names: ['clickhouse-go/2.36.0 (lv:go/1.23.12; os:linux)']
http_user_agents: ['']
Row 2:
──────
n: 1
q: 1
s: 6
flush: 14:43:52
prev: -
rows: 81417
data: 2.51 MiB
sample_query: INSERT INTO signoz_metrics.distributed_samples_v4 (env, temporality, metric_name, fingerprint, unix_milli, value, flags) FORMAT Native
sample_settings: {'min_insert_block_size_rows':'1000000','min_insert_block_size_bytes':'20971520','min_insert_block_size_rows_for_materialized_views':'1000000','min_insert_block_size_bytes_for_materialized_views':'20971520','load_balancing':'random','max_execution_time':'572','timeout_before_checking_execution_speed':'0','max_memory_usage':'10000000000','async_insert':'1','wait_for_async_insert':'1','wait_for_async_insert_timeout':'720','async_insert_max_data_size':'204857600','async_insert_busy_timeout_min_ms':'120000','async_insert_busy_timeout_max_ms':'600000'}
query_ids: ['ca235e27-f4e6-42b0-be8d-7558926362eb']
formats: ['Native']
users: ['default']
client_names: ['clickhouse-go/2.36.0 (lv:go/1.23.12; os:linux)']
http_user_agents: ['']
• However, they ended up in different buffers because their Settings differ:
Row 1: settings include "max_execution_time": "759"
Row 2: settings include "max_execution_time": "572"
From the ClickHouse docs, I understand that async inserts are grouped by query shape + settings. Since the settings differ, ClickHouse treats them as separate buffers.
The confusing part is: we are not explicitly setting max_execution_time anywhere.
We’re sending data from the SigNoz collector to a ClickHouse Distributed table.
Question: Where could these different max_execution_time values be coming from, and how can we ensure inserts land in the same buffer so batching works as expected?
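For reference, the per-query settings recorded by ClickHouse can be compared directly from system.query_log; a sketch using the two query_ids from the log above:

SELECT query_id, Settings['max_execution_time'] AS max_execution_time
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_id IN ('3feab59c-9ce2-4092-96fb-d8083ede1dd0',
                   'ca235e27-f4e6-42b0-be8d-7558926362eb')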
Srikanth Chekuri
09/19/2025, 11:11 AM
Mohit Goyal
09/19/2025, 11:34 AM
In system.asynchronous_insert_log, the only difference is that max_execution_time varies between inserts (e.g., 572 vs 759).
Since ClickHouse batches async inserts by query text and settings, this prevents batching from working as expected.
• Where is the max_execution_time setting coming from in the SigNoz collector → ClickHouse pipeline? We are not setting it anywhere explicitly.
• Can we configure SigNoz to use a fixed max_execution_time (or disable it) so that all inserts land in the same buffer and batch properly?
We have already confirmed with ClickHouse that they don't set any such setting by default. Can you please help?
Srikanth Chekuri
09/19/2025, 11:36 AM
Srikanth Chekuri
09/19/2025, 11:46 AM
> Where is the max_execution_time setting coming from in the SigNoz collector → ClickHouse pipeline? We are not setting it anywhere explicitly.
It is set by the clickhouse-go client based on the deadline from the collector exporter pipeline: https://github.com/ClickHouse/clickhouse-go/blob/5f4a3ccd69e2597d4daecc20e4509423de5ef4e2/context.go#L221
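To see how widely these deadline-derived values spread, recent inserts can be grouped by that setting; a sketch, assuming the insert pattern from the log above:

SELECT Settings['max_execution_time'] AS met, count() AS inserts
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 HOUR
  AND query LIKE 'INSERT INTO signoz_metrics.distributed_samples_v4%'
GROUP BY met
ORDER BY inserts DESC

If nearly every row lands in its own group, each insert carries a unique max_execution_time and so gets its own async-insert buffer.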
Mohit Goyal
09/19/2025, 11:58 AM
receivers:
  # Keep only the receivers used by the metrics pipeline
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 16
      http:
        endpoint: 0.0.0.0:4318
  signozkafkareceiver/metrics:
    topic: otel-metrics-4
    brokers: ["otel-metrics-collector-kafka-4.central-prod.local:9092"]
    client_id: otel-collector-v1
    group_id: otel-collector-v1
    metadata:
      retry:
        max: 10
        backoff: 5s
    sarama_consumer_config:
      fetch_min_bytes: 16777216
      fetch_default_bytes: 67108864
      fetch_max_bytes: 134217728
      max_processing_time: 240s
      messages_channel_size: 65536
      consumer_group_session_timeout: 240s
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
processors:
  batch/metrics:
    send_batch_size: 400000
    timeout: 180s
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: localhost:55679
  pprof:
    endpoint: localhost:1777
exporters:
  # Primary ClickHouse cluster (existing)
  signozclickhousemetrics:
    dsn: "tcp://x.x.x.x:9000,x.x.x.x:9000,x.x.x.x:9000,x.x.x.x.168:9000,x.x.x.x:9000,x.x.x.x:9000/signoz_metrics?password=qwswe&max_execution_time=3600"
    timeout: 900s
    sending_queue:
      enabled: true
      num_consumers: 3
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 0s
  # Secondary ClickHouse cluster (NEW)
  signozclickhousemetrics/secondary:
    dsn: "tcp://x.x.x.x:9000,x.x.x.x:9000,x.x.x.x:9000/signoz_metrics?password=qswsq&max_execution_time=3600"
    timeout: 900s
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 0s
  # S3 mirror of the exact batches sent to ClickHouse (no compression)
  awss3:
    s3uploader:
      region: us-east-1
      s3_bucket: signoz-otel
      s3_prefix: telemetry/metrics
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
    marshaler: otlp_json
    timeout: 900s
    sending_queue:
      enabled: true
      num_consumers: 1
      queue_size: 10
service:
  telemetry:
    logs:
      level: debug
      encoding: console
  extensions: [health_check, zpages, pprof]
  pipelines:
    metrics:
      receivers: [otlp, signozkafkareceiver/metrics]
      processors: [batch/metrics]
      exporters: [signozclickhousemetrics, signozclickhousemetrics/secondary, awss3]
Mohit Goyal
09/19/2025, 11:59 AM
Parth Ingole
09/19/2025, 12:00 PM
Srikanth Chekuri
09/19/2025, 12:03 PM
Srikanth Chekuri
09/19/2025, 12:06 PM
> timeout: 900s
As you can note from the shared snippet from the ClickHouse go client repo, max_execution_time is set by the Go driver based on the remaining deadline. It changes for each batch insert, since the remaining time varies with the work already done for it. How many collectors do you run? Is there a specific reason why you enabled async inserts on ClickHouse?
Mohit Goyal
09/19/2025, 12:12 PM
Currently we are using only one instance of the signoz collector. ClickHouse says we should use async inserts to buffer data in memory so that we don't end up creating too many parts.
Mohit Goyal
09/19/2025, 12:13 PM
Is there any way we can send max_execution_time the same for all queries sent?
Mohit Goyal
09/19/2025, 12:13 PM
Srikanth Chekuri
09/19/2025, 12:18 PM
> Currently we are using only one instance of signoz collector. So ClickHouse says we should use async inserts to buffer data in memory so that we don't end up creating too many parts
You are not creating too many parts with just one collector. With send_batch_size: 400000 and a timeout of 180s, you are nowhere near creating many parts; you are far away from having to worry about too many parts. Are you seeing the too-many-parts error with this config?
You would already know ClickHouse says you can do the external batching and write in batches of 100k. That's what the collector does. Async inserts are a premature step IMO.
> is there any way we can send max_execution_time the same for all queries sent?
No, it's dynamically modified by the clickhouse-go lib based on the deadline. The deadline in the context comes from the pipeline.
Srikanth Chekuri
09/19/2025, 12:20 PM
Mohit Goyal
09/19/2025, 12:30 PM
Yes, with one collector we are creating large parts, but we can't go to production with this setup: if this single collector instance fails, our pipeline fails, and if we deploy multiple instances, the data size sent by each collector instance decreases, leading to the too-many-parts issue. So we need to send max_execution_time the same for queries fired at the same time so they can be buffered; otherwise we can't use ClickHouse async inserts with the signoz collector. We are sending CloudWatch metrics in this pipeline, so our ingestion rate is 3-5 MB/sec.
Srikanth Chekuri
09/19/2025, 12:34 PM
Mohit Goyal
09/19/2025, 1:01 PM
With this varying max_execution_time, the async insert capability of ClickHouse can never be used or implemented in practice, since for queries to be buffered at ClickHouse they should have the same shape. ClickHouse async inserts are grouped by query shape + settings.
Is there any way we could resolve this, or disable that setting from going along with the insert query, as in the sample_settings below?
Row 2:
──────
n: 1
q: 1
s: 6
flush: 14:43:52
prev: -
rows: 81417
data: 2.51 MiB
sample_query: INSERT INTO signoz_metrics.distributed_samples_v4 (env, temporality, metric_name, fingerprint, unix_milli, value, flags) FORMAT Native
sample_settings: {'min_insert_block_size_rows':'1000000','min_insert_block_size_bytes':'20971520','min_insert_block_size_rows_for_materialized_views':'1000000','min_insert_block_size_bytes_for_materialized_views':'20971520','load_balancing':'random',`max_execution_time`:'572','timeout_before_checking_execution_speed':'0','max_memory_usage':'10000000000','async_insert':'1','wait_for_async_insert':'1','wait_for_async_insert_timeout':'720','async_insert_max_data_size':'204857600','async_insert_busy_timeout_min_ms':'120000','async_insert_busy_timeout_max_ms':'600000'}
query_ids: ['ca235e27-f4e6-42b0-be8d-7558926362eb']
formats: ['Native']
users: ['default']
client_names: ['clic
Srikanth Chekuri
09/19/2025, 1:04 PM
> is there any way we could resolve that or disable that setting to go along in insert query
The context the exporter receives is the parent context from the otel collector pipeline, and the setting is set up by clickhouse-go here: https://github.com/ClickHouse/clickhouse-go/blob/5f4a3ccd69e2597d4daecc20e4509423de5ef4e2/context.go#L221
Parth Ingole
09/19/2025, 1:41 PM
We see a large merge backlog. We have tried tuning num_consumers and `queue_size`, but the queue still fills up quickly. To mitigate, we enabled asynchronous inserts to buffer and batch larger writes, aiming to create fewer parts and reduce merge pressure.
Kafka ingest is ~20 MB/s.
Cluster: 3 shards × 3 replicas. Running on a 64-vCPU machine.
cc @Mohit Goyal
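One way to quantify part-creation pressure before and after a change like this is system.part_log; a sketch, assuming the signoz_metrics database:

SELECT table, toStartOfMinute(event_time) AS minute, count() AS new_parts, sum(rows) AS rows
FROM system.part_log
WHERE event_type = 'NewPart'
  AND database = 'signoz_metrics'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY table, minute
ORDER BY minute DESC, new_parts DESC
LIMIT 20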
Srikanth Chekuri
09/19/2025, 1:42 PM
Srikanth Chekuri
09/19/2025, 1:47 PM
> We see a large merge backlog
The disk throughput is an important factor here.
Mohit Goyal
09/19/2025, 1:53 PM
Srikanth Chekuri
09/19/2025, 1:54 PM
Srikanth Chekuri
09/19/2025, 1:56 PM
Parth Ingole
09/19/2025, 1:58 PM
Srikanth Chekuri
09/19/2025, 1:59 PM
> Merges are processing significantly slower than inserts
Right, here we can see the merge rate is less than the ingest rate. There are only two reasons for this problem: 1. too many inserts (small or big), or 2. irrespective of the ingest, the merge is really slow. In this case, I believe the merge is slow.
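The two cases can be separated from system.part_log by comparing insert and merge throughput over the same window; a sketch:

SELECT event_type, count() AS events, sum(rows) AS total_rows,
       formatReadableSize(sum(size_in_bytes)) AS total_size
FROM system.part_log
WHERE event_time >= now() - INTERVAL 1 HOUR
  AND event_type IN ('NewPart', 'MergeParts')
GROUP BY event_type

If rows written by MergeParts consistently lag rows written by NewPart, merges are falling behind regardless of how the inserts are sized.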
Mohit Goyal
09/19/2025, 1:59 PM
Srikanth Chekuri
09/19/2025, 2:02 PM
Mohit Goyal
09/19/2025, 2:05 PM
Srikanth Chekuri
09/19/2025, 2:05 PM
> code: 252, message: Too many parts (409 with average size of 2.43 MiB
On the samples table this usually means parts of 80k-100k rows. The nature of the rows in the samples_v4 table is that parts are going to be around 2-5 MB.
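Since the too-many-parts threshold is evaluated per partition, a per-partition count from system.parts shows how close each table is; a sketch:

SELECT table, partition_id, count() AS active_parts,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND database = 'signoz_metrics'
GROUP BY table, partition_id
ORDER BY active_parts DESC
LIMIT 10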
Srikanth Chekuri
09/19/2025, 2:06 PM
Mohit Goyal
09/19/2025, 2:07 PM
Srikanth Chekuri
09/19/2025, 2:08 PM
Srikanth Chekuri
09/19/2025, 2:09 PM
SELECT
table,
round((elapsed * (1 / progress)) - elapsed, 2) AS estimate,
elapsed,
progress,
is_mutation,
formatReadableSize(total_size_bytes_compressed) AS size,
formatReadableSize(memory_usage) AS mem
FROM system.merges
ORDER BY elapsed DESC
Mohit Goyal
09/19/2025, 2:16 PM
Query id: 460b2d5c-5cce-491a-b3e0-93dd69dd2169
┌─table───────────────────┬─estimate─┬──────elapsed─┬────────────progress─┬─is_mutation─┬─size───────┬─mem────────┐
1. │ query_metric_log │ 9.8 │ 10.315822924 │ 0.5128530869248256 │ 0 │ 21.38 MiB │ 53.80 MiB │
2. │ asynchronous_metric_log │ 4.18 │ 4.953782157 │ 0.5426196177179311 │ 0 │ 9.91 MiB │ 7.75 MiB │
3. │ query_log │ 0 │ 0.231752519 │ 1 │ 0 │ 480.11 KiB │ 3.66 MiB │
4. │ query_log │ 0 │ 0.194859372 │ 1 │ 0 │ 57.53 KiB │ 3.34 MiB │
5. │ time_series_v4_6hrs │ 0.6 │ 0.182080012 │ 0.23417296389588582 │ 0 │ 29.06 MiB │ 165.32 MiB │
6. │ time_series_v4_6hrs │ 0.43 │ 0.182014275 │ 0.29895986433013 │ 0 │ 12.30 MiB │ 92.94 MiB │
7. │ samples_v4_agg_30m │ 0.51 │ 0.181996317 │ 0.26222833263177436 │ 0 │ 8.66 MiB │ 14.39 MiB │
8. │ samples_v4_agg_5m │ 2.03 │ 0.181970355 │ 0.08225509021925873 │ 0 │ 35.98 MiB │ 31.82 MiB │
9. │ trace_log │ 0 │ 0.174009491 │ 1 │ 0 │ 41.38 KiB │ 3.61 MiB │
10. │ time_series_v4_6hrs │ 0.39 │ 0.173860384 │ 0.30726608020107393 │ 0 │ 12.22 MiB │ 97.01 MiB │
11. │ samples_v4_agg_5m │ 3.38 │ 0.173742821 │ 0.04885672538024141 │ 0 │ 34.46 MiB │ 18.57 MiB │
12. │ time_series_v4_6hrs │ 0.33 │ 0.169633093 │ 0.33871273071462377 │ 0 │ 11.47 MiB │ 101.32 MiB │
13. │ processors_profile_log │ 0 │ 0.169610446 │ 1 │ 0 │ 23.31 KiB │ 3.36 MiB │
14. │ samples_v4_agg_30m │ 0.54 │ 0.169583701 │ 0.2395942418479122 │ 0 │ 9.81 MiB │ 17.87 MiB │
15. │ samples_v4_agg_5m │ 1.48 │ 0.169538103 │ 0.10297378179374496 │ 0 │ 17.66 MiB │ 16.96 MiB │
16. │ samples_v4_agg_30m │ 0.54 │ 0.169520711 │ 0.24049555235886447 │ 0 │ 8.37 MiB │ 14.31 MiB │
17. │ time_series_v4_6hrs │ 0.36 │ 0.169493521 │ 0.32001352918438347 │ 0 │ 16.99 MiB │ 127.94 MiB │
18. │ samples_v4_agg_30m │ 1.11 │ 0.169464373 │ 0.13277566144199143 │ 0 │ 18.88 MiB │ 27.81 MiB │
19. │ time_series_v4_6hrs │ 0 │ 0.169396741 │ 1 │ 0 │ 1.97 MiB │ 67.32 MiB │
20. │ part_log │ 0 │ 0.167785679 │ 1 │ 0 │ 22.33 KiB │ 3.25 MiB │
21. │ samples_v4_agg_5m │ 0.94 │ 0.167639189 │ 0.15067643571231093 │ 0 │ 10.34 MiB │ 13.94 MiB │
22. │ time_series_v4_6hrs │ 0.15 │ 0.166616236 │ 0.5293330955777461 │ 0 │ 6.03 MiB │ 89.23 MiB │
23. │ text_log │ 0 │ 0.117969416 │ 1 │ 0 │ 30.54 KiB │ 3.44 MiB │
24. │ query_log │ 0 │ 0.112072136 │ 1 │ 0 │ 59.59 KiB │ 3.54 MiB │
25. │ trace_log │ 0 │ 0.104456723 │ 1 │ 0 │ 19.58 KiB │ 3.31 MiB │
26. │ processors_profile_log │ 0 │ 0.058545622 │ 1 │ 0 │ 24.86 KiB │ 3.36 MiB │
27. │ error_log │ 0 │ 0.056690336 │ 1 │ 0 │ 9.33 KiB │ 6.20 MiB │
28. │ latency_log │ 0.06 │ 0.056657594 │ 0.4921322273725299 │ 0 │ 1.36 MiB │ 40.98 MiB │
29. │ latency_log │ 0 │ 0.056644843 │ 1 │ 0 │ 5.41 KiB │ 3.24 MiB │
30. │ time_series_v4 │ 0.54 │ 0.056574001 │ 0.09526573193949989 │ 0 │ 52.39 MiB │ 44.49 MiB │
31. │ time_series_v4 │ 0.6 │ 0.05647031 │ 0.08650576850425735 │ 0 │ 47.64 MiB │ 50.28 MiB │
32. │ time_series_v4 │ inf │ 0.056422828 │ 0 │ 0 │ 56.91 MiB │ 13.16 MiB │
└─────────────────────────┴──────────┴──────────────┴─────────────────────┴─────────────┴────────────┴────────────┘
Mohit Goyal
09/19/2025, 2:17 PM
Srikanth Chekuri
09/19/2025, 2:21 PM
SELECT
normalizedQueryHash(query) hash,
current_database,
sum(ProfileEvents['UserTimeMicroseconds'] as userCPUq)/1000 AS userCPUms,
count(),
sum(query_duration_ms) query_duration_ms,
userCPUms/query_duration_ms cpu_per_sec,
argMax(query, userCPUq) heaviest_query
FROM system.query_log
WHERE (type = 2) AND (event_time >= now() - INTERVAL 3 HOUR)
GROUP BY
current_database,
hash
ORDER BY userCPUms DESC
LIMIT 10
FORMAT Vertical;
Mohit Goyal
09/19/2025, 2:22 PM
Srikanth Chekuri
09/19/2025, 2:24 PM
Mohit Goyal
09/19/2025, 2:28 PM
Mohit Goyal
09/19/2025, 2:29 PM
Mohit Goyal
09/19/2025, 2:30 PM
Srikanth Chekuri
09/19/2025, 2:31 PM
Mohit Goyal
09/19/2025, 2:34 PM
• Code: 236. DB::Exception: Cancelled merging parts: While executing MergeTreeSequentialSource. (ABORTED)
• This keeps repeating, so merges never complete and pile up.
Has anyone seen this before? What could cause merges to be repeatedly cancelled and retried like this?
Are there known settings (e.g. background pool limits, memory/disk throttling, ZooKeeper/Keeper timeouts) that can lead to this behavior?
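On replicated tables, repeatedly failing or cancelled merges usually leave a trail in system.replication_queue; a sketch of one place to look:

SELECT database, table, type, new_part_name, num_tries, last_exception, postpone_reason
FROM system.replication_queue
WHERE num_tries > 1 OR last_exception != ''
ORDER BY num_tries DESC
LIMIT 10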
Srikanth Chekuri
09/19/2025, 2:38 PM
Rohit Pandit
09/19/2025, 2:41 PM
Mohit Goyal
09/19/2025, 2:41 PM
<merge_tree>
  <!-- 1073741824 bytes = 1 GiB -->
  <min_bytes_for_wide_part>1073741824</min_bytes_for_wide_part>
  <min_rows_for_wide_part>1000000</min_rows_for_wide_part>
  <!-- 134217728 bytes = 128 MiB -->
  <max_bytes_to_merge_at_min_space_in_pool>134217728</max_bytes_to_merge_at_min_space_in_pool>
  <!-- 4589934592 bytes ≈ 4.27 GiB -->
  <max_bytes_to_merge_at_max_space_in_pool>4589934592</max_bytes_to_merge_at_max_space_in_pool>
</merge_tree>
<background_pool_size>16</background_pool_size>
<background_fetches_pool_size>48</background_fetches_pool_size>
<async_insert_threads>48</async_insert_threads>
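Whether these merge-tree overrides actually took effect can be verified from system.merge_tree_settings; a sketch:

SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN ('min_bytes_for_wide_part',
               'max_bytes_to_merge_at_min_space_in_pool',
               'max_bytes_to_merge_at_max_space_in_pool')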
Stack trace:
Code: 236. DB::Exception: Cancelled merging parts: While executing MergeTreeSequentialSource. (ABORTED), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000dad8c08
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000911a81c
2. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000009125a94
3. DB::MergeTask::GlobalRuntimeContext::checkOperationIsNotCanceled() const @ 0x00000000123d42c0
4. DB::MergeProgressCallback::operator()(DB::Progress const&) @ 0x0000000012402c98
5. DB::ReadProgressCallback::onProgress(unsigned long, unsigned long, std::list<DB::StorageLimits, std::allocator<DB::StorageLimits>> const&) @ 0x00000000109484b8
6. DB::ExecutionThreadContext::executeTask() @ 0x0000000012b06174
7. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000012afbb80
8. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012afb114
9. DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x0000000012b0b870
10. DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x0000000012b0babc
11. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::executeImpl() const @ 0x00000000123e4028
12. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x00000000123e383c
13. DB::MergeTask::execute() @ 0x00000000123ebc9c
14. DB::ReplicatedMergeMutateTaskBase::executeStep() @ 0x00000000126bd294
15. DB::MergeTreeBackgroundExecutor<DB::DynamicRuntimeQueue>::threadFunction() @ 0x00000000124153f8
16. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x000000000dbfbd04
17. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000dc0191c
18. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000dbf9474
19. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000dbffd5c
20. ? @ 0x000000000008595c
21. ? @ 0x00000000000eba4c
(version 25.6.2.5 (official build))