Parth Ingole
09/18/2025, 5:51 PM
- time_series_v4_6hrs_mv_separate_attrs
- time_series_v4_1day_mv_separate_attrs
- time_series_v4_1week_mv_separate_attrs
These are defined in metrics_migrations.go but not used by any collector components.
# Questions
1. Are these materialized views still needed by SigNoz frontend/query service?
2. Were they removed from the latest migration binary for performance reasons?
3. Can we safely drop them to improve merge performance?
4. Should they be removed from schema migrations too?
# Impact
Each insert cascades through multiple materialized views, multiplying merge operations and causing queue buildup.
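To see which materialized views would receive this cascade, a query along these lines can list them (a sketch; assumes the default signoz_metrics database name):

SELECT name, create_table_query
FROM system.tables
WHERE database = 'signoz_metrics'
  AND engine = 'MaterializedView'

The create_table_query column includes each view's SELECT ... FROM clause, so the views reading from samples_v4 or time_series_v4 are the ones multiplying merge work.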
--------------------------------------------------------------------------------------
Title: Queries arriving at the same time are not being batched into the same async insert buffer.
Here’s a sample from `system.asynchronous_insert_log`:
• Two inserts into the same table at the same second (14:43:52)
• Both queries are identical (INSERT INTO signoz_metrics.distributed_samples_v4 … FORMAT Native)
Row 1:
──────
n: 1
q: 1
s: 5
flush: 14:43:52
prev: -
rows: 85096
data: 2.62 MiB
sample_query: INSERT INTO signoz_metrics.distributed_samples_v4 (env, temporality, metric_name, fingerprint, unix_milli, value, flags) FORMAT Native
sample_settings: {'min_insert_block_size_rows':'1000000','min_insert_block_size_bytes':'20971520','min_insert_block_size_rows_for_materialized_views':'1000000','min_insert_block_size_bytes_for_materialized_views':'20971520','load_balancing':'random','max_execution_time':'759','timeout_before_checking_execution_speed':'0','max_memory_usage':'10000000000','async_insert':'1','wait_for_async_insert':'1','wait_for_async_insert_timeout':'720','async_insert_max_data_size':'204857600','async_insert_busy_timeout_min_ms':'120000','async_insert_busy_timeout_max_ms':'600000'}
query_ids: ['3feab59c-9ce2-4092-96fb-d8083ede1dd0']
formats: ['Native']
users: ['default']
client_names: ['clickhouse-go/2.36.0 (lv:go/1.23.12; os:linux)']
http_user_agents: ['']
Row 2:
──────
n: 1
q: 1
s: 6
flush: 14:43:52
prev: -
rows: 81417
data: 2.51 MiB
sample_query: INSERT INTO signoz_metrics.distributed_samples_v4 (env, temporality, metric_name, fingerprint, unix_milli, value, flags) FORMAT Native
sample_settings: {'min_insert_block_size_rows':'1000000','min_insert_block_size_bytes':'20971520','min_insert_block_size_rows_for_materialized_views':'1000000','min_insert_block_size_bytes_for_materialized_views':'20971520','load_balancing':'random','max_execution_time':'572','timeout_before_checking_execution_speed':'0','max_memory_usage':'10000000000','async_insert':'1','wait_for_async_insert':'1','wait_for_async_insert_timeout':'720','async_insert_max_data_size':'204857600','async_insert_busy_timeout_min_ms':'120000','async_insert_busy_timeout_max_ms':'600000'}
query_ids: ['ca235e27-f4e6-42b0-be8d-7558926362eb']
formats: ['Native']
users: ['default']
client_names: ['clickhouse-go/2.36.0 (lv:go/1.23.12; os:linux)']
http_user_agents: ['']
• However, they ended up in different buffers because their Settings differ:
Row 1: settings include "max_execution_time": "759"
Row 2: settings include "max_execution_time": "572"
From the ClickHouse docs, I understand that async inserts are grouped by query shape + settings. Since the settings differ, ClickHouse treats them as separate buffers.
The confusing part is: we are not explicitly setting max_execution_time anywhere.
We’re sending data from the SigNoz collector to a ClickHouse Distributed table.
Question: Where could these different max_execution_time values be coming from, and how can we ensure inserts land in the same buffer so batching works as expected?
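For reference, the per-query settings recorded by ClickHouse can be compared directly from system.query_log; a sketch using the two query_ids from the log above:

SELECT query_id, Settings['max_execution_time'] AS max_execution_time
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_id IN ('3feab59c-9ce2-4092-96fb-d8083ede1dd0',
                   'ca235e27-f4e6-42b0-be8d-7558926362eb')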
Srikanth Chekuri
09/19/2025, 11:11 AM
Mohit Goyal
09/19/2025, 11:34 AM
In system.asynchronous_insert_log, the only difference is that max_execution_time varies between inserts (e.g., 572 vs 759).
Since ClickHouse batches async inserts by query text and settings, this prevents batching from working as expected.
• Where is the max_execution_time setting coming from in the SigNoz collector → ClickHouse pipeline? We are not setting it anywhere explicitly.
• Can we configure SigNoz to use a fixed max_execution_time (or disable it) so that all inserts land in the same buffer and batch properly?
We have already confirmed with ClickHouse that they don't set any such setting by default. Can you please help?
Srikanth Chekuri
09/19/2025, 11:36 AM
Srikanth Chekuri
09/19/2025, 11:46 AM
> Where is the max_execution_time setting coming from in the SigNoz collector → ClickHouse pipeline? We are not setting it anywhere explicitly.
It is set by the clickhouse-go client based on the deadline from the collector exporter pipeline: https://github.com/ClickHouse/clickhouse-go/blob/5f4a3ccd69e2597d4daecc20e4509423de5ef4e2/context.go#L221
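To see how widely these deadline-derived values spread, recent inserts can be grouped by that setting; a sketch, assuming the insert pattern from the log above:

SELECT Settings['max_execution_time'] AS met, count() AS inserts
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 HOUR
  AND query LIKE 'INSERT INTO signoz_metrics.distributed_samples_v4%'
GROUP BY met
ORDER BY inserts DESC

If nearly every row lands in its own group, each insert carries a unique max_execution_time and so gets its own async-insert buffer.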
Mohit Goyal
09/19/2025, 11:58 AM
receivers:
  # Keep only the receivers used by the metrics pipeline
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 16
      http:
        endpoint: 0.0.0.0:4318
  signozkafkareceiver/metrics:
    topic: otel-metrics-4
    brokers: ["otel-metrics-collector-kafka-4.central-prod.local:9092"]
    client_id: otel-collector-v1
    group_id: otel-collector-v1
    metadata:
      retry:
        max: 10
        backoff: 5s
    sarama_consumer_config:
      fetch_min_bytes: 16777216
      fetch_default_bytes: 67108864
      fetch_max_bytes: 134217728
      max_processing_time: 240s
      messages_channel_size: 65536
      consumer_group_session_timeout: 240s
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
processors:
  batch/metrics:
    send_batch_size: 400000
    timeout: 180s
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: localhost:55679
  pprof:
    endpoint: localhost:1777
exporters:
  # Primary ClickHouse cluster (existing)
  signozclickhousemetrics:
    dsn: "tcp://x.x.x.x:9000,x.x.x.x:9000,x.x.x.x:9000,x.x.x.x.168:9000,x.x.x.x:9000,x.x.x.x:9000/signoz_metrics?password=qwswe&max_execution_time=3600"
    timeout: 900s
    sending_queue:
      enabled: true
      num_consumers: 3
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 0s
  # Secondary ClickHouse cluster (NEW)
  signozclickhousemetrics/secondary:
    dsn: "tcp://x.x.x.x:9000,x.x.x.x:9000,x.x.x.x:9000/signoz_metrics?password=qswsq&max_execution_time=3600"
    timeout: 900s
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 0s
  # S3 mirror of the exact batches sent to ClickHouse (no compression)
  awss3:
    s3uploader:
      region: us-east-1
      s3_bucket: signoz-otel
      s3_prefix: telemetry/metrics
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
    marshaler: otlp_json
    timeout: 900s
    sending_queue:
      enabled: true
      num_consumers: 1
      queue_size: 10
service:
  telemetry:
    logs:
      level: debug
      encoding: console
  extensions: [health_check, zpages, pprof]
  pipelines:
    metrics:
      receivers: [otlp, signozkafkareceiver/metrics]
      processors: [batch/metrics]
      exporters: [signozclickhousemetrics, signozclickhousemetrics/secondary, awss3]
Mohit Goyal
09/19/2025, 11:59 AM
Parth Ingole
09/19/2025, 12:00 PM
Srikanth Chekuri
09/19/2025, 12:03 PM
Srikanth Chekuri
09/19/2025, 12:06 PM
> timeout: 900s
As you can note from the shared snippet from the ClickHouse go client repo, max_execution_time is set by the Go driver based on the remaining deadline. It changes for each batch insert, since the remaining time varies with the work already done for it. How many collectors do you run? Is there a specific reason why you enabled async inserts on ClickHouse?
Mohit Goyal
09/19/2025, 12:12 PM
Currently we are using only one instance of the signoz collector. ClickHouse says we should use async inserts to buffer data in memory so that we don't end up creating too many parts.
Mohit Goyal
09/19/2025, 12:13 PM
Is there any way we can send max_execution_time the same for all queries sent?
Mohit Goyal
09/19/2025, 12:13 PM
Srikanth Chekuri
09/19/2025, 12:18 PM
> Currently we are using only one instance of signoz collector. So ClickHouse says we should use async inserts to buffer data in memory so that we don't end up creating too many parts
You are not creating too many parts with just one collector. With send_batch_size: 400000 and a timeout of 180s, you are nowhere near creating many parts; you are far away from having to worry about too many parts. Are you seeing the too-many-parts error with this config?
You would already know ClickHouse says you can do the external batching and write in batches of 100k. That's what the collector does. Async inserts are a premature step IMO.
> is there any way we can send max_execution_time the same for all queries sent?
No, it's dynamically modified by the clickhouse-go lib based on the deadline. The deadline in the context comes from the pipeline.
Srikanth Chekuri
09/19/2025, 12:20 PM
Mohit Goyal
09/19/2025, 12:30 PM
Yes, with one collector we are creating large parts, but we can't go to production with this setup: if this single collector instance fails, our pipeline fails, and if we deploy multiple instances, the data size sent by each collector instance decreases, leading to the too-many-parts issue. So we need to send max_execution_time the same for queries fired at the same time so they can be buffered; otherwise we can't use ClickHouse async inserts with the signoz collector. We are sending CloudWatch metrics in this pipeline, so our ingestion rate is 3-5 MB/sec.
Srikanth Chekuri
09/19/2025, 12:34 PM
Mohit Goyal
09/19/2025, 1:01 PM
With this varying max_execution_time, the async insert capability of ClickHouse can never be used or implemented in practice, since for queries to be buffered at ClickHouse they should have the same shape. ClickHouse async inserts are grouped by query shape + settings.
Is there any way we could resolve this, or disable that setting from going along with the insert query, as in the sample_settings below?
Row 2:
──────
n: 1
q: 1
s: 6
flush: 14:43:52
prev: -
rows: 81417
data: 2.51 MiB
sample_query: INSERT INTO signoz_metrics.distributed_samples_v4 (env, temporality, metric_name, fingerprint, unix_milli, value, flags) FORMAT Native
sample_settings: {'min_insert_block_size_rows':'1000000','min_insert_block_size_bytes':'20971520','min_insert_block_size_rows_for_materialized_views':'1000000','min_insert_block_size_bytes_for_materialized_views':'20971520','load_balancing':'random',`max_execution_time`:'572','timeout_before_checking_execution_speed':'0','max_memory_usage':'10000000000','async_insert':'1','wait_for_async_insert':'1','wait_for_async_insert_timeout':'720','async_insert_max_data_size':'204857600','async_insert_busy_timeout_min_ms':'120000','async_insert_busy_timeout_max_ms':'600000'}
query_ids: ['ca235e27-f4e6-42b0-be8d-7558926362eb']
formats: ['Native']
users: ['default']
client_names: ['clic
Srikanth Chekuri
09/19/2025, 1:04 PM
> is there any way we could resolve that or disable that setting to go along in insert query
The context the exporter receives is the parent context from the otel collector pipeline, and the setting is set up by clickhouse-go here: https://github.com/ClickHouse/clickhouse-go/blob/5f4a3ccd69e2597d4daecc20e4509423de5ef4e2/context.go#L221
Parth Ingole
09/19/2025, 1:41 PM
We see a large merge backlog. We have tried tuning num_consumers and `queue_size`, but the queue still fills up quickly. To mitigate, we enabled asynchronous inserts to buffer and batch larger writes, aiming to create fewer parts and reduce merge pressure.
Kafka ingest is ~20 MB/s.
Cluster: 3 shards × 3 replicas. Running on a 64-vCPU machine.
cc @Mohit Goyal
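One way to quantify part-creation pressure before and after a change like this is system.part_log; a sketch, assuming the signoz_metrics database:

SELECT table, toStartOfMinute(event_time) AS minute, count() AS new_parts, sum(rows) AS rows
FROM system.part_log
WHERE event_type = 'NewPart'
  AND database = 'signoz_metrics'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY table, minute
ORDER BY minute DESC, new_parts DESC
LIMIT 20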
Srikanth Chekuri
09/19/2025, 1:42 PM
Srikanth Chekuri
09/19/2025, 1:47 PM
> We see a large merge backlog
The disk throughput is an important factor here.
Mohit Goyal
09/19/2025, 1:53 PM
Srikanth Chekuri
09/19/2025, 1:54 PM
Srikanth Chekuri
09/19/2025, 1:56 PM
Parth Ingole
09/19/2025, 1:58 PM
Srikanth Chekuri
09/19/2025, 1:59 PM
> Merges are processing significantly slower than inserts
Right, here we can see the merge rate is less than the ingest rate. There are only two reasons for this problem: 1. too many inserts (small or big), or 2. irrespective of the ingest, the merge is really slow. In this case, I believe the merge is slow.
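The two cases can be separated from system.part_log by comparing insert and merge throughput over the same window; a sketch:

SELECT event_type, count() AS events, sum(rows) AS total_rows,
       formatReadableSize(sum(size_in_bytes)) AS total_size
FROM system.part_log
WHERE event_time >= now() - INTERVAL 1 HOUR
  AND event_type IN ('NewPart', 'MergeParts')
GROUP BY event_type

If rows written by MergeParts consistently lag rows written by NewPart, merges are falling behind regardless of how the inserts are sized.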
Mohit Goyal
09/19/2025, 1:59 PM
Srikanth Chekuri
09/19/2025, 2:02 PM
Mohit Goyal
09/19/2025, 2:05 PM
Srikanth Chekuri
09/19/2025, 2:05 PM
> code: 252, message: Too many parts (409 with average size of 2.43 MiB
On the samples table this usually means parts of 80k-100k rows. The nature of the rows in the samples_v4 table is that parts are going to be around 2-5 MB.
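Since the too-many-parts threshold is evaluated per partition, a per-partition count from system.parts shows how close each table is; a sketch:

SELECT table, partition_id, count() AS active_parts,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND database = 'signoz_metrics'
GROUP BY table, partition_id
ORDER BY active_parts DESC
LIMIT 10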
Srikanth Chekuri
09/19/2025, 2:06 PM
Mohit Goyal
09/19/2025, 2:07 PM
Srikanth Chekuri
09/19/2025, 2:08 PM
Srikanth Chekuri
09/19/2025, 2:09 PM
SELECT
table,
round((elapsed * (1 / progress)) - elapsed, 2) AS estimate,
elapsed,
progress,
is_mutation,
formatReadableSize(total_size_bytes_compressed) AS size,
formatReadableSize(memory_usage) AS mem
FROM system.merges
ORDER BY elapsed DESC
Mohit Goyal
09/19/2025, 2:16 PM
Query id: 460b2d5c-5cce-491a-b3e0-93dd69dd2169
┌─table───────────────────┬─estimate─┬──────elapsed─┬────────────progress─┬─is_mutation─┬─size───────┬─mem────────┐
1. │ query_metric_log │ 9.8 │ 10.315822924 │ 0.5128530869248256 │ 0 │ 21.38 MiB │ 53.80 MiB │
2. │ asynchronous_metric_log │ 4.18 │ 4.953782157 │ 0.5426196177179311 │ 0 │ 9.91 MiB │ 7.75 MiB │
3. │ query_log │ 0 │ 0.231752519 │ 1 │ 0 │ 480.11 KiB │ 3.66 MiB │
4. │ query_log │ 0 │ 0.194859372 │ 1 │ 0 │ 57.53 KiB │ 3.34 MiB │
5. │ time_series_v4_6hrs │ 0.6 │ 0.182080012 │ 0.23417296389588582 │ 0 │ 29.06 MiB │ 165.32 MiB │
6. │ time_series_v4_6hrs │ 0.43 │ 0.182014275 │ 0.29895986433013 │ 0 │ 12.30 MiB │ 92.94 MiB │
7. │ samples_v4_agg_30m │ 0.51 │ 0.181996317 │ 0.26222833263177436 │ 0 │ 8.66 MiB │ 14.39 MiB │
8. │ samples_v4_agg_5m │ 2.03 │ 0.181970355 │ 0.08225509021925873 │ 0 │ 35.98 MiB │ 31.82 MiB │
9. │ trace_log │ 0 │ 0.174009491 │ 1 │ 0 │ 41.38 KiB │ 3.61 MiB │
10. │ time_series_v4_6hrs │ 0.39 │ 0.173860384 │ 0.30726608020107393 │ 0 │ 12.22 MiB │ 97.01 MiB │
11. │ samples_v4_agg_5m │ 3.38 │ 0.173742821 │ 0.04885672538024141 │ 0 │ 34.46 MiB │ 18.57 MiB │
12. │ time_series_v4_6hrs │ 0.33 │ 0.169633093 │ 0.33871273071462377 │ 0 │ 11.47 MiB │ 101.32 MiB │
13. │ processors_profile_log │ 0 │ 0.169610446 │ 1 │ 0 │ 23.31 KiB │ 3.36 MiB │
14. │ samples_v4_agg_30m │ 0.54 │ 0.169583701 │ 0.2395942418479122 │ 0 │ 9.81 MiB │ 17.87 MiB │
15. │ samples_v4_agg_5m │ 1.48 │ 0.169538103 │ 0.10297378179374496 │ 0 │ 17.66 MiB │ 16.96 MiB │
16. │ samples_v4_agg_30m │ 0.54 │ 0.169520711 │ 0.24049555235886447 │ 0 │ 8.37 MiB │ 14.31 MiB │
17. │ time_series_v4_6hrs │ 0.36 │ 0.169493521 │ 0.32001352918438347 │ 0 │ 16.99 MiB │ 127.94 MiB │
18. │ samples_v4_agg_30m │ 1.11 │ 0.169464373 │ 0.13277566144199143 │ 0 │ 18.88 MiB │ 27.81 MiB │
19. │ time_series_v4_6hrs │ 0 │ 0.169396741 │ 1 │ 0 │ 1.97 MiB │ 67.32 MiB │
20. │ part_log │ 0 │ 0.167785679 │ 1 │ 0 │ 22.33 KiB │ 3.25 MiB │
21. │ samples_v4_agg_5m │ 0.94 │ 0.167639189 │ 0.15067643571231093 │ 0 │ 10.34 MiB │ 13.94 MiB │
22. │ time_series_v4_6hrs │ 0.15 │ 0.166616236 │ 0.5293330955777461 │ 0 │ 6.03 MiB │ 89.23 MiB │
23. │ text_log │ 0 │ 0.117969416 │ 1 │ 0 │ 30.54 KiB │ 3.44 MiB │
24. │ query_log │ 0 │ 0.112072136 │ 1 │ 0 │ 59.59 KiB │ 3.54 MiB │
25. │ trace_log │ 0 │ 0.104456723 │ 1 │ 0 │ 19.58 KiB │ 3.31 MiB │
26. │ processors_profile_log │ 0 │ 0.058545622 │ 1 │ 0 │ 24.86 KiB │ 3.36 MiB │
27. │ error_log │ 0 │ 0.056690336 │ 1 │ 0 │ 9.33 KiB │ 6.20 MiB │
28. │ latency_log │ 0.06 │ 0.056657594 │ 0.4921322273725299 │ 0 │ 1.36 MiB │ 40.98 MiB │
29. │ latency_log │ 0 │ 0.056644843 │ 1 │ 0 │ 5.41 KiB │ 3.24 MiB │
30. │ time_series_v4 │ 0.54 │ 0.056574001 │ 0.09526573193949989 │ 0 │ 52.39 MiB │ 44.49 MiB │
31. │ time_series_v4 │ 0.6 │ 0.05647031 │ 0.08650576850425735 │ 0 │ 47.64 MiB │ 50.28 MiB │
32. │ time_series_v4 │ inf │ 0.056422828 │ 0 │ 0 │ 56.91 MiB │ 13.16 MiB │
└─────────────────────────┴──────────┴──────────────┴─────────────────────┴─────────────┴────────────┴────────────┘
Mohit Goyal
09/19/2025, 2:17 PM
Srikanth Chekuri
09/19/2025, 2:21 PM
SELECT
normalizedQueryHash(query) hash,
current_database,
sum(ProfileEvents['UserTimeMicroseconds'] as userCPUq)/1000 AS userCPUms,
count(),
sum(query_duration_ms) query_duration_ms,
userCPUms/query_duration_ms cpu_per_sec,
argMax(query, userCPUq) heaviest_query
FROM system.query_log
WHERE (type = 2) AND (event_time >= now() - INTERVAL 3 HOUR)
GROUP BY
current_database,
hash
ORDER BY userCPUms DESC
LIMIT 10
FORMAT Vertical;
Mohit Goyal
09/19/2025, 2:22 PM
Srikanth Chekuri
09/19/2025, 2:24 PM
Mohit Goyal
09/19/2025, 2:28 PM
Mohit Goyal
09/19/2025, 2:29 PM
Mohit Goyal
09/19/2025, 2:30 PM
Srikanth Chekuri
09/19/2025, 2:31 PM
Mohit Goyal
09/19/2025, 2:34 PM
• Code: 236. DB::Exception: Cancelled merging parts: While executing MergeTreeSequentialSource. (ABORTED)
• This keeps repeating, so merges never complete and pile up.
Has anyone seen this before? What could cause merges to be repeatedly cancelled and retried like this?
Are there known settings (e.g. background pool limits, memory/disk throttling, ZooKeeper/Keeper timeouts) that can lead to this behavior?
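On replicated tables, repeatedly failing or cancelled merges usually leave a trail in system.replication_queue; a sketch of one place to look:

SELECT database, table, type, new_part_name, num_tries, last_exception, postpone_reason
FROM system.replication_queue
WHERE num_tries > 1 OR last_exception != ''
ORDER BY num_tries DESC
LIMIT 10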
Srikanth Chekuri
09/19/2025, 2:38 PM
Rohit Pandit
09/19/2025, 2:41 PM
Mohit Goyal
09/19/2025, 2:41 PM
<merge_tree>
  <!-- 1073741824 bytes = 1 GiB -->
  <min_bytes_for_wide_part>1073741824</min_bytes_for_wide_part>
  <min_rows_for_wide_part>1000000</min_rows_for_wide_part>
  <!-- 134217728 bytes = 128 MiB -->
  <max_bytes_to_merge_at_min_space_in_pool>134217728</max_bytes_to_merge_at_min_space_in_pool>
  <!-- 4589934592 bytes ≈ 4.27 GiB -->
  <max_bytes_to_merge_at_max_space_in_pool>4589934592</max_bytes_to_merge_at_max_space_in_pool>
</merge_tree>
<background_pool_size>16</background_pool_size>
<background_fetches_pool_size>48</background_fetches_pool_size>
<async_insert_threads>48</async_insert_threads>
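Whether these merge-tree overrides actually took effect can be verified from system.merge_tree_settings; a sketch:

SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN ('min_bytes_for_wide_part',
               'max_bytes_to_merge_at_min_space_in_pool',
               'max_bytes_to_merge_at_max_space_in_pool')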
Stack trace:
Code: 236. DB::Exception: Cancelled merging parts: While executing MergeTreeSequentialSource. (ABORTED), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000dad8c08
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000911a81c
2. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000009125a94
3. DB::MergeTask::GlobalRuntimeContext::checkOperationIsNotCanceled() const @ 0x00000000123d42c0
4. DB::MergeProgressCallback::operator()(DB::Progress const&) @ 0x0000000012402c98
5. DB::ReadProgressCallback::onProgress(unsigned long, unsigned long, std::list<DB::StorageLimits, std::allocator<DB::StorageLimits>> const&) @ 0x00000000109484b8
6. DB::ExecutionThreadContext::executeTask() @ 0x0000000012b06174
7. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000012afbb80
8. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012afb114
9. DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x0000000012b0b870
10. DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x0000000012b0babc
11. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::executeImpl() const @ 0x00000000123e4028
12. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x00000000123e383c
13. DB::MergeTask::execute() @ 0x00000000123ebc9c
14. DB::ReplicatedMergeMutateTaskBase::executeStep() @ 0x00000000126bd294
15. DB::MergeTreeBackgroundExecutor<DB::DynamicRuntimeQueue>::threadFunction() @ 0x00000000124153f8
16. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x000000000dbfbd04
17. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000dc0191c
18. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000dbf9474
19. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000dbffd5c
20. ? @ 0x000000000008595c
21. ? @ 0x00000000000eba4c
(version 25.6.2.5 (official build))