Bobses
05/16/2025, 2:00 PM
I have SigNoz deployed in an EKS cluster, where everything currently runs on a single r6i.xlarge node (4 vCPUs and 32 GiB RAM).
• ClickHouse uses a 512 GiB EBS volume and an S3 bucket for cold storage.
• Zookeeper has a 20 GiB volume.
• The signoz-db component also uses a 20 GiB volume.
In addition, I have several other EKS clusters where the OpenTelemetry Collector is deployed as a DaemonSet, sending data to the signoz-otel-collector in the first EKS cluster. Applications running in those clusters also send traces, logs, and metrics directly to that signoz-otel-collector.
I've noticed that a single r6i.xlarge node is not sufficient to handle SigNoz, the otel-collector, and ClickHouse altogether, so I'm considering the following improvement:
• One dedicated node for ClickHouse
• Another dedicated node for SigNoz and signoz-otel-collector
Do you have any other grouping recommendations? Or additional suggestions to ensure optimal performance?
Thank you!
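On the grouping question: in Kubernetes this kind of separation is usually done by labeling the two node groups and steering each component with nodeSelector overrides in the SigNoz Helm values. A minimal sketch, assuming a hypothetical workload label on the node groups; the exact override paths depend on the chart version:

# Hypothetical Helm values sketch: pin ClickHouse to a dedicated node and
# keep SigNoz and the signoz-otel-collector together on another one.
# The "workload" label key and its values are illustrative, and the
# per-component nodeSelector paths assume the chart exposes them.
clickhouse:
  nodeSelector:
    workload: clickhouse
queryService:
  nodeSelector:
    workload: signoz
frontend:
  nodeSelector:
    workload: signoz
otelCollector:
  nodeSelector:
    workload: signoz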
Bobses
05/19/2025, 12:10 PM
"name":"clickhousemetricswrite","error":"code: 252, message: Too many parts (10001 with average size of 12.94 KiB) in table 'signoz_metrics.time_series_v4_1week (1ef177a4-4a8d-40d9-a260-30f04ed5ada4)'. Merges are processing significantly slower than inserts: while pushing to view signoz_metrics.time_series_v4_1week_mv_separate_attrs (02ab556f-418b-403d-a83e-3b355bdea614): while pushing to view signoz_metrics.time_series_v4_1day_mv_separate_attrs (6d90e244-ee4c-4d36-b32e-e4369448b7cb): while pushing to view signoz_metrics.time_series_v4_6hrs_mv_separate_attrs (47890153-17b3-4398-99c6-d7c81c625b16)","interval":"5.14795946s"}
{
"date_time": "1747655494.506347",
"thread_name": "TCPServerConnection ([#53])",
"thread_id": "854",
"level": "Error",
"query_id": "0c53f2cf-41e8-46bb-ae9b-1c0a8cd6651e",
"logger_name": "TCPHandler",
"message": "Code: 252. DB::Exception: Too many parts (10001 with average size of 12.94 KiB) in table 'signoz_metrics.time_series_v4_1week (1ef177a4-4a8d-40d9-a260-30f04ed5ada4)'. Merges are processing significantly slower than inserts: while pushing to view signoz_metrics.time_series_v4_1week_mv_separate_attrs (02ab556f-418b-403d-a83e-3b355bdea614): while pushing to view signoz_metrics.time_series_v4_1day_mv_separate_attrs (6d90e244-ee4c-4d36-b32e-e4369448b7cb): while pushing to view signoz_metrics.time_series_v4_6hrs_mv_separate_attrs (47890153-17b3-4398-99c6-d7c81c625b16). (TOO_MANY_PARTS), Stack trace (when copying this message, always include the lines below):\n\n0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c800f1b in /usr/bin/clickhouse\n1. DB::Exception::Exception<unsigned long&, ReadableSize, String>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type, std::type_identity<ReadableSize>::type, std::type_identity<String>::type>, unsigned long&, ReadableSize&&, String&&) @ 0x00000000123b1e9a in /usr/bin/clickhouse\n2. DB::MergeTreeData::delayInsertOrThrowIfNeeded(Poco::Event*, std::shared_ptr<DB::Context const> const&, bool) const @ 0x00000000123b1acc in /usr/bin/clickhouse\n3. DB::runStep(std::function<void ()>, DB::ThreadStatus*, std::atomic<unsigned long>*) @ 0x0000000012bfe69c in /usr/bin/clickhouse\n4. DB::ExceptionKeepingTransform::work() @ 0x0000000012bfdc90 in /usr/bin/clickhouse\n5. DB::ExecutionThreadContext::executeTask() @ 0x000000001299371a in /usr/bin/clickhouse\n6. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001298a170 in /usr/bin/clickhouse\n7. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012989928 in /usr/bin/clickhouse\n8. DB::PushingPipelineExecutor::start() @ 0x000000001299b960 in /usr/bin/clickhouse\n9. DB::DistributedSink::writeToLocal(DB::Cluster::ShardInfo const&, DB::Block const&, unsigned long) @ 0x0000000012271b58 in /usr/bin/clickhouse\n10. DB::DistributedSink::writeAsyncImpl(DB::Block const&, unsigned long) @ 0x000000001226efd4 in /usr/bin/clickhouse\n11. DB::DistributedSink::consume(DB::Chunk) @ 0x000000001226b7da in /usr/bin/clickhouse\n12. DB::SinkToStorage::onConsume(DB::Chunk) @ 0x0000000012ccb7c2 in /usr/bin/clickhouse\n13. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::ExceptionKeepingTransform::work()::$_1, void ()>>(std::__function::__policy_storage const*) @ 0x0000000012bfe98b in /usr/bin/clickhouse\n14. DB::runStep(std::function<void ()>, DB::ThreadStatus*, std::atomic<unsigned long>*) @ 0x0000000012bfe69c in /usr/bin/clickhouse\n15. DB::ExceptionKeepingTransform::work() @ 0x0000000012bfdd73 in /usr/bin/clickhouse\n16. DB::ExecutionThreadContext::executeTask() @ 0x000000001299371a in /usr/bin/clickhouse\n17. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001298a170 in /usr/bin/clickhouse\n18. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012989928 in /usr/bin/clickhouse\n19. DB::TCPHandler::runImpl() @ 0x0000000012920b9e in /usr/bin/clickhouse\n20. DB::TCPHandler::run() @ 0x0000000012933eb9 in /usr/bin/clickhouse\n21. Poco::Net::TCPServerConnection::start() @ 0x00000000153a5a72 in /usr/bin/clickhouse\n22. Poco::Net::TCPServerDispatcher::run() @ 0x00000000153a6871 in /usr/bin/clickhouse\n23. Poco::PooledThread::run() @ 0x000000001549f047 in /usr/bin/clickhouse\n24. 
Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001549d67d in /usr/bin/clickhouse\n25. ? @ 0x00007cdb8fde3609\n26. ? @ 0x00007cdb8fd08353\n",
"source_file": "src/Server/TCPHandler.cpp; void DB::TCPHandler::runImpl()",
"source_line": "686"
}
Please help me!
Any suggestions would be highly appreciated.
Thank you!
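Context on the error above: ClickHouse raises TOO_MANY_PARTS when inserts create small parts faster than background merges can combine them, which typically happens with many small, frequent inserts. The usual collector-side mitigation is to batch harder, so each insert is larger and less frequent. A sketch of the relevant batch processor knobs; the values are illustrative starting points, not tuned recommendations:

# Illustrative otel-collector batch processor settings: larger, less
# frequent inserts leave ClickHouse fewer parts to merge.
processors:
  batch:
    send_batch_size: 50000  # flush once this many items are queued
    timeout: 10s            # ...or after this long, whichever comes first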
Bobses
05/23/2025, 6:17 AM
batch:
  send_batch_size: 100000
  timeout: 22s
The error persists, but I can see the nodes in the Infra Monitoring menu.
I assume this is a bug.
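One thing worth checking with the config above: the batch processor only takes effect if it is listed in each pipeline. A minimal collector config sketch with those settings wired into the metrics pipeline; the otlp receiver name is an assumption based on a default setup, while the clickhousemetricswrite exporter name comes from the error log:

processors:
  batch:
    send_batch_size: 100000
    timeout: 22s
service:
  pipelines:
    metrics:
      receivers: [otlp]                     # assumed receiver name
      processors: [batch]                   # batching must be wired in here
      exporters: [clickhousemetricswrite]   # exporter name from the error log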