I am seeing this log repeating frequently in my cl...
# support
k
I am seeing this log repeating frequently in my click house instance. Any idea how to resolve it? My metrics are lagging.
Copy code
2024.04.01 18:20:03.760284 [ 629 ] {} <Error> MergeTreeBackgroundExecutor: Exception while executing background task {3027d9e2-7d6b-4983-8164-e7da8d47f6c5::20240229_11958_12026_2}: Code: 117. DB::Exception: Invalid flag for negative store: (while reading from part /var/lib/clickhouse/store/302/3027d9e2-7d6b-4983-8164-e7da8d47f6c5/20240229_11976_11981_1/ in table signoz_metrics.exp_hist (3027d9e2-7d6b-4983-8164-e7da8d47f6c5) located on disk default of type local, from mark 0 with max_rows_to_read = 217): While executing MergeTreeSequentialSource. (INCORRECT_DATA), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c800f1b in /usr/bin/clickhouse
1. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x000000000721a243 in /usr/bin/clickhouse
2. DB::DDSketchDenseLogarithmic::deserialize(DB::ReadBuffer&) @ 0x000000000d5260e0 in /usr/bin/clickhouse
3. DB::SerializationAggregateFunction::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const @ 0x00000000108b58e9 in /usr/bin/clickhouse
4. DB::ISerialization::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::unordered_map<String, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>*) const @ 0x00000000108b01d9 in /usr/bin/clickhouse
5. DB::MergeTreeReaderCompact::readRows(unsigned long, unsigned long, bool, unsigned long, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&) @ 0x000000001250a6b9 in /usr/bin/clickhouse
6. DB::MergeTreeSequentialSource::generate() @ 0x000000001251ca4d in /usr/bin/clickhouse
7. DB::ISource::tryGenerate() @ 0x000000001297acf5 in /usr/bin/clickhouse
8. DB::ISource::work() @ 0x000000001297a743 in /usr/bin/clickhouse
9. DB::ExecutionThreadContext::executeTask() @ 0x000000001299371a in /usr/bin/clickhouse
10. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001298a170 in /usr/bin/clickhouse
11. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012989928 in /usr/bin/clickhouse
12. DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x0000000012998017 in /usr/bin/clickhouse
13. DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x00000000129981d3 in /usr/bin/clickhouse
14. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::executeImpl() @ 0x000000001233b6f2 in /usr/bin/clickhouse
15. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x000000001233b64b in /usr/bin/clickhouse
16. DB::MergeTask::execute() @ 0x0000000012340d99 in /usr/bin/clickhouse
17. DB::MergePlainMergeTreeTask::executeStep() @ 0x0000000012723517 in /usr/bin/clickhouse
18. DB::MergeTreeBackgroundExecutor<DB::DynamicRuntimeQueue>::threadFunction() @ 0x00000000123532c4 in /usr/bin/clickhouse
19. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000c8eb0c1 in /usr/bin/clickhouse
20. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000c8ee8fa in /usr/bin/clickhouse
21. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000c8ed6fe in /usr/bin/clickhouse
22. ? @ 0x000071d276b3b609
23. ? @ 0x000071d276a60353
 (version 24.1.2.5 (official build))
s
How often do you see this? This indicates a data corruption related issue. And when you say "metrics are lagging" what is lagging specifically?
k
I see the log about 10-15 times a second. I finally just recreated the clickhouse cluster and it has been fine for about 9 days and then the error resurfaces:
Copy code
2024.04.14 16:18:45.258776 [ 620 ] {} <Error> MergeTreeBackgroundExecutor: Exception while executing background task {cbd579c4-55b0-4174-aee5-ba1b20c91d3e::20240407_18548_18588_2}: Code: 117. DB::Exception: Invalid flag for negative store: (while reading from part /var/lib/clickhouse/store/cbd/cbd579c4-55b0-4174-aee5-ba1b20c91d3e/20240407_18548_18573_1/ in table signoz_metrics.exp_hist (cbd579c4-55b0-4174-aee5-ba1b20c91d3e) located on disk default of type local, from mark 0 with max_rows_to_read = 1161): While executing MergeTreeSequentialSource. (INCORRECT_DATA), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c800f1b in /usr/bin/clickhouse
1. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x000000000721a243 in /usr/bin/clickhouse
2. DB::DDSketchDenseLogarithmic::deserialize(DB::ReadBuffer&) @ 0x000000000d5260e0 in /usr/bin/clickhouse
3. DB::SerializationAggregateFunction::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const @ 0x00000000108b58e9 in /usr/bin/clickhouse
4. DB::ISerialization::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::unordered_map<String, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>*) const @ 0x00000000108b01d9 in /usr/bin/clickhouse
5. DB::MergeTreeReaderCompact::readRows(unsigned long, unsigned long, bool, unsigned long, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&) @ 0x000000001250a6b9 in /usr/bin/clickhouse
6. DB::MergeTreeSequentialSource::generate() @ 0x000000001251ca4d in /usr/bin/clickhouse
7. DB::ISource::tryGenerate() @ 0x000000001297acf5 in /usr/bin/clickhouse
8. DB::ISource::work() @ 0x000000001297a743 in /usr/bin/clickhouse
9. DB::ExecutionThreadContext::executeTask() @ 0x000000001299371a in /usr/bin/clickhouse
10. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001298a170 in /usr/bin/clickhouse
11. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012989928 in /usr/bin/clickhouse
12. DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x0000000012998017 in /usr/bin/clickhouse
13. DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x00000000129981d3 in /usr/bin/clickhouse
14. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::executeImpl() @ 0x000000001233b6f2 in /usr/bin/clickhouse
15. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x000000001233b64b in /usr/bin/clickhouse
16. DB::MergeTask::execute() @ 0x0000000012340d99 in /usr/bin/clickhouse
17. DB::MergePlainMergeTreeTask::executeStep() @ 0x0000000012723517 in /usr/bin/clickhouse
18. DB::MergeTreeBackgroundExecutor<DB::DynamicRuntimeQueue>::threadFunction() @ 0x00000000123532c4 in /usr/bin/clickhouse
19. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000c8eb0c1 in /usr/bin/clickhouse
20. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000c8ee8fa in /usr/bin/clickhouse
21. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000c8ed6fe in /usr/bin/clickhouse
22. ? @ 0x00007f7ffb586609
23. ? @ 0x00007f7ffb4ab353
 (version 24.1.2.5 (official build))
I am new to clickhouse and don't see much on how to deal with data corruption like this? Any suggestions would be great.
Also among this log is this error:
Copy code
2024.04.14 16:25:02.585614 [ 182659 ] {335f2033-175e-4296-8888-382020da5920} <Error> TCPHandler: Code: 252. DB::Exception: Too many parts (3000 with average size of 4.56 KiB) in table 'signoz_metrics.exp_hist (cbd579c4-55b0-4174-aee5-ba1b20c91d3e)'. Merges are processing significantly slower than inserts. (TOO_MANY_PARTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c800f1b in /usr/bin/clickhouse
1. DB::Exception::Exception<unsigned long&, ReadableSize, String>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type, std::type_identity<ReadableSize>::type, std::type_identity<String>::type>, unsigned long&, ReadableSize&&, String&&) @ 0x00000000123b1e9a in /usr/bin/clickhouse
2. DB::MergeTreeData::delayInsertOrThrowIfNeeded(Poco::Event*, std::shared_ptr<DB::Context const> const&, bool) const @ 0x00000000123b1acc in /usr/bin/clickhouse
3. DB::runStep(std::function<void ()>, DB::ThreadStatus*, std::atomic<unsigned long>*) @ 0x0000000012bfe69c in /usr/bin/clickhouse
4. DB::ExceptionKeepingTransform::work() @ 0x0000000012bfdc90 in /usr/bin/clickhouse
5. DB::ExecutionThreadContext::executeTask() @ 0x000000001299371a in /usr/bin/clickhouse
6. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001298a170 in /usr/bin/clickhouse
7. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012989928 in /usr/bin/clickhouse
8. DB::PushingPipelineExecutor::start() @ 0x000000001299b960 in /usr/bin/clickhouse
9. DB::DistributedSink::writeToLocal(DB::Cluster::ShardInfo const&, DB::Block const&, unsigned long) @ 0x0000000012271b58 in /usr/bin/clickhouse
10. DB::DistributedSink::writeAsyncImpl(DB::Block const&, unsigned long) @ 0x000000001226efd4 in /usr/bin/clickhouse
11. DB::DistributedSink::consume(DB::Chunk) @ 0x000000001226e21b in /usr/bin/clickhouse
12. DB::SinkToStorage::onConsume(DB::Chunk) @ 0x0000000012ccb7c2 in /usr/bin/clickhouse
13. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::ExceptionKeepingTransform::work()::$_1, void ()>>(std::__function::__policy_storage const*) @ 0x0000000012bfe98b in /usr/bin/clickhouse
14. DB::runStep(std::function<void ()>, DB::ThreadStatus*, std::atomic<unsigned long>*) @ 0x0000000012bfe69c in /usr/bin/clickhouse
15. DB::ExceptionKeepingTransform::work() @ 0x0000000012bfdd73 in /usr/bin/clickhouse
16. DB::ExecutionThreadContext::executeTask() @ 0x000000001299371a in /usr/bin/clickhouse
17. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001298a170 in /usr/bin/clickhouse
18. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000012989928 in /usr/bin/clickhouse
19. DB::TCPHandler::runImpl() @ 0x0000000012920b9e in /usr/bin/clickhouse
20. DB::TCPHandler::run() @ 0x0000000012933eb9 in /usr/bin/clickhouse
21. Poco::Net::TCPServerConnection::start() @ 0x00000000153a5a72 in /usr/bin/clickhouse
22. Poco::Net::TCPServerDispatcher::run() @ 0x00000000153a6871 in /usr/bin/clickhouse
23. Poco::PooledThread::run() @ 0x000000001549f047 in /usr/bin/clickhouse
24. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001549d67d in /usr/bin/clickhouse
25. ? @ 0x00007f7ffb586609
26. ? @ 0x00007f7ffb4ab353
I have increased the otel batch size to 100,000 with a timeout of 10s but it doesn't seem to help.
When I talk about the delay in metrics, I mean to say the past metrics eventually get added to the database just 10 minutes after the measurement.
Could I truncate the exp_hist table?
s
You can do that if you are not using the exponential histograms from OTEL. It would be helpful if you could share how to reproduce this.
k
Yeah, I wish I understood the conditions that led to this. Unfortunately I don't know how to reproduce it.