Daniel Hilgarth
02/21/2025, 8:37 AM
Daniel Hilgarth
02/21/2025, 8:37 AM
<?xml version="1.0"?>
<clickhouse>
<storage_configuration>
<disks>
<default>
<keep_free_space_bytes>17825792</keep_free_space_bytes>
</default>
<s3>
<type>s3</type>
<endpoint>https://fsn1.your-objectstorage.com/signoz-long-term-storage-prod//signoz</endpoint>
<access_key_id>XXX</access_key_id>
<secret_access_key>XXX</secret_access_key>
</s3>
</disks>
<policies>
<tiered>
<volumes>
<default>
<disk>default</disk>
</default>
<s3>
<disk>s3</disk>
<perform_ttl_move_on_insert>0</perform_ttl_move_on_insert>
</s3>
</volumes>
</tiered>
</policies>
</storage_configuration>
</clickhouse>
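To make the volume order of the `tiered` policy explicit, here is a minimal Python sketch that parses the configuration above and lists which disk backs each volume. It assumes the endpoint is written as a plain URL (without the link brackets added by the chat client). The helper name `describe_policies` is hypothetical, and labelling a disk without a `<type>` element as `local` is my assumption, not something stated in the config.

```python
import xml.etree.ElementTree as ET

# The storage configuration from the message above, credentials still masked.
CONFIG = """<?xml version="1.0"?>
<clickhouse>
<storage_configuration>
<disks>
<default>
<keep_free_space_bytes>17825792</keep_free_space_bytes>
</default>
<s3>
<type>s3</type>
<endpoint>https://fsn1.your-objectstorage.com/signoz-long-term-storage-prod//signoz</endpoint>
<access_key_id>XXX</access_key_id>
<secret_access_key>XXX</secret_access_key>
</s3>
</disks>
<policies>
<tiered>
<volumes>
<default>
<disk>default</disk>
</default>
<s3>
<disk>s3</disk>
<perform_ttl_move_on_insert>0</perform_ttl_move_on_insert>
</s3>
</volumes>
</tiered>
</policies>
</storage_configuration>
</clickhouse>
"""


def describe_policies(config_xml):
    """Return {policy: [(volume, disk, disk_type), ...]} in document order."""
    storage = ET.fromstring(config_xml).find("storage_configuration")
    # Assumption: a disk without a <type> element is treated as "local".
    disk_types = {
        disk.tag: disk.findtext("type", default="local")
        for disk in storage.find("disks")
    }
    policies = {}
    for policy in storage.find("policies"):
        policies[policy.tag] = [
            (volume.tag, volume.findtext("disk"), disk_types[volume.findtext("disk")])
            for volume in policy.find("volumes")
        ]
    return policies


for policy, volumes in describe_policies(CONFIG).items():
    for volume, disk, disk_type in volumes:
        print(f"policy={policy} volume={volume} disk={disk} type={disk_type}")
```

If parsing fails, the file is not well-formed XML, which is worth ruling out before looking at the S3 side.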
I'm on SigNoz v0.56.0 with ClickHouse 24.1.2-alpine. I can't read the retention settings in SigNoz because the page never finishes loading, probably because ClickHouse is down.
I get hundreds of copies of the following message in the log until the ClickHouse container is killed for being unhealthy:
2025.02.20 23:34:00.251300 [ 701 ] {} <Information> AWSClient: Failed to make request to: https://fsn1.your-objectstorage.com/signoz-long-term-storage-prod/signoz/xoy/xdigpjscddongysadrydkdnircogd: Poco::Exception. Code: 1000, e.code() = 0, Timeout, Stack trace (when copying this message, always include the lines below):
0. Poco::Net::SecureSocketImpl::mustRetry(int, Poco::Timespan&) @ 0x000000001537d5cc in /usr/bin/clickhouse
1. Poco::Net::SecureSocketImpl::receiveBytes(void*, int, int) @ 0x000000001537e8b4 in /usr/bin/clickhouse
2. Poco::Net::HTTPSession::refill() @ 0x000000001539078f in /usr/bin/clickhouse
3. Poco::Net::HTTPHeaderStreamBuf::readFromDevice(char*, long) @ 0x000000001538ba20 in /usr/bin/clickhouse
4. Poco::BasicBufferedStreamBuf<char, std::char_traits<char>, Poco::BufferAllocator<char>>::underflow() @ 0x00000000152a9f68 in /usr/bin/clickhouse
5. std::basic_streambuf<char, std::char_traits<char>>::uflow() @ 0x000000000720fd4a in /usr/bin/clickhouse
6. std::basic_istream<char, std::char_traits<char>>::get() @ 0x0000000007210a39 in /usr/bin/clickhouse
7. Poco::Net::HTTPResponse::read(std::basic_istream<char, std::char_traits<char>>&) @ 0x000000001538ee6f in /usr/bin/clickhouse
8. Poco::Net::HTTPClientSession::receiveResponse(Poco::Net::HTTPResponse&) @ 0x0000000015384fda in /usr/bin/clickhouse
9. void DB::S3::PocoHTTPClient::makeRequestInternalImpl<true>(Aws::Http::HttpRequest&, DB::ProxyConfiguration const&, std::shared_ptr<DB::S3::PocoHTTPResponse>&, Aws::Utils::RateLimits::RateLimiterInterface*, Aws::Utils::RateLimits::RateLimiterInterface*) const @ 0x0000000010144db6 in /usr/bin/clickhouse
10. DB::S3::PocoHTTPClient::MakeRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::Utils::RateLimits::RateLimiterInterface*, Aws::Utils::RateLimits::RateLimiterInterface*) const @ 0x000000001014129c in /usr/bin/clickhouse
11. Aws::Client::AWSClient::AttemptExhaustively(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const @ 0x00000000154f80b4 in /usr/bin/clickhouse
12. Aws::Client::AWSClient::MakeRequestWithUnparsedResponse(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const @ 0x00000000154ff706 in /usr/bin/clickhouse
13. Aws::S3::S3Client::GetObject(Aws::S3::Model::GetObjectRequest const&) const @ 0x00000000155a58c7 in /usr/bin/clickhouse
14. DB::S3::Client::GetObject(DB::S3::ExtendedRequest<Aws::S3::Model::GetObjectRequest>&) const @ 0x0000000010112c80 in /usr/bin/clickhouse
15. DB::ReadBufferFromS3::sendRequest(unsigned long, unsigned long, std::optional<unsigned long>) const @ 0x0000000010179f63 in /usr/bin/clickhouse
16. DB::ReadBufferFromS3::nextImpl() @ 0x0000000010177072 in /usr/bin/clickhouse
17. DB::ReadBufferFromRemoteFSGather::nextImpl() @ 0x00000000101e5961 in /usr/bin/clickhouse
18. DB::ThreadPoolRemoteFSReader::execute(DB::IAsynchronousReader::Request, bool) @ 0x00000000100629af in /usr/bin/clickhouse
19. DB::ThreadPoolRemoteFSReader::execute(DB::IAsynchronousReader::Request) @ 0x00000000100633a4 in /usr/bin/clickhouse
20. DB::AsynchronousBoundedReadBuffer::nextImpl() @ 0x00000000101f82ae in /usr/bin/clickhouse
21. void DB::readIntTextImpl<int, void, (DB::ReadIntTextCheckOverflow)0>(int&, DB::ReadBuffer&) @ 0x00000000073fdd36 in /usr/bin/clickhouse
22. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x00000000122d2190 in /usr/bin/clickhouse
23. DB::IMergeTreeDataPart::loadProjections(bool, bool, bool) @ 0x00000000122d8f8a in /usr/bin/clickhouse
24. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x00000000122d582d in /usr/bin/clickhouse
25. DB::MergeTreeData::loadDataPart(DB::MergeTreePartInfo const&, String const&, std::shared_ptr<DB::IDisk> const&, DB::MergeTreeDataPartState, std::mutex&) @ 0x0000000012366ca0 in /usr/bin/clickhouse
26. DB::MergeTreeData::loadDataPartWithRetries(DB::MergeTreePartInfo const&, String const&, std::shared_ptr<DB::IDisk> const&, DB::MergeTreeDataPartState, std::mutex&, unsigned long, unsigned long, unsigned long) @ 0x000000001236ca42 in /usr/bin/clickhouse
27. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::MergeTreeData::loadDataPartsFromDisk(std::vector<std::shared_ptr<DB::MergeTreeData::PartLoadingTree::Node>, std::allocator<std::shared_ptr<DB::MergeTreeData::PartLoadingTree::Node>>>&)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x0000000012404078 in /usr/bin/clickhouse
28. std::__packaged_task_func<std::function<std::future<void> (std::function<void ()>&&, Priority)> DB::threadPoolCallbackRunner<void, std::function<void ()>>(ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>&, String const&)::'lambda'(std::function<void ()>&&, Priority)::operator()(std::function<void ()>&&, Priority)::'lambda'(), std::allocator<std::function<std::future<void> (std::function<void ()>&&, Priority)> DB::threadPoolCallbackRunner<void, std::function<void ()>>(ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>&, String const&)::'lambda'(std::function<void ()>&&, Priority)::operator()(std::function<void ()>&&, Priority)::'lambda'()>, void ()>::operator()() @ 0x00000000104c519c in /usr/bin/clickhouse
29. std::packaged_task<void ()>::operator()() @ 0x000000000fcc4094 in /usr/bin/clickhouse
30. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000c8eb0c1 in /usr/bin/clickhouse
31. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000c8ee8fa in /usr/bin/clickhouse
(version 24.1.2.5 (official build))
2025.02.20 23:34:00.251408 [ 701 ] {} <Information> AWSClient: AWSXmlClient: HTTP response code: -1
Resolved remote host IP address: 88.198.120.64:443
Request ID:
Exception name:
Error message: Poco::Exception. Code: 1000, e.code() = 0, Timeout (version 24.1.2.5 (official build))
0 response headers:
2025.02.20 23:34:00.251432 [ 701 ] {} <Information> AWSClient: If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2025.02.20 23:34:00.251452 [ 701 ] {} <Information> AWSClient: Request failed, now waiting 0 ms before attempting again.
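To put a number on "hundreds" and to check whether every failure is the same timeout against the same host, a small Python sketch like the following could tally the `Failed to make request to` lines. The function name `summarize_failures` and the regular expression are my own, written against the one log line shown above; other ClickHouse versions may format this message differently.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the first line of a failed S3 request as it appears in the excerpt.
# The optional < > around the URL covers logs pasted through a chat client.
FAILED_REQUEST = re.compile(
    r"AWSClient: Failed to make request to: <?(?P<url>\S+?)>?: "
    r"(?P<error>.+?)(?:, Stack trace|$)"
)


def summarize_failures(lines):
    """Count failed requests per (host, error text)."""
    counts = Counter()
    for line in lines:
        match = FAILED_REQUEST.search(line)
        if match:
            host = urlsplit(match["url"]).hostname
            counts[(host, match["error"])] += 1
    return counts


# Two lines copied from the excerpt above; only the first one is a failure line.
SAMPLE = [
    "2025.02.20 23:34:00.251300 [ 701 ] {} <Information> AWSClient: "
    "Failed to make request to: "
    "<https://fsn1.your-objectstorage.com/signoz-long-term-storage-prod/signoz/xoy/xdigpjscddongysadrydkdnircogd>: "
    "Poco::Exception. Code: 1000, e.code() = 0, Timeout, Stack trace "
    "(when copying this message, always include the lines below):",
    "2025.02.20 23:34:00.251452 [ 701 ] {} <Information> AWSClient: "
    "Request failed, now waiting 0 ms before attempting again.",
]

for (host, error), count in summarize_failures(SAMPLE).items():
    print(f"{count} x {host}: {error}")
```

For a real log, pass an open file object instead of `SAMPLE`; the function only needs an iterable of lines.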
The S3 settings are unchanged from before the restart. I've verified that my server has no time skew.
I've found this issue but can't really make sense of it. I also found this PR, which was supposed to fix it but had adverse side effects and was rolled back.
At this time, I don't know what else I can do to bring our production monitoring back up again.
Srikanth Chekuri
02/21/2025, 8:49 AM
Daniel Hilgarth
02/21/2025, 8:49 AM
Srikanth Chekuri
02/21/2025, 8:50 AM
Daniel Hilgarth
02/21/2025, 8:50 AM
Srikanth Chekuri
02/21/2025, 8:52 AM
Daniel Hilgarth
02/21/2025, 8:52 AM
Srikanth Chekuri
02/21/2025, 8:54 AM
Daniel Hilgarth
02/21/2025, 8:54 AM
Daniel Hilgarth
02/21/2025, 8:55 AM
Daniel Hilgarth
02/21/2025, 8:55 AM
Daniel Hilgarth
02/21/2025, 8:56 AM
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 137,
"Error": "",
"StartedAt": "2025-02-20T23:31:57.982243319Z",
"FinishedAt": "2025-02-20T23:34:00.364942126Z",
"Health": {
"Status": "unhealthy",
"FailingStreak": 4,
"Log": [
{
"Start": "2025-02-20T23:32:27.983378339Z",
"End": "2025-02-20T23:32:28.05422715Z",
"ExitCode": 1,
"Output": "wget: can't connect to remote host (127.0.0.1): Connection refused\n"
},
{
"Start": "2025-02-20T23:32:58.062312742Z",
"End": "2025-02-20T23:32:58.14132385Z",
"ExitCode": 1,
"Output": "wget: can't connect to remote host (127.0.0.1): Connection refused\n"
},
{
"Start": "2025-02-20T23:33:28.147397297Z",
"End": "2025-02-20T23:33:28.23582005Z",
"ExitCode": 1,
"Output": "wget: can't connect to remote host (127.0.0.1): Connection refused\n"
},
{
"Start": "2025-02-20T23:33:58.240654968Z",
"End": "2025-02-20T23:33:58.306481089Z",
"ExitCode": 1,
"Output": "wget: can't connect to remote host (127.0.0.1): Connection refused\n"
}
]
}
},
So ClickHouse never finishes its startup routine.
Daniel Hilgarth
02/21/2025, 8:57 AM
"Healthcheck": {
"Test": [
"CMD-SHELL",
"wget -q http://localhost:8123/ping -O /tmp/ping_response && grep \"Ok\\.\" /tmp/ping_response && rm /tmp/ping_response"
],
],
"Interval": 30000000000,
"Timeout": 5000000000,
"Retries": 3
},
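The two `docker inspect` excerpts can be combined to see how much time ClickHouse actually had to start. The Python sketch below only does the arithmetic on the values posted above; the helper names are hypothetical. It relies on Docker's documented rule that a container is marked unhealthy after `Retries` consecutive failed probes, and it says nothing about which component then stopped the container.

```python
from datetime import datetime

NS_PER_SECOND = 1_000_000_000

# Copied from the "Healthcheck" excerpt; Docker reports durations in nanoseconds.
HEALTHCHECK = {"Interval": 30_000_000_000, "Timeout": 5_000_000_000, "Retries": 3}

# Copied from the "State" excerpt.
STARTED_AT = "2025-02-20T23:31:57.982243319Z"
FINISHED_AT = "2025-02-20T23:34:00.364942126Z"
PROBE_STARTS = [
    "2025-02-20T23:32:27.983378339Z",
    "2025-02-20T23:32:58.062312742Z",
    "2025-02-20T23:33:28.147397297Z",
    "2025-02-20T23:33:58.240654968Z",
]


def parse_docker_time(ts):
    """Parse a Docker timestamp, trimming nanoseconds to microseconds."""
    base, _, fraction = ts.rstrip("Z").partition(".")
    micros = (fraction + "000000")[:6]
    return datetime.fromisoformat(f"{base}.{micros}")


def seconds_between(earlier, later):
    return (parse_docker_time(later) - parse_docker_time(earlier)).total_seconds()


interval_s = HEALTHCHECK["Interval"] / NS_PER_SECOND
timeout_s = HEALTHCHECK["Timeout"] / NS_PER_SECOND
lifetime_s = seconds_between(STARTED_AT, FINISHED_AT)
probe_offsets_s = [seconds_between(STARTED_AT, p) for p in PROBE_STARTS]
# The failed probe that brings the streak up to "Retries" flips the status.
unhealthy_after_s = probe_offsets_s[HEALTHCHECK["Retries"] - 1]

print(f"probe interval: {interval_s:.0f} s, probe timeout: {timeout_s:.0f} s")
print("probes started after:", [f"{o:.1f} s" for o in probe_offsets_s])
print(f"marked unhealthy after about {unhealthy_after_s:.0f} s")
print(f"container lifetime: {lifetime_s:.1f} s")
```

With these settings, any startup that takes longer than about 90 seconds (for example, loading parts from a slow S3 disk) would be reported as unhealthy before port 8123 ever opens.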
Srikanth Chekuri
02/21/2025, 8:58 AM
Daniel Hilgarth
02/21/2025, 8:59 AM
Daniel Hilgarth
02/21/2025, 9:00 AM
Daniel Hilgarth
02/21/2025, 9:00 AM
Daniel Hilgarth
02/21/2025, 9:01 AM
Srikanth Chekuri
02/21/2025, 9:02 AM
Daniel Hilgarth
02/21/2025, 9:02 AM
Daniel Hilgarth
02/21/2025, 10:12 AM
Daniel Hilgarth
02/22/2025, 1:59 AM
Srikanth Chekuri
02/22/2025, 4:03 AM
Daniel Hilgarth
03/09/2025, 8:13 PM