k
👋🏽 what am I doing wrong? I can see traces listed in the "key operations" view of the services tab, but clicking a trace or trying anything in the new trace explorer yields network errors. the dev console makes it look like certain requests aren't even tried. the old trace view worked, the other tabs work... did i miss a new setting somewhere?
v0.55.0, new clickhouse data dir, otel-collector is clearly receiving otlp and writing to clickhouse 🤔
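(a quick way to double-check the writes, assuming the default signoz_traces database — table names may differ by version:)
Copy code
# confirm the collector is actually creating/populating the trace tables
docker compose exec clickhouse clickhouse-client -q "SHOW TABLES FROM signoz_traces"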
s
@Kenichi Nakamura, in the dev tool, can you clear the site data and check again?
k
@Srikanth Chekuri thanks - yes, i cleared the data for this domain, and ran with the dev tools "disable cache" box checked. i still see the same error 😞
s
Can you share the logs of query-service when you see this error?
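If you are on the standard compose setup, something like this should pull them (the service name is an assumption, adjust if yours differs):
Copy code
# tail the query-service container logs
docker compose logs --tail=200 query-service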
k
sure. this is from a page refresh on /traces-explorer:
Untitled
s
Hmm this doesn't show anything. Can you share the response from n/w tab?
k
yes, one sec
here are caps of network tab and console
a number of successful requests happen above the capture area
s
For the failed request, can you share the response
k
No response data available for this request
image.png
s
I am asking for the failed query range request with 500 status.
k
i'm not sure, where do you see a 500 status? here is the query_range POST, which doesn't appear to have a status. it looks like it wasn't even attempted.
(btw, thank you for trying to help! 🙇🏽)
ahh i see in the console, Error: API responded with 500, but i don't see that request in the network tab anywhere 🤷🏽‍♂️
s
right, we can see from the console stack trace that the query range request is failing with 500. However, you mention no errors in the query-service logs and no failed requests in the network tab either. Can you confirm once again that there are no error logs in the query service? If so, please share the ClickHouse server logs.
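Something along these lines should cover both, assuming the default compose service names:
Copy code
# look for errors in query-service, then grab the clickhouse server log
docker compose logs --tail=500 query-service | grep -i error
docker compose logs --tail=500 clickhouse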
k
i have nginx in front of the compose services this is running on. the only request in the nginx error log when i do a fresh load of /traces-explorer is this:
Copy code
2024/10/08 16:47:02 [error] 7428#0: *13258 connect() failed (111: Connection refused) while connecting to upstream, client: 97.120.109.238, server: signoz.example.com, request: "GET /src_hooks_queryBuilder_useGetExplorerQueryRange_ts.1c3db49acf4bfaebf30a.js.map HTTP/2.0", upstream: "http://[::1]:3301/src_hooks_queryBuilder_useGetExplorerQueryRange_ts.1c3db49acf4bfaebf30a.js.map", host: "signoz.example.com"
a number of successful requests that mirror the query-service logs i've pasted above appear in the regular access_log. i will grab clickhouse logs now.
hmm, there are a number of errors in the clickhouse logs!
Copy code
{
  "date_time": "1728406269.895175",
  "thread_name": "TCPServerConnection ([#1])",
  "thread_id": "48",
  "level": "Error",
  "query_id": "",
  "logger_name": "ServerErrorHandler",
  "message": "Poco::Exception. Code: 1000, e.code() = 32, I/O error: Broken pipe, Stack trace (when copying this message, always include the lines below):\n\n0. Poco::Net::SocketImpl::error(int, String const&) @ 0x00000000153a1b5f in /usr/bin/clickhouse\n1. Poco::Net::SocketImpl::sendBytes(void const*, int, int) @ 0x00000000153a2bbd in /usr/bin/clickhouse\n2. Poco::Net::StreamSocketImpl::sendBytes(void const*, int, int) @ 0x00000000153a5296 in /usr/bin/clickhouse\n3. Poco::Net::HTTPSession::write(char const*, long) @ 0x00000000153908b3 in /usr/bin/clickhouse\n4. Poco::Net::HTTPHeaderIOS::~HTTPHeaderIOS() @ 0x000000001538bbdb in /usr/bin/clickhouse\n5. Poco::Net::HTTPHeaderOutputStream::~HTTPHeaderOutputStream() @ 0x000000001538bf1f in /usr/bin/clickhouse\n6. DB::HTTPServerResponse::send() @ 0x0000000012942988 in /usr/bin/clickhouse\n7. DB::HTTPServerConnection::sendErrorResponse(Poco::Net::HTTPServerSession&, Poco::Net::HTTPResponse::HTTPStatus) @ 0x000000001293ecda in /usr/bin/clickhouse\n8. DB::HTTPServerConnection::run() @ 0x000000001293e97b in /usr/bin/clickhouse\n9. Poco::Net::TCPServerConnection::start() @ 0x00000000153a5a72 in /usr/bin/clickhouse\n10. Poco::Net::TCPServerDispatcher::run() @ 0x00000000153a6871 in /usr/bin/clickhouse\n11. Poco::PooledThread::run() @ 0x000000001549f047 in /usr/bin/clickhouse\n12. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001549d67d in /usr/bin/clickhouse\n13. ? @ 0x00007f76e3bad609\n14. ? @ 0x00007f76e3ad2353\n (version 24.1.2.5 (official build))",
  "source_file": "src/Common/Exception.cpp; void DB::tryLogCurrentExceptionImpl(Poco::Logger *, const std::string &)",
  "source_line": "222"
}
let me restart it and see how it boots...
clickhouse restarted ok, and recreated its data directory. the app i monitor has been sending traces. once the full signoz came back up, i was able to see just a couple in the "key operations" box of the services tab. however, visiting /traces-explorer again results in the above problem. the Poco::Exception seems to happen every minute. there are no other messages in the compose logs other than clickhouse startup.
Copy code
$ docker compose logs -f clickhouse
signoz-clickhouse  | Processing configuration file '/etc/clickhouse-server/config.xml'.
signoz-clickhouse  | Merging configuration file '/etc/clickhouse-server/config.d/cluster.xml'.
signoz-clickhouse  | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
signoz-clickhouse  | Cannot set max size of core file to 1073741824
signoz-clickhouse  | Logging information to /var/log/clickhouse-server/clickhouse-server.log
signoz-clickhouse  | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
signoz-clickhouse  | {"date_time":"1728406519.169952","thread_name":"","thread_id":"1","level":"Information","query_id":"","logger_name":"SentryWriter","message":"Sending crash reports is disabled","source_file":"src\/Daemon\/SentryWriter.cpp; void SentryWriter::initialize(Poco::Util::LayeredConfiguration &)","source_line":"131"}
signoz-clickhouse  | {"date_time":"1728406519.321808","thread_name":"","thread_id":"1","level":"Information","query_id":"","logger_name":"Application","message":"Starting ClickHouse 24.1.2.5 (revision: 54482, git hash: b2605dd4a5a30131444dba7e6149a1412e83b8eb, build id: 10E44DD06215CD8F54CEB01CC942EE9BAC9B41E1), PID 1","source_file":"","source_line":"0"}
signoz-clickhouse  | {"date_time":"1728406519.322189","thread_name":"","thread_id":"1","level":"Information","query_id":"","logger_name":"Application","message":"starting up","source_file":"","source_line":"0"}
signoz-clickhouse  | {"date_time":"1728406519.322264","thread_name":"","thread_id":"1","level":"Information","query_id":"","logger_name":"Application","message":"OS name: Linux, version: 6.2.9-x86_64-linode160, architecture: x86_64","source_file":"programs\/server\/Server.cpp; virtual void DB::Server::initialize(Poco::Util::Application &)","source_line":"425"}
...
i see the same thing in the /var/log/clickhouse-server/* files inside the container.
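(read those with something along these lines:)
Copy code
# read the clickhouse error log directly inside the container
docker compose exec clickhouse tail -n 100 /var/log/clickhouse-server/clickhouse-server.err.log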
s
The Poco exceptions are not relevant. Take a failed request, try the curl version of it, and share what message the error response contains.
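Roughly like this — copy the exact failing call with "Copy as cURL" from the network tab; the path and headers below are only placeholders from a typical setup:
Copy code
# re-run the failing query_range request verbosely to see the full error body
curl -v 'https://signoz.example.com/api/v3/query_range' \
  -X POST \
  -H 'Content-Type: application/json' \
  -H 'Authorization: <copied from the original request>' \
  --data-raw '<JSON body copied from the original request>'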
k
🙇🏽‍♂️ Thanks again for the pointers! After trying the curl version of the request and seeing the server cut it off mid h2 stream, I tried a few experiments. The Referer header value was quite large, and removing it made the request go through. I have another nginx that reverse proxies to the signoz-frontend service. The fix was to add these lines to the nginx config of the reverse proxy server {} block.
Copy code
client_max_body_size 24M;
large_client_header_buffers 8 128k;
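for context, this is roughly where those lines live in my reverse proxy config (hostnames and ports simplified; the signoz frontend listens on 3301):
Copy code
server {
    listen 443 ssl http2;
    server_name signoz.example.com;

    # the new trace explorer sends very long request headers (e.g. the Referer
    # carrying the whole query), so bump the header buffers and body size
    client_max_body_size 24M;
    large_client_header_buffers 8 128k;

    location / {
        proxy_pass http://127.0.0.1:3301;
        proxy_set_header Host $host;
    }
}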
I had not run into this issue until the new trace explorer, good to know!