# support
g
There are no Alert Rules, no Dashboards, and I've closed the UI. But ClickHouse still seems to be very, very busy running queries. Is this normal?
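In case it's useful, this is roughly how I'd check what it's running (the pod name is my guess at the default chart naming, adjust to yours):

```
# Peek at the queries ClickHouse is currently executing
kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- \
  clickhouse-client --query "SELECT query_id, elapsed, query FROM system.processes"
```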
s
No, this isn't normal. This is another instance of the high memory consumption caused by a materialized column failure that we have come across before. There was a bug in creating the materialized column, but I believe that was fixed. Did you use SigNoz logs, or did it start without your interaction?
g
I didn't explicitly start it. I used all the defaults from the Helm installation tutorial. The only difference is that I used namespace `signoz` instead of `platform`. I did open the logs tab in the web UI.
I hope it doesn't have the namespace hardcoded as one to ignore for log import? Otherwise it could be in an infinite loop: logging an error when it imports a log, then logging an error when it imports that error log, then logging an error when it imports that one, etc.
What would be the best steps for me to either fix this or reproduce this for a bug report?
s
What steps did you follow that triggered this error? I don’t think that the namespace will create this issue.
g
Just the steps I mentioned above. Installed using the steps on https://signoz.io/docs/install/kubernetes/others/
Including the sample application for generating load
Then browsed to :3301 using kubectl port-forward
The only override I did was the storageClass
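Roughly what I ran, from memory (the storageClass value path is my best guess, so treat it as approximate; the port-forward command is the one from the install docs):

```
helm repo add signoz https://charts.signoz.io
helm install my-release signoz/signoz -n signoz --create-namespace \
  --set clickhouse.persistence.storageClass=<my-storage-class>
# then, to reach the UI on :3301
kubectl port-forward -n signoz svc/my-release-signoz-frontend 3301:3301
```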
s
Does it reproduce if you repeat the same steps again?
g
Will check, doing them now 🙂
I have a hunch it has to do with live view, in combination with `k8s_pod_name`.
When I select that attribute in the logs UI, it appears twice.
When I then add some others and remove one of the two `k8s_pod_name` fields from SELECTED FIELDS, both `k8s_pod_name` entries are removed at the same time, and errors start to happen.
This seems to lead to some infinite recursion: ClickHouse's own error logs appear in the live view, but they also cause errors by appearing in the live view, which then appear in the live view as well... basically the same loop I mentioned above. But it seems that having the live view active is key.
At the same time, disabling live view in the UI doesn't fix it; it's as if it's stuck in the backend, still running that query.
But I haven't been able to trigger it without using live view, so I think that should be a key suspect to look at.
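(Side note: if it really is a stuck query, standard ClickHouse SQL should be able to kill it; same assumed pod name as above, and the query_id would come from the `system.processes` listing:)

```
# Cancel a runaway query by id (KILL QUERY is standard ClickHouse)
kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- \
  clickhouse-client --query "KILL QUERY WHERE query_id = '<id-from-system.processes>'"
```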
Correction: it's actually possible without live view.
Let me write a better reproduction report:
- In the web UI, go to Logs.
- Add the selected field `k8s_pod_name`, which will appear twice in the SELECTED FIELDS list.
- Add `os_type` to the SELECTED FIELDS.
- All is still fine, until the next step:
- Remove `k8s_pod_name` from the SELECTED FIELDS. Both entries will disappear at the same time (even though I clicked remove for only one of them).
- ClickHouse now becomes very busy: consuming 60% of my machine's capacity, 4000-5000 mCPU in k8s stats, and using 750MiB RAM.
- An error mentioning a field that was NOT removed spams the ClickHouse logs: `MutatePlainMergeTreeTask: Code: 10. DB::Exception: Not found column os_type in block.`
- I did not remove `os_type`, but it was one of the remaining fields in SELECTED FIELDS in the web UI.
- When I actually remove `os_type` from the SELECTED FIELDS list, ClickHouse immediately calms down; within seconds it's back to normal CPU usage. Memory stays at 750MiB, so I guess it's not freed (but that's not really a problem for me).
- So I guess the problem is that `k8s_pod_name` appears twice, and removing it then causes this issue? The API call to `api/v1/logs/fields` seems to trigger the change in ClickHouse behavior (see the snippet below for inspecting what it leaves behind).
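Since the error comes from MutatePlainMergeTreeTask, the failing mutation should be visible in `system.mutations` (standard ClickHouse; the `signoz_logs` database name and the pod name are assumptions from my setup):

```
# Show unfinished mutations and why they keep failing
kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- \
  clickhouse-client --query "SELECT database, table, mutation_id, command, latest_fail_reason FROM system.mutations WHERE is_done = 0"

# If needed, cancel the stuck mutation instead of removing the field in the UI
kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- \
  clickhouse-client --query "KILL MUTATION WHERE database = 'signoz_logs' AND mutation_id = '<id-from-above>'"
```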
Hope this helps!
I'm running v0.14.0 by the way, through Helm chart v0.9.1:
- clickhouse-server: 22.8.8-alpine
- frontend and query-service: Docker tag 0.14.0
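(For anyone reproducing, the versions can be read straight from the cluster; the release name is an assumption:)

```
# Chart and app versions for the release
helm list -n signoz
# Image tags actually running in the namespace
kubectl get pods -n signoz -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```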
s
Thanks, that helps