bcz of this our query-service pods in crash back l...
# support
s
Because of this, our query-service pods are in a crash loop (CrashLoopBackOff).
s
That shouldn’t be the case. Can you share the output of describe pods for query-service?
s
Ok
I will share
n
Can you also check whether ClickHouse is in a healthy state, or is it restarting as well?
s
No, ClickHouse is in a healthy state, with no restarts.
I am out right now.
I will share the describe command output soon.
s
That will help us find the actual reason. A failed background upload to S3 wouldn't crash the DB and should be unrelated to query-service going into CrashLoopBackOff.
s
describe pods output
n
I can see OOMKilled as the reason for termination, so we will have to increase the limits. cc @Ankit Nayan @Srikanth Chekuri
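For readers who want to confirm the termination reason without reading the full describe output, here is a minimal sketch. The helper name `last_termination_reason` is hypothetical, and the default namespace and pod name are assumptions taken from the port-forward command later in this thread; adjust them to your release.

```shell
# Print each container's name and the reason its previous instance terminated.
# Defaults (namespace "platform", pod "my-release-signoz-query-service-0") are
# assumptions borrowed from the port-forward command in this thread.
last_termination_reason() {
  ns="${1:-platform}"
  pod="${2:-my-release-signoz-query-service-0}"
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

Calling `last_termination_reason` with no arguments queries the default pod; a container that was killed for exceeding its memory limit should report OOMKilled in the second column.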
s
@sudhanshu dev Can you share the output for this from clickhouse
select count() from signoz_metrics.time_series_v2;
so we can give some rough estimate of how much RAM is needed for query service to not crash?
s
Got it
@Srikanth Chekuri Here the query output
SELECT count() FROM signoz_metrics.time_series_v2

Query id: a268c555-d984-4e25-9fc6-86856e661876

┌─count()─┐
│  309883 │
└─────────┘

1 rows in set. Elapsed: 0.006 sec.
Please suggest a value for the RAM limit.
a
@sudhanshu dev There is some inefficiency in how time series are loaded right now. We should have this fixed within 3-4 weeks. For now, we are working on a temporary fix and an estimate.
s
ok
got it
a
Is it possible to run the query service without resource limits for a few minutes (15-30 minutes should be enough)? We want to collect pprof data,
and then we could provide a better fix sooner.
s
ok
got it
a
Also, I see you are running 0.11.0. We have raised a fix to reduce the memory usage of query-service in v0.11.1. Can you give it a try?
s
Sure
a
Let us know if the query-service does not run within 4GB of memory
s
I will do that as well.
I removed the limits from the query-service StatefulSet
and am now monitoring.
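One way to change the limit from the command line is sketched below. The helper name `set_query_service_memory_limit` is hypothetical, and the StatefulSet name and namespace are assumptions derived from the pod name used in the port-forward command later in this thread. The sketch relies on `kubectl set resources`, where, as far as I understand it, a zero quantity clears the limit. Note that a later Helm upgrade may overwrite a change made this way.

```shell
# Set the memory limit on the query-service StatefulSet.
# Pass 0 to clear the limit (assumed kubectl behavior for a zero quantity).
# Defaults are assumptions: 4Gi limit, namespace "platform",
# StatefulSet "my-release-signoz-query-service".
set_query_service_memory_limit() {
  limit="${1:-4Gi}"
  ns="${2:-platform}"
  sts="${3:-my-release-signoz-query-service}"
  kubectl -n "$ns" set resources statefulset "$sts" --limits="memory=$limit"
}
```

For example, `set_query_service_memory_limit 0` would lift the limit while profiling, and `set_query_service_memory_limit 4Gi` would restore a 4 GB cap afterwards.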
a
thanks
@Srikanth Chekuri @Prashant Shahi can you share instructions to capture cpu and memory profiles when under high usage
s
Yes it would help us
To do capacity planning
p
@Ankit Nayan @sudhanshu dev Port-forward the pprof port 6060 of the query-service container:
kubectl -n platform port-forward pod/my-release-signoz-query-service-0 6060:6060
In another terminal, run the following to obtain pprof data:
• CPU Profile
curl "http://localhost:6060/debug/pprof/profile?seconds=30" -o query-service.pprof -v
• Heap Profile
curl "http://localhost:6060/debug/pprof/heap" -o query-service-heap.pprof -v
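Before sending the files over, it can be useful to glance at them locally. The sketch below assumes a Go toolchain is installed on the machine where the profiles were saved; the helper name `top_profile` is hypothetical. It prints the heaviest entries of a profile as plain text using `go tool pprof`.

```shell
# Print the top entries of a saved pprof profile as text.
# $1: path to the profile file; $2: number of entries to show (default 20).
# Requires a local Go toolchain (an assumption, not stated in the thread).
top_profile() {
  profile="$1"
  go tool pprof -top -nodecount="${2:-20}" "$profile"
}
```

For example, `top_profile query-service-heap.pprof` lists the functions holding the most memory, and `top_profile query-service.pprof 10` shows the ten hottest functions in the CPU profile.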
s
Got it