sudhanshu dev

09/20/2022, 11:14 AM
Because of this, our query-service pods are in CrashLoopBackOff.

Srikanth Chekuri

09/20/2022, 11:34 AM
That shouldn’t be the case. Can you share the output of describe pods for query-service?
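For reference, the describe output can usually be captured with something along these lines (the platform namespace and my-release pod name are assumptions based on a default SigNoz Helm install; adjust to your setup):

kubectl -n platform describe pod my-release-signoz-query-service-0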

sudhanshu dev

09/20/2022, 11:34 AM
Ok
I will share

nitya-signoz

09/20/2022, 11:54 AM
Can you also check if ClickHouse is in a healthy state? Or is it restarting as well?

sudhanshu dev

09/20/2022, 11:54 AM
No, ClickHouse is in a healthy state, no restarts.
I am out right now.
I will share the describe command output soon.

Srikanth Chekuri

09/20/2022, 12:02 PM
That will help us find the correct reason. The background upload failure to S3 wouldn't crash the DB and should be unrelated to the query service getting into CrashLoopBackOff.

sudhanshu dev

09/20/2022, 2:28 PM
describe pods output

nitya-signoz

09/20/2022, 3:47 PM
I can see OOMKilled as the reason for termination; we will have to increase the limits. cc @Ankit Nayan @Srikanth Chekuri
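As a rough sketch of what raising the limit looks like with the SigNoz Helm chart (the queryService.resources keys, the my-release release name, and the 4Gi value are assumptions; verify the exact keys against the chart values for your version):

helm -n platform upgrade my-release signoz/signoz --reuse-values \
  --set queryService.resources.requests.memory=1Gi \
  --set queryService.resources.limits.memory=4Gi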

Srikanth Chekuri

09/20/2022, 4:59 PM
@sudhanshu dev Can you share the output of this query from ClickHouse:
select count() from signoz_metrics.time_series_v2;
so we can give a rough estimate of how much RAM is needed for the query service to not crash?

sudhanshu dev

09/21/2022, 4:30 AM
Got it.
@Srikanth Chekuri Here is the query output:

SELECT count() FROM signoz_metrics.time_series_v2

Query id: a268c555-d984-4e25-9fc6-86856e661876

┌─count()─┐
│  309883 │
└─────────┘

1 rows in set. Elapsed: 0.006 sec.

Please give us an idea of the RAM limit to set.

Ankit Nayan

09/21/2022, 5:06 AM
@sudhanshu dev There is some inefficiency in how time series are loaded right now. We should be fixing this within 3-4 weeks. For now, we are trying a temporary fix and an estimate.

sudhanshu dev

09/21/2022, 5:07 AM
ok
got it

Ankit Nayan

09/21/2022, 5:07 AM
Is it possible to remove the resource limits on the query service and run it for a few minutes (15-30 minutes should be good)? We want to collect pprof data,
and then we could provide a better fix sooner.
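One way to temporarily drop the limits is to edit the statefulset directly (a sketch; the statefulset name is assumed from the pod name used elsewhere in this thread, and a direct edit will be reverted by the next Helm upgrade):

kubectl -n platform edit statefulset my-release-signoz-query-service
# remove the resources.limits block from the query-service container spec, then save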

sudhanshu dev

09/21/2022, 5:07 AM
ok
got it

Ankit Nayan

09/21/2022, 5:10 AM
Also, I see you are running 0.11.0; we have raised a fix to reduce memory usage of query-service in v0.11.1. Can you give it a try?
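For reference, the upgrade is typically done through Helm (the release and namespace names are assumptions; see the SigNoz upgrade docs for the exact steps for your install):

helm repo update
helm -n platform upgrade my-release signoz/signoz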

sudhanshu dev

09/21/2022, 5:10 AM
Sure

Ankit Nayan

09/21/2022, 5:10 AM
Let us know if the query-service does not run within 4GB of memory

sudhanshu dev

09/21/2022, 5:10 AM
Will do that as well.
I removed the limits from the query-service statefulset
and am now monitoring it.

Ankit Nayan

09/21/2022, 5:18 AM
Thanks.
@Srikanth Chekuri @Prashant Shahi can you share instructions to capture CPU and memory profiles when under high usage?
s

sudhanshu dev

09/21/2022, 5:20 AM
Yes, it would help us
to do capacity planning.

Prashant Shahi

09/21/2022, 5:25 AM
@Ankit Nayan @sudhanshu dev Port-forward the pprof port 6060 of the query-service container:

kubectl -n platform port-forward pod/my-release-signoz-query-service-0 6060:6060

In another terminal, run the following to obtain pprof data:
• CPU Profile
curl "http://localhost:6060/debug/pprof/profile?seconds=30" -o query-service.pprof -v
• Heap Profile
curl "http://localhost:6060/debug/pprof/heap" -o query-service-heap.pprof -v

sudhanshu dev

09/21/2022, 5:26 AM
Got it