# support
a
@Nick Burrett query-service IMO has nothing to do with migration.
What is the query-service fetching from the DB and caching in memory during the initialisation phase?
time-series probably, cc: @Srikanth Chekuri. How many time series do you have? Can you exec -it into your ClickHouse container, run
clickhouse client
and then run
select count() from signoz_metrics.time_series_v2;
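For example, assuming the ClickHouse pod is chi-signoz-cluster-0-0-0 in the platform namespace (adjust the pod name and namespace to your install), the whole check can be run in one go:
kubectl exec -it -n platform chi-signoz-cluster-0-0-0 -- clickhouse client --query "select count() from signoz_metrics.time_series_v2;"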
we have an alternate ClickHouse-based way to run the matchers, instead of loading time series in memory during initialisation. I think it can be enabled via an env var, but it has not been tested yet. We can check that out if the number of time series is huge.
@Prashant Shahi do we have a memory profiler in query-service which @Nick Burrett can use to send us a dump? It would be best to act on that.
p
@Nick Burrett To obtain pprof data from query-service, follow the steps below. Port-forward port 6060 from the query-service pod:
kubectl port-forward -n platform pod/my-release-signoz-query-service-0 6060
In another terminal, run the following to obtain pprof data:
• CPU profile:
curl "http://localhost:6060/debug/pprof/profile?seconds=30" -o query-service.pprof -v
• Heap profile:
curl "http://localhost:6060/debug/pprof/heap" -o query-service-heap.pprof -v
After that, share the obtained pprof file query-service.pprof in this thread.
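If you want to take a look at it locally as well, the standard Go tooling should work on the same file, for example:
go tool pprof -http=:8080 query-service-heap.pprof
(this assumes a local Go installation; it opens an interactive view of the profile in the browser)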
n
count() from time_series_v2 gives a value of 404087. Heap profile attached.
s
I'm trying to figure out what memory sizing I really need for the query-service, and whether that's going to grow significantly if I push more data into ClickHouse.
So it's not the amount of data you push to ClickHouse that matters here. If the data you are pushing contains new unique time series, then it grows. That's not usually the case in what we have seen, but you would know whether that happens for your applications and can plan accordingly.
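If you want to see which metric names contribute the most series, a query along these lines should work (assuming the labels column stores the label set as a JSON string):
SELECT JSONExtractString(labels, '__name__') AS metric_name, count() AS series
FROM signoz_metrics.time_series_v2
GROUP BY metric_name
ORDER BY series DESC
LIMIT 10;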
n
Given some of the data that appears in the labels of the time_series_v2 table, it would seem that simply restarting pods, whether through software upgrades or rescheduling onto new hosts, would create new unique time-series entries, for example:
│  6595736958511230884 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.19.0","clickhouse_altinity_com_chop_commit":"e74501f","clickhouse_altinity_com_chop_date":"2022-07-07T15.24.24","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.0.113:8888","job":"kubernetes-pods","kubernetes_namespace":"platform","kubernetes_pod_name":"clickhouse-operator-74b4b658fc-7l5n6","namespace":"platform","pod_template_hash":"74b4b658fc","security_istio_io_tlsMode":"istio","service_istio_io_canonical_name":"clickhouse-operator","service_istio_io_canonical_revision":"latest"} │
│  8138337155686107918 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.18.5","clickhouse_altinity_com_chop_commit":"1c16177","clickhouse_altinity_com_chop_date":"2022-05-11T09.06.01","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.2.128:8888","job":"kubernetes-pods","kubernetes_namespace":"kube-system","kubernetes_pod_name":"clickhouse-operator-855c6747d8-p26p8","namespace":"platform","pod_template_hash":"855c6747d8"} │
│ 12046837290149885158 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.19.0","clickhouse_altinity_com_chop_commit":"1008f1a","clickhouse_altinity_com_chop_date":"2022-07-11T07.00.49","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.1.38:15020","job":"kubernetes-pods","kubernetes_namespace":"platform","kubernetes_pod_name":"clickhouse-operator-74b4b658fc-s4bm5","namespace":"platform","pod_template_hash":"74b4b658fc","security_istio_io_tlsMode":"istio","service_istio_io_canonical_name":"clickhouse-operator","service_istio_io_canonical_revision":"latest"} │
│ 14660607235865748604 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_chop":"0.18.5","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.1.109:8888","job":"kubernetes-service-endpoints","kubernetes_name":"clickhouse-operator-metrics","kubernetes_namespace":"kube-system","namespace":"platform"} │
│ 15272912369895314868 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.19.0","clickhouse_altinity_com_chop_commit":"1008f1a","clickhouse_altinity_com_chop_date":"2022-07-11T07.00.49","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.2.104:15020","job":"kubernetes-pods","kubernetes_namespace":"platform","kubernetes_pod_name":"clickhouse-operator-74b4b658fc-x2x2j","namespace":"platform","pod_template_hash":"74b4b658fc","security_istio_io_tlsMode":"istio","service_istio_io_canonical_name":"clickhouse-operator","service_istio_io_canonical_revision":"latest"} │
My system currently runs 190 pods, so I could imagine that if I were running a few thousand, there could be a lot of time-series entries. I suspect that the quantity of time-series entries comes from tracing connections using Istio. The problem as I see it is that the memory footprint of the query-service directly relates to the cost of the VM required to host it. The services I run are on 4GB VMs, and I will have to migrate the cluster to 8GB VMs to support the query-service. The cost of running a single instance of the query-service at 3GB RSS equates to the monthly rental price of a 4GB VM. Could the map of this data be file-backed, e.g. stored in SQLite or perhaps LevelDB? Similar to the existing hashmap, it need not require persistent storage, but it could be a useful way to offload a significant chunk of RAM utilisation.
s
Some of the labels might contain additionally added resource host information, but they are small in number. And we could certainly improve the implementation for users who have tight RAM requirements and are fine with some decreased performance as a trade-off.
a
@Nick Burrett I see the issue. Though the total memory needed to keep time series in memory should be just 80MB for 400K time series, the initial spike in memory is due to JSON unmarshalling of 400K time series in one go when query-service boots up. If we can load time series in batches of 100K during bootup, it should be able to work with 25% of the initial memory needed. @Srikanth Chekuri am I understanding correctly?
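Roughly, instead of one big select whose result gets unmarshalled in a single pass, the loader would page through the table, something like this (illustrative only; column names assume the fingerprint/labels layout shown in the rows above):
SELECT fingerprint, labels
FROM signoz_metrics.time_series_v2
ORDER BY fingerprint
LIMIT 100000 OFFSET 0;
-- then repeat with OFFSET 100000, 200000, ... unmarshalling each batch before fetching the next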
s
Yes, the initial deserialization takes up additional resources.
@Nick Burrett is it just the query service that consumes 3.9GB of memory?
n
Yes, it's only the query service. The rest is fine.
s
Ok, did it use the 3.9GB of RAM alone or was it combined for all services?
n
I'm measuring only the query-service container utilising 2.9GB; every other container runs at low memory utilisation. Because of the limits on my VM RAM, and for simplicity of figuring out what is happening, I built and ran the query-service directly from a git checkout and measured the memory utilisation on my desktop PC.
s
From the pprof dump you shared, it shows ~1.7 GB for the time-series loader. I am a little confused now. How much does the query service utilize?
n
Peak RSS is 3165MB. After a while, RSS settles and varies between 2540MB and 2800MB. pprof memory utilisation eventually stays around 1.67GB. The attached heap dump was taken at a time when RSS was 2586756KB and the service was idle. Presumably Go's garbage collection is influencing the RSS variance.
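One way I could confirm that is to watch the Go GC trace and/or lower the GC target when running the locally built binary, e.g. (illustrative; the binary name and any required flags depend on how it's built and configured):
GODEBUG=gctrace=1 ./query-service   # prints one line per GC cycle with heap sizes before/after collection
GOGC=50 ./query-service             # collects more aggressively, smaller steady-state heap at the cost of extra CPU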