# support
After upgrading from 0.9 to 0.10, I noticed that the query-service crashes with an OOM error. This isn't directly related to the upgrade itself; I think it's just because the query-service was restarted. There's a large memory growth on load, to around 2.9GB RSS, which appears to be a fetch/cache of trace information from the DB:
```
[pid 754622] read(12, "espace\":\"svc-uat\",\"kubernetes_pod_name\":\"graphql-be-9f6b79d49-9tb4n\",\"le\":\"1\",\"pod_template_hash\":\"9f6b79d49\",\"reporter\":\"destination\",\"request_protocol\":\"http\",\"response_code\":\"200\",\"re"..., 1048576) = 32768
[pid 754622] read(12, "source_workload_namespace\":\"istio-system\"}\377\f{\"__name__\":\"istio_response_bytes_bucket\",\"app\":\"istio-ingressgateway\",\"chart\":\"gateways\",\"connection_security_policy\":\"unknown\",\"controller_revisio"..., 1048576) = 32768
[pid 754622] read(12, "ion_service\":\"claimform-gen.svc-prod.svc.cluster.local\",\"destination_service_name\":\"claimform-gen\",\"destination_service_namespace\":\"svc-prod\",\"destination_version\":\"v1\",\"destination_"..., 1048576) = 32768
```
What is the query-service fetching from the DB and caching in memory during the initialisation phase? I'm trying to figure out what memory sizing I really need to use for the query-service and whether that's going to grow significantly if I push more data into ClickHouse.
@Nick Burrett query-service IMO has nothing to do with migration.
> What is the query-service fetching from the DB and caching in memory during the initialisation phase?
Time-series probably, cc: @Srikanth Chekuri. How many time-series do you have? Can you `exec -it` into your clickhouse container and run:
```
clickhouse client
```
```
select count() from signoz_metrics.time_series_v2;
```
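If that count comes back large, something like this (untested, and it assumes the `labels` JSON column shown later in this thread) would show which metric names contribute the most series:
```
clickhouse client --query "
  SELECT JSONExtractString(labels, '__name__') AS metric_name, count() AS series
  FROM signoz_metrics.time_series_v2
  GROUP BY metric_name
  ORDER BY series DESC
  LIMIT 20"
```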
We have an alternate ClickHouse-based way to run the matchers for the time-series that are loaded in-memory during initialization. IMO it can be enabled using an env var, but it has not been tested yet. We can check that out if the number of time-series is huge.
@Prashant Shahi do we have a memory profiler in query-service which @Nick Burrett can use to send us a dump? It would be best to act on that.
@Nick Burrett To obtain pprof data from query-service, follow the steps below. Port-forward `6060` from the `query-service` pod:
```
kubectl port-forward -n platform pod/my-release-signoz-query-service-0 6060
```
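You can first confirm the forward is working; Go's `net/http/pprof` serves an index page under the debug path:
```
curl http://localhost:6060/debug/pprof/
```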
In another terminal, run the following to obtain pprof data:
• CPU Profile
```
curl "http://localhost:6060/debug/pprof/profile?seconds=30" -o query-service.pprof -v
```
• Heap Profile
```
curl "http://localhost:6060/debug/pprof/heap" -o query-service-heap.pprof -v
```
After that, share the obtained pprof file `query-service.pprof` in this thread.
> I'm trying to figure out what memory sizing I really need to use for the query-service and whether that's going to grow significantly if I push more data into ClickHouse
So it's not the amount of data you push to ClickHouse that matters here. If the data you are pushing contains new unique time series, then it grows. That's not usually the case in what we have seen, but you would know whether that happens for your applications and can plan accordingly.
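One simple way to keep an eye on that is to re-run the earlier count periodically and see whether it keeps climbing, e.g. something like:
```
# re-check the series count on an interval (the interval here is arbitrary)
while true; do
  clickhouse client --query "select count() from signoz_metrics.time_series_v2"
  sleep 3600
done
```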
Given some of the data that appears in the labels of the time_series_v2 table, it would seem that simply restarting pods, whether through software upgrades or rescheduling onto new hosts, would create new unique time series entries, for example:
```
│  6595736958511230884 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.19.0","clickhouse_altinity_com_chop_commit":"e74501f","clickhouse_altinity_com_chop_date":"2022-07-07T15.24.24","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.0.113:8888","job":"kubernetes-pods","kubernetes_namespace":"platform","kubernetes_pod_name":"clickhouse-operator-74b4b658fc-7l5n6","namespace":"platform","pod_template_hash":"74b4b658fc","security_istio_io_tlsMode":"istio","service_istio_io_canonical_name":"clickhouse-operator","service_istio_io_canonical_revision":"latest"} │
│  8138337155686107918 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.18.5","clickhouse_altinity_com_chop_commit":"1c16177","clickhouse_altinity_com_chop_date":"2022-05-11T09.06.01","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.2.128:8888","job":"kubernetes-pods","kubernetes_namespace":"kube-system","kubernetes_pod_name":"clickhouse-operator-855c6747d8-p26p8","namespace":"platform","pod_template_hash":"855c6747d8"} │
│ 12046837290149885158 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.19.0","clickhouse_altinity_com_chop_commit":"1008f1a","clickhouse_altinity_com_chop_date":"2022-07-11T07.00.49","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.1.38:15020","job":"kubernetes-pods","kubernetes_namespace":"platform","kubernetes_pod_name":"clickhouse-operator-74b4b658fc-s4bm5","namespace":"platform","pod_template_hash":"74b4b658fc","security_istio_io_tlsMode":"istio","service_istio_io_canonical_name":"clickhouse-operator","service_istio_io_canonical_revision":"latest"} │
│ 14660607235865748604 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_chop":"0.18.5","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.1.109:8888","job":"kubernetes-service-endpoints","kubernetes_name":"clickhouse-operator-metrics","kubernetes_namespace":"kube-system","namespace":"platform"} │
│ 15272912369895314868 │ {"__name__":"chi_clickhouse_metric_MySQLThreads","app":"clickhouse-operator","chi":"signoz","clickhouse_altinity_com_app":"chop","clickhouse_altinity_com_chop":"0.19.0","clickhouse_altinity_com_chop_commit":"1008f1a","clickhouse_altinity_com_chop_date":"2022-07-11T07.00.49","hostname":"chi-signoz-cluster-0-0.platform.svc.cluster.local","instance":"10.240.2.104:15020","job":"kubernetes-pods","kubernetes_namespace":"platform","kubernetes_pod_name":"clickhouse-operator-74b4b658fc-x2x2j","namespace":"platform","pod_template_hash":"74b4b658fc","security_istio_io_tlsMode":"istio","service_istio_io_canonical_name":"clickhouse-operator","service_istio_io_canonical_revision":"latest"} │
```
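Something like the following (untested; it only assumes the `labels` column shown above) would show how many series that single metric name has accumulated across restarts:
```
clickhouse client --query "
  SELECT count()
  FROM signoz_metrics.time_series_v2
  WHERE JSONExtractString(labels, '__name__') = 'chi_clickhouse_metric_MySQLThreads'"
```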
My system currently runs 190 Pods, so I could imagine that if I were running a few thousand there could be a lot of time-series entries. I suspect that the quantity of time-series entries comes from tracing connections using Istio. The problem, as I see it, is that the memory footprint of the query-service directly relates to the cost of the VM required to host it. The services I run are on 4GB VMs and I would have to migrate the cluster to 8GB VMs to support the query-service. The cost of running a single instance of a query-service at 3GB RSS equates to the monthly rental price of a 4GB VM. Could the map of this data be file-backed, e.g. stored in SQLite or perhaps LevelDB? Like the existing hashmap, it need not require persistent storage, but it could be a useful way to offload a significant chunk of RAM utilisation.
Some of the labels might contain additionally-added resource/host information, but those are small in number. And we could certainly improve the implementation for users who have tight RAM requirements but are fine with some decreased performance as a trade-off.
@Nick Burrett I see the issue.. though the total memory needed to keep the time-series in memory should be just 80MB for 400K time-series, the initial spike in memory is due to JSON unmarshalling of 400K time-series in one go when the query-service boots up. If we can load the time-series in batches of 100K during bootup, it should be able to work with 25% of the initial memory needed. @Srikanth Chekuri am I understanding correctly?
Yes, the initial deserialization takes up additional resources.
@Nick Burrett is it just the query service that consumes 3.9GB of memory?
Yes, it's only the query service. The rest is fine.
Ok, did it use the 3.9GB of RAM alone or was it combined for all services?
I'm measuring only the query-service container, which utilises 2.9GB; every other container runs at low memory utilisation. Because of the limits on my VM RAM, and for simplicity of figuring out what is happening, I built and ran the query-service directly from a git checkout to measure the memory utilisation on my desktop PC.
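For reference, this is roughly how I'm reading the RSS figure for the locally built binary (the `query-service` pattern is just what I'm matching; adjust it for your binary name):
```
# resident set size (KiB) of the newest process matching the pattern
ps -o rss= -p "$(pgrep -n -f query-service)"
```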
From the pprof dump you shared, it shows ~1.7 GB for the time-series loader. I am a little confused now. How much does the query service utilize?