# support
n
Since switching to a clustered ClickHouse setup with 3 nodes, I'm finding that CPU load is imbalanced across the 3 ClickHouse instances. This system is fairly small (240 total pods, with all SigNoz log traffic filtered) and receives 3700 log lines per minute. I have one ClickHouse node permanently running at 400% CPU, another running between 100% and 300% CPU (depending on time of day), and another at a consistent 100% CPU. The graph here covers a 24 hour period. CPU utilisation seems very high for a database that I would consider to be receiving little in the way of updates. I suspect a missing index somewhere. Is there a description anywhere of what the ideal schema should look like for a SigNoz install, covering all the tables and views?
s
Is the chart you shared from the ClickHouse instances powering SigNoz?
> CPU utilisation seems very high for a database that I would consider to be receiving little in the way of updates.
Which updates are you referring to? We don’t push any `UPDATE` commands in SigNoz. Also, where do you see the database-wise CPU utilization? You can use the Unix tool ``sudo perf top`` to see which operations are taking the time. In the past we have noticed this happens because of `MergeMutate` activity, which is responsible for merging the many temporary parts into a single data block.
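A minimal sketch of how that merge activity could be checked from inside ClickHouse rather than from `perf top`; `system.merges` and `system.metrics` are standard ClickHouse system tables, though the exact background-pool metric names vary between versions:

```sql
-- Sketch: merges currently running; long-running, large merges here would
-- explain sustained CPU even when there are few user queries.
SELECT
    database,
    table,
    round(elapsed, 1)  AS elapsed_s,
    round(progress, 2) AS progress,
    num_parts,
    rows_read,
    memory_usage
FROM system.merges
ORDER BY elapsed DESC;

-- Background merge/mutation tasks in flight; the metric name differs between
-- ClickHouse versions (e.g. BackgroundPoolTask on older releases).
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%PoolTask';
```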
n
The chart I shared shows the CPU utilisation on three VMs. Each VM is running clickhouse-server for SigNoz, and that clickhouse-server is only used by SigNoz. When I say updates, I mean any modification to the data in the database, e.g. an `INSERT INTO signoz_logs` statement, which is effectively an update to the database.
s
No, insert and update are entirely different things. Update operations are CPU intensive, and we don’t have any update operations to ClickHouse in SigNoz as of today. How many CPUs are given to each node and what is the write rate? You may want to check `perf top` to see what is claiming the most CPU.
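As a hedged sketch, the write rate per table can also be read back from ClickHouse itself, assuming the query log is enabled (it is by default in most installs) and a recent enough version to have the `query_kind` column:

```sql
-- Sketch: per-table insert volume over the last hour, from the query log.
SELECT
    tables,
    count()           AS insert_queries,
    sum(written_rows) AS rows_written
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY tables
ORDER BY rows_written DESC;
```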
n
I know that `INSERT` and `UPDATE` SQL statements are entirely different things, but modifying a database in any way is an update to that database (the database has been updated with new data). This sounds like an argument over nomenclature. Each node has 4 CPUs. The write rate is 3700 logs per minute (~61 per second). The IO utilisation on the server is less than 5%, with write-queue latency at 12ms. `perf top` isn't giving useful information due to lack of access to any symbols at the moment.
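As an alternative when `perf top` has no symbols, here is a sketch of pulling the most CPU-hungry queries from `system.query_log`; it assumes a recent ClickHouse where `ProfileEvents` is a Map column (older releases expose `ProfileEvents.Names`/`ProfileEvents.Values` arrays instead):

```sql
-- Sketch: queries that burned the most CPU in the last hour, grouped by
-- normalised query shape.
SELECT
    normalizedQueryHash(query)                       AS query_hash,
    any(substring(query, 1, 120))                    AS sample_query,
    count()                                          AS executions,
    sum(ProfileEvents['UserTimeMicroseconds']) / 1e6 AS user_cpu_s
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY query_hash
ORDER BY user_cpu_s DESC
LIMIT 10;
```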
s
No, I was not arguing over nomenclature. I want to use unambiguous language because real update operations are known to be CPU intensive for ClickHouse. Mixing the two words can be confusing, so it is better to avoid that.
n
I was asking whether there is a copy of the correct, finalised schema written anywhere for me to compare my schema against and confirm it matches. I can only find the schema definitions in the various migration directories.
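One hedged way to get a comparable copy of the live schema is to dump it from `system.tables` and diff that against the migration files; `signoz_logs` and `signoz_metrics` appear earlier in this thread, and `signoz_traces` is assumed alongside them:

```sql
-- Sketch: dump the live CREATE statements for the SigNoz databases so they
-- can be diffed against the CREATE statements in the migration directories.
SELECT database, name, create_table_query
FROM system.tables
WHERE database IN ('signoz_logs', 'signoz_metrics', 'signoz_traces')
ORDER BY database, name
FORMAT Vertical;
```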
s
No, the final schema is not written down anywhere because the schema keeps changing. When the collectors boot up, they ensure the schema matches the expected schema.
n
Given there are multiple collectors running on a system (i.e. one per node), what stops the race condition of several collectors trying to modify the schema at the same time when it is out of date?
s
The migration program acquires a DB lock before running the migrations. I will have to double-check this, but that is what I read last time.
n
Looking at the process list, it's taking over 1.2 seconds to run:
`SELECT DISTINCT fingerprint FROM signoz_metrics.distributed_time_series_v2`
and more than 0.5 seconds to run:
`INSERT INTO signoz_logs.distributed_log`
So it's generally just slow, and that is compounded by the full CPU utilisation of clickhouse-server on the VM.
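For reference, a minimal sketch of the kind of process-list check described above, using `system.processes`:

```sql
-- Sketch: in-flight queries ordered by how long they have been running.
SELECT
    round(elapsed, 2)        AS elapsed_s,
    read_rows,
    memory_usage,
    substring(query, 1, 120) AS query_start
FROM system.processes
ORDER BY elapsed DESC;
```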
s
Do you just use the logs? Or do you also send metrics and traces?
n
I use logs, traces and metrics. I've reduced the metrics collection interval from 30s to 120s, with no impact on CPU. 302,000 traces were collected in the past 60 minutes.
I suppose it doesn't help that GraphQL services tend to produce thousands of spans due to the size of the query result, e.g. a query that returns 10 rows of 30 columns would create a minimum of 10x30 spans.
If the fingerprint has to be looked up for each span ingested, then it could be quite expensive.
But I'd still expect this data to be distributed evenly across the sharded database, rather than pegging one node at 400% while the others don't share the load.
s
No, there is no kind of lookup-based insertion for spans. The span shard key is the trace ID, which is a random 128-bit value and should distribute the data fine.
n
I looked at the underlying sharded tables (logs, metrics_v2, etc.) across the nodes to confirm that the data is balanced, by comparing row counts. After 2 weeks of high CPU utilisation (100% CPU on node 2 for 7 days, followed by 100% CPU on node 1 for 7 days), the system settled down to background levels of utilisation (~10%). So perhaps this was an artefact of moving from a 1-node to a 3-node ClickHouse cluster, and the time it takes for imbalanced data to be expired from the cluster under the existing 7 day retention rule.
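A sketch of the kind of per-node row-count comparison described above; `clusterAllReplicas` needs the cluster name configured for the SigNoz install, shown here as 'cluster', which is an assumption:

```sql
-- Sketch: rows per local table on every node, to check shard balance.
SELECT
    hostName() AS node,
    database,
    table,
    sum(rows)  AS total_rows
FROM clusterAllReplicas('cluster', system.parts)
WHERE active
  AND database IN ('signoz_logs', 'signoz_metrics', 'signoz_traces')
GROUP BY node, database, table
ORDER BY database, table, node;
```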