# support
u
Hi there... I'm experiencing a strange phenomenon. I'm counting rows in signoz_metrics.distributed_time_series_v4 between '2024-06-27 03:45:00' and '2024-06-27 04:15:00'. Whenever I run the query, the count is different. I have no idea why the results don't match.
SELECT count(*)
FROM signoz_metrics.distributed_time_series_v4
WHERE (temporality = 'Cumulative') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))

Query id: 41f49dcd-04f9-42e6-ab7c-dd291b9c16f7

┌─count()─┐
│   95828 │
└─────────┘

1 row in set. Elapsed: 0.010 sec. Processed 548.86 thousand rows, 4.94 MB (56.26 million rows/s., 506.35 MB/s.)
Peak memory usage: 1.49 MiB.

SELECT count(*)
FROM signoz_metrics.distributed_time_series_v4
WHERE (temporality = 'Cumulative') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))

Query id: d10be415-73fa-473d-8143-4b4fdded0fd8

┌─count()─┐
│   96381 │
└─────────┘

1 row in set. Elapsed: 0.013 sec. Processed 565.25 thousand rows, 5.09 MB (44.31 million rows/s., 398.75 MB/s.)
Peak memory usage: 1.50 MiB.

SELECT count(*)
FROM signoz_metrics.distributed_time_series_v4
WHERE (temporality = 'Cumulative') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))

Query id: 66ad5e01-4a08-4810-9901-25ffa0b63802

┌─count()─┐
│   96379 │
└─────────┘
s
It's a ReplacingMergeTree, and duplicates get removed continuously.
👍 1
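If a stable number is needed while merges are still pending, one option (a sketch, not something from this thread) is to force deduplication at query time with FINAL, at some extra query cost:

SELECT count(*)
FROM signoz_metrics.distributed_time_series_v4 FINAL  -- FINAL collapses duplicate rows at read time
WHERE (temporality = 'Cumulative')
  AND (unix_milli >= toUnixTimestamp('2024-06-27 03:45:00') * 1000)
  AND (unix_milli < toUnixTimestamp('2024-06-27 04:15:00') * 1000)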
u
But my condition is on unix_milli. I thought data inserted in the past should stay the same. Am I wrong?
s
The "past" is not too past in time from the query. The replacing merge tree is the reason for not seeing the same count. How often does it remove duplicates is non-deterministic.
u
Hm... is there any chance that new data arrives with the same key (env, temporality, metric_name, fingerprint, unix_milli) as data already written in the past?
CREATE TABLE signoz_metrics.time_series_v4
(
    `env` LowCardinality(String) DEFAULT 'default',
    `temporality` LowCardinality(String) DEFAULT 'Unspecified',
    `metric_name` LowCardinality(String),
    `description` LowCardinality(String) DEFAULT '' CODEC(ZSTD(1)),
    `unit` LowCardinality(String) DEFAULT '' CODEC(ZSTD(1)),
    `type` LowCardinality(String) DEFAULT '' CODEC(ZSTD(1)),
    `is_monotonic` Bool DEFAULT false CODEC(ZSTD(1)),
    `fingerprint` UInt64 CODEC(Delta(8), ZSTD(1)),
    `unix_milli` Int64 CODEC(Delta(8), ZSTD(1)),
    `labels` String CODEC(ZSTD(5)),
    INDEX idx_labels labels TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 1
)
ENGINE = ReplacingMergeTree
PARTITION BY toDate(unix_milli / 1000)
ORDER BY (env, temporality, metric_name, fingerprint, unix_milli)
SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192
s
The granularity of unix_milli is hours
u
Please explain in more detail... I don't get it...
s
Writing the same set of labels with millisecond precision would be redundant and wasteful storage. The unix_milli in this table has hour granularity, which means there is only one record per time series per one-hour window. This is space-efficient and makes queries faster by reading less data.
👍 1
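For example, listing the stored timestamps for a single series should show at most one row per hour; a sketch, where the metric name and fingerprint are hypothetical placeholders:

SELECT fromUnixTimestamp64Milli(unix_milli) AS ts
FROM signoz_metrics.distributed_time_series_v4
WHERE metric_name = 'jvm_thread_count'  -- hypothetical metric
  AND fingerprint = 1234567890          -- hypothetical series fingerprint
ORDER BY unix_milli
LIMIT 10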
u
I got it... Thanks
┌─fromUnixTimestamp64Milli(unix_milli)─┐
│              2024-05-17 02:00:00.000 │
│              2024-05-17 03:00:00.000 │
│              2024-05-17 04:00:00.000 │
│              2024-05-17 05:00:00.000 │
│              2024-05-17 06:00:00.000 │
│              2024-05-17 07:00:00.000 │
│              2024-05-17 08:00:00.000 │
│              2024-05-17 09:00:00.000 │
│              2024-05-17 10:00:00.000 │
│              2024-05-17 11:00:00.000 │
Now I understand the ReplacingMergeTree table time_series_v4. But I see the same phenomenon with the samples_v4 table, and that one is a MergeTree table.
SELECT *
FROM signoz_metrics.distributed_samples_v4
WHERE (metric_name = 'jvm_thread_count') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))
@Srikanth Chekuri please check the phenomenon below once more. The samples_v4 table uses the MergeTree engine but shows the same symptom: whenever I count the data, the numbers are not equal.
CREATE TABLE signoz_metrics.samples_v4
(
    `env` LowCardinality(String) DEFAULT 'default',
    `temporality` LowCardinality(String) DEFAULT 'Unspecified',
    `metric_name` LowCardinality(String),
    `fingerprint` UInt64 CODEC(Delta(8), ZSTD(1)),
    `unix_milli` Int64 CODEC(DoubleDelta, ZSTD(1)),
    `value` Float64 CODEC(Gorilla, ZSTD(1))
)
ENGINE = MergeTree
PARTITION BY toDate(unix_milli / 1000)
ORDER BY (env, temporality, metric_name, fingerprint, unix_milli)
SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192
SELECT count(*)
FROM signoz_metrics.distributed_samples_v4
WHERE (metric_name = 'jvm_thread_count') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))

Query id: 6fd4ea71-9e93-43de-b5b6-5c530df26f8b

┌─count()─┐
│      26 │
└─────────┘

1 row in set. Elapsed: 0.011 sec. Processed 122.88 thousand rows, 292.78 KB (11.19 million rows/s., 26.66 MB/s.)
Peak memory usage: 1.49 MiB.

SELECT count(*)
FROM signoz_metrics.distributed_samples_v4
WHERE (metric_name = 'jvm_thread_count') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))

Query id: faaacb94-71c2-4430-be3e-2d7408d1a9c9

┌─count()─┐
│      36 │
└─────────┘

1 row in set. Elapsed: 0.010 sec. Processed 147.46 thousand rows, 354.62 KB (14.33 million rows/s., 34.46 MB/s.)
Peak memory usage: 1.49 MiB.

SELECT count(*)
FROM signoz_metrics.distributed_samples_v4
WHERE (metric_name = 'jvm_thread_count') AND (unix_milli >= (toUnixTimestamp('2024-06-27 03:45:00') * 1000)) AND (unix_milli < (toUnixTimestamp('2024-06-27 04:15:00') * 1000))

Query id: f9069048-a350-42ec-931b-1d21524a5f75

┌─count()─┐
│      27 │
└─────────┘

1 row in set. Elapsed: 0.011 sec. Processed 122.88 thousand rows, 283.07 KB (11.08 million rows/s., 25.52 MB/s.)
Peak memory usage: 1.48 MiB.
s
The only way this can happen is if your end timestamp is not in the past (i.e., it includes the current time window where data is still being ingested), or you report a metric with an old timestamp.
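One way to check for late-arriving data (a sketch that assumes access to the standard system.parts table on a ClickHouse node) is to look at when the parts covering that day's partition were written; parts whose modification_time is later than your first count() would explain the growing number:

SELECT name, rows, modification_time
FROM system.parts
WHERE database = 'signoz_metrics'
  AND table = 'samples_v4'
  AND partition = '2024-06-27'  -- partition key is toDate(unix_milli / 1000)
  AND active
ORDER BY modification_time DESC
LIMIT 10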
u
Okay. Thank you 🙂
👍 1