# general
n
Hi, the retention period seems to have reset to default values for some reason. We set the retention period following the docs at https://signoz.io/docs/userguide/retention-period/ months ago when we initially installed SigNoz, as it is recommended to set the retention period early in the lifecycle of the platform. In the sqlite signoz.db we can see the values:

```
54  c2da421ae3ae4c1197e0b7ee05377176  2024-03-28 13:38:06.07271329+00:00  2024-03-28 13:38:06.456254169+00:00  signoz_metrics.samples_v2  31104000  604800  success
```

That is: after 7 days (604800 s), data moves to S3 cold storage and is kept for 1 year (31104000 s). We did this for Metrics, Traces and Logs. Since then (back in March) we have done a number of SigNoz upgrades and have just noticed a change to our Metrics retention period. It is now showing as 1 month, with no cold storage setting. See attached image.

What would have caused this to have changed, and why did it only happen for Metrics and not for all (Traces and Logs also)? Also, we now have months of metrics data; will there be an issue with us changing the retention now?
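The two TTL values in that row can be sanity-checked with quick shell arithmetic (the seconds values are quoted from the row above; the conversion to days is ours):

```shell
# TTL values from the ttl_status row above
move_ttl=604800      # cold-storage TTL: move to S3 after this many seconds
delete_ttl=31104000  # retention TTL: delete after this many seconds

echo "move to cold storage after: $((move_ttl / 86400)) days"   # 7 days
echo "delete after: $((delete_ttl / 86400)) days"               # 360 days
```

Note that 31104000 s is exactly 360 days, so the "1 year" here is a 360-day year rather than 365.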
s
> What would have caused this to have changed and why did it only do it for Metrics and not all (Traces and Logs also)?
We migrated to a new set of tables for metrics.
> Also we now have months of metrics data, will there be an issue with us changing the retention now?
What is your overall volume, i.e. samples count and time series count?
n
thanks @Srikanth Chekuri

> We migrated to a new set of tables for metrics

Why did this impact the TTL retention for metrics and not the others (traces, logs)? And is this now configured somewhere else, in signoz.db or in ClickHouse?
> What is your overall volume, i.e. samples count and time series count?

```
clickhouse :) SELECT count() FROM distributed_samples_v4;

Query id: bffed02c-2a45-4e91-81ca-3e4912f96834

┌────count()─┐
│ 1190339504 │
└────────────┘

1 row in set. Elapsed: 0.003 sec.

clickhouse :) SELECT count() FROM distributed_time_series_v4;

Query id: 0b5361e2-cf08-4c07-96b3-b0d58ef18698

┌──count()─┐
│ 23247642 │
└──────────┘

1 row in set. Elapsed: 0.003 sec.
```
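For rough sizing, those two counts imply an average samples-per-series figure (plain arithmetic; the counts are taken from the query output above):

```shell
samples=1190339504   # count() from distributed_samples_v4
series=23247642      # count() from distributed_time_series_v4

# average number of stored samples per time series, ~51 here
awk -v s="$samples" -v t="$series" 'BEGIN { printf "%.1f samples/series\n", s / t }'
```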
@Srikanth Chekuri can you answer our question above: "Why did this impact the TTL retention for metrics..."
s
As I said earlier, we migrated to new tables with a default TTL of 30 days. It was an overlooked aspect; the TTL from the old table should have been carried over instead of the default. You should be able to update without issues. CPU usage might spike when you trigger the change, but after some time it should go back to normal.
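Under the hood, a retention update amounts to a ClickHouse `ALTER TABLE ... MODIFY TTL` on the metrics tables. A minimal sketch, assuming the new table is `signoz_metrics.samples_v4` with a `unix_milli` timestamp column (both names are our assumption, not confirmed in this thread):

```shell
TTL_SECONDS=31104000  # desired retention, matching the original setup

# Build the DDL; on a live install you would run it via clickhouse-client.
SQL="ALTER TABLE signoz_metrics.samples_v4 \
MODIFY TTL toDateTime(unix_milli / 1000) + toIntervalSecond(${TTL_SECONDS})"

# clickhouse-client --query "$SQL"   # requires a running ClickHouse server
echo "$SQL"
```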
n
@Srikanth Chekuri ok, thanks for the reply. Just to be clear: if we change the settings now in the SigNoz UI, will the value be saved to the ttl_status table in signoz.db?
s
yes
n
@Srikanth Chekuri so we tried to update the TTL and are getting the following error:

```
{"level":"ERROR","timestamp":"2024-06-20T12:33:46.059Z","caller":"clickhouseReader/reader.go:2432","msg":"error while setting ttl.","error":"code: 47, message: There was an error on [clickhouse:9000] Code: 47. DB::Exception: Missing columns: 'timestamp_ms' while processing query: 'toDateTime(toUInt32(timestamp_ms / 1000), 'UTC') + toIntervalSecond(31104000)', required columns: 'timestamp_ms' 'timestamp_ms'. (UNKNOWN_IDENTIFIER) (version 24.1.2.5 (official build))","stacktrace":"go.signoz.io/signoz/pkg/query-service/app/clickhouseReader.(*ClickHouseReader).SetTTL.func2\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/clickhouseReader/reader.go:2432"}
```
s
What version of SigNoz are you using?
n
v0.46.0
s
Can you upgrade to 0.47 or 0.48 and try again?
n
@Srikanth Chekuri we will try this, it will be Monday before we can do it
@Srikanth Chekuri We did this; it took a couple of attempts before it was successful.

```
i/o timeout","errorVerbose":"read:\n    github.com/ClickHouse/ch-go/proto.(*Reader).ReadFull\n        /home/runner/go/pkg/mod/github.com/!click!house/ch-go@v0.61.3/proto/reader.go:62\n  - read tcp 172.200.0.3:53630->172.200.0.6:9000: i/o timeout","stacktrace":"go.signoz.io/signoz/pkg/query-service/app/clickhouseReader.(*ClickHouseReader).SetTTL.func2\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/clickhouseReader/reader.go:2348"}
```
Since changing retention is a big operation after go-live, and the recommendation is to set it at the start, are there any measures that can be put in place to ensure that future changes have no knock-on effect where the TTL values are reset back to defaults?
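One possible guardrail (our suggestion, not a built-in SigNoz feature) is a post-upgrade check that compares the live table DDL against the TTL you expect. A sketch, again assuming a `signoz_metrics.samples_v4` table; the `DDL` value below is a stand-in so the script runs without a server:

```shell
EXPECTED="toIntervalSecond(31104000)"

# On a live install you would fetch the real DDL, e.g.:
#   DDL=$(clickhouse-client --query "SHOW CREATE TABLE signoz_metrics.samples_v4")
# Stand-in value for illustration:
DDL="CREATE TABLE signoz_metrics.samples_v4 ... TTL toDateTime(unix_milli / 1000) + toIntervalSecond(31104000)"

case "$DDL" in
  *"$EXPECTED"*) echo "retention OK" ;;
  *)             echo "retention drifted from expected value" >&2; exit 1 ;;
esac
```

Run after every upgrade (e.g. from CI or a cron job), this turns a silent TTL reset into a loud failure.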