Hello! Trying to reduce the retention period for m...
# support
c
Hello! Trying to reduce the retention period for my traces and metrics, but it always fails. I see the following error in ClickHouse: (in thread)
2024.04.16 05:06:14.845832 [ 744 ] {67859881-bda7-47c1-919a-64ec8f97cb42} <Information> signoz_traces.signoz_spans (739b3bed-5ba4-4a8b-9674-e8edfdfdefde): Added mutation: mutation_398750.txt
2024.04.16 05:06:14.845922 [ 744 ] {67859881-bda7-47c1-919a-64ec8f97cb42} <Information> signoz_traces.signoz_spans (739b3bed-5ba4-4a8b-9674-e8edfdfdefde): Waiting mutation: mutation_398750.txt
2024.04.16 05:08:30.413839 [ 744 ] {67859881-bda7-47c1-919a-64ec8f97cb42} <Information> signoz_traces.signoz_spans (739b3bed-5ba4-4a8b-9674-e8edfdfdefde): Mutation mutation_398750.txt done
2024.04.16 05:08:30.439409 [ 744 ] {a84099a4-c464-40a0-a0e9-cfbc37a0d544} <Information> signoz_traces.signoz_error_index_v2 (1a37171e-90f2-4caa-82ba-eecd46d45cfb): Added mutation: mutation_508.txt
2024.04.16 05:08:30.439534 [ 744 ] {a84099a4-c464-40a0-a0e9-cfbc37a0d544} <Information> signoz_traces.signoz_error_index_v2 (1a37171e-90f2-4caa-82ba-eecd46d45cfb): Waiting mutation: mutation_508.txt
2024.04.16 05:08:30.469154 [ 744 ] {a84099a4-c464-40a0-a0e9-cfbc37a0d544} <Information> signoz_traces.signoz_error_index_v2 (1a37171e-90f2-4caa-82ba-eecd46d45cfb): Mutation mutation_508.txt done
2024.04.16 05:08:30.488173 [ 744 ] {51a77a41-3982-4df5-80b3-ae6c474cb05a} <Information> signoz_traces.usage_explorer (092ca84e-6827-4567-ba98-11e6d7d74d6a): Added mutation: mutation_383957.txt
2024.04.16 05:08:30.488253 [ 744 ] {51a77a41-3982-4df5-80b3-ae6c474cb05a} <Information> signoz_traces.usage_explorer (092ca84e-6827-4567-ba98-11e6d7d74d6a): Waiting mutation: mutation_383957.txt
2024.04.16 05:08:30.509097 [ 744 ] {51a77a41-3982-4df5-80b3-ae6c474cb05a} <Information> signoz_traces.usage_explorer (092ca84e-6827-4567-ba98-11e6d7d74d6a): Mutation mutation_383957.txt done
2024.04.16 05:08:30.552372 [ 744 ] {b127cbc7-bea7-4940-9105-c18d96e6ae6d} <Information> signoz_traces.signoz_index_v2 (3231495a-8d90-4059-9976-40354187621f): Added mutation: mutation_384741.txt
2024.04.16 05:08:30.552662 [ 744 ] {b127cbc7-bea7-4940-9105-c18d96e6ae6d} <Information> signoz_traces.signoz_index_v2 (3231495a-8d90-4059-9976-40354187621f): Waiting mutation: mutation_384741.txt
2024.04.16 05:11:14.853880 [ 792 ] {9681271b-0cc5-469c-8b7b-327d2a77856b} <Information> TCPHandler: Client has dropped the connection, cancel the query.
2024.04.16 05:11:14.877178 [ 792 ] {9681271b-0cc5-469c-8b7b-327d2a77856b} <Information> executeQuery: Code: 394. DB::Exception: Query was cancelled or a client has unexpectedly dropped the connection. (QUERY_WAS_CANCELLED) (version 24.1.2.5 (official build)) (from [::ffff:10.3.195.132]:46756) (in query: ALTER TABLE signoz_traces.dependency_graph_minutes_v2 ON CLUSTER cluster MODIFY TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1;)
2024.04.16 05:11:14.888104 [ 10 ] {df2de30c-cbe7-4997-a3e2-31d9caaa7361} <Information> TCPHandler: Client has dropped the connection, cancel the query.
2024.04.16 05:11:14.953967 [ 793 ] {44bb1792-73b4-425c-a9e1-76a30eb8023b} <Information> TCPHandler: Client has dropped the connection, cancel the query.
2024.04.16 05:11:15.878041 [ 10 ] {df2de30c-cbe7-4997-a3e2-31d9caaa7361} <Information> executeQuery: Code: 394. DB::Exception: Query was cancelled or a client has unexpectedly dropped the connection. (QUERY_WAS_CANCELLED) (version 24.1.2.5 (official build)) (from [::ffff:10.3.195.132]:46744) (in query: ALTER TABLE signoz_traces.signoz_index_v2 ON CLUSTER cluster MODIFY TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1;)
2024.04.16 05:11:15.878502 [ 793 ] {44bb1792-73b4-425c-a9e1-76a30eb8023b} <Information> executeQuery: Code: 394. DB::Exception: Query was cancelled or a client has unexpectedly dropped the connection. (QUERY_WAS_CANCELLED) (version 24.1.2.5 (official build)) (from [::ffff:10.3.195.132]:46768) (in query: ALTER TABLE signoz_traces.durationSort ON CLUSTER cluster MODIFY TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1;)
2024.04.16 05:14:21.185216 [ 744 ] {b127cbc7-bea7-4940-9105-c18d96e6ae6d} <Information> signoz_traces.signoz_index_v2 (3231495a-8d90-4059-9976-40354187621f): Mutation mutation_384741.txt done
2024.04.16 05:14:21.220112 [ 744 ] {e3b80044-f7f8-4451-9f26-b334ac180f31} <Information> signoz_traces.dependency_graph_minutes_v2 (4c8b8d3c-b898-4b23-bb2c-f8c7ecc3c025): Added mutation: mutation_298823.txt
2024.04.16 05:14:21.220366 [ 744 ] {e3b80044-f7f8-4451-9f26-b334ac180f31} <Information> signoz_traces.dependency_graph_minutes_v2 (4c8b8d3c-b898-4b23-bb2c-f8c7ecc3c025): Waiting mutation: mutation_298823.txt
2024.04.16 05:14:22.196966 [ 744 ] {e3b80044-f7f8-4451-9f26-b334ac180f31} <Information> signoz_traces.dependency_graph_minutes_v2 (4c8b8d3c-b898-4b23-bb2c-f8c7ecc3c025): Mutation mutation_298823.txt done
2024.04.16 05:14:22.222721 [ 744 ] {f1e9ba90-f555-4d41-bb46-50723ffbce36} <Information> signoz_traces.durationSort (359561e1-e50c-4127-a599-c4f5b19bbfb7): Added mutation: mutation_382447.txt
2024.04.16 05:14:22.222943 [ 744 ] {f1e9ba90-f555-4d41-bb46-50723ffbce36} <Information> signoz_traces.durationSort (359561e1-e50c-4127-a599-c4f5b19bbfb7): Waiting mutation: mutation_382447.txt
I tried making our ClickHouse instance really big (7 vCPU), but it would still time out after 5 minutes. Is there any way to increase the timeout on the query-service side?
n
Changing retention for existing data is a very expensive operation: it is a mutation query, so it requires updating all the existing data. If you can give us some idea of your existing retention settings and the amount of data ingested every day, it will help. @Vishal Sharma please add what the better ways to do this are.
c
I think I had the default of 15 days, and trying to go down to 7 days
Currently ingesting about 939827 spans a day.
v
Hey @Carlos Martell, it's not recommended to change the retention period once you already have a lot of data.
> Any way to increase timeout on the query-service side?
The retention-change query keeps executing even after the timeout. According to your logs, it looks like the query-service connection is being reset. The best way to change retention now would be to connect to ClickHouse directly and change it there.
You can connect to ClickHouse and run the mutation queries yourself as well.
Run these queries:
ALTER TABLE signoz_traces.durationSort ON CLUSTER cluster MODIFY TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1;

ALTER TABLE signoz_traces.signoz_index_v2 ON CLUSTER cluster MODIFY TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1;

ALTER TABLE signoz_traces.signoz_spans ON CLUSTER cluster MODIFY TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1;
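
Since the retention change runs as a mutation that keeps executing after the client disconnects, it can help to check its progress from the same ClickHouse session. The query below is a minimal sketch, assuming the TTL change shows up in ClickHouse's `system.mutations` table (the "Added mutation" log lines above suggest it does) and that the database is named `signoz_traces` as in the logs. `system.mutations` is local to each node, so on a multi-node cluster it would need to be run on every node.

```sql
-- Sketch: list mutations on the trace tables that have not finished yet.
-- parts_to_do shows how many data parts still need to be rewritten;
-- latest_fail_reason is non-empty if a part failed to mutate.
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    parts_to_do,
    is_done,
    latest_fail_reason
FROM system.mutations
WHERE database = 'signoz_traces'
  AND is_done = 0
ORDER BY create_time;
```

An empty result would mean no mutation is pending on this node for those tables.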
c
Weird! It was always timing out after 5 minutes.
Actually, I stopped the amount of data I was sending and retried to change the retention period, and it worked 🙂
a
> I stopped the amount of data I was sending
Hi @Carlos Martell, do you mean you stopped all of your collectors and agents from emitting telemetry data to SigNoz and then modified your retention setting? I am unable to extend the retention for metrics; it also fails every time.
@Vishal Sharma Would you mind sharing the metrics equivalent that I can use to extend metrics retention to 30 days?
c
@Al Correct, I temporarily stopped the OTel collector for SigNoz and made the change.
a
My attempt failed with:
executeQuery: Code: 394. DB::Exception: Query was cancelled or a client has unexpectedly dropped the connection. (QUERY_WAS_CANCELLED) (version 24.1.2.5 (official build)) (from [::ffff:10.244.1.48]:33284) (in query: ALTER TABLE signoz_metrics.samples_v2 ON CLUSTER cluster MODIFY TTL toDateTime(toUInt32(timestamp_ms / 1000), 'UTC') + INTERVAL 2419200 SECOND DELETE SETTINGS distributed_ddl_task_timeout = -1)
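
For reference, the statement the query service issued can be read straight out of this error line. Following the advice given earlier in the thread for traces, one option would be to run it directly from a ClickHouse client instead of through the query service. This is only a sketch: the thread does not confirm that it works for metrics, and it is an assumption that `signoz_metrics.samples_v2` is the only metrics table whose TTL needs changing.

```sql
-- Copied from the failed query in the log line above.
-- 2419200 seconds = 28 days; exactly 30 days would be 2592000 seconds.
ALTER TABLE signoz_metrics.samples_v2 ON CLUSTER cluster
    MODIFY TTL toDateTime(toUInt32(timestamp_ms / 1000), 'UTC') + INTERVAL 2419200 SECOND DELETE
    SETTINGS distributed_ddl_task_timeout = -1;
```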