# support
h
Hi team, I saw that the new version 0.55, which includes the log migration, has been released, and I have read the migration guide. Before I'm confident enough to upgrade, can someone explain in more detail what the log change is that breaks the previous log table, and how the migration works, so that we can upgrade smoothly without losing any existing logs?
n
So we are moving to new tables for logs. The migration moves the TTL settings and materialized columns (if any) from the old tables to the new tables. In case you are using custom retention settings, you are required to add some flags; see https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1727323822909249?thread_ts=1727299733.995679&cid=C01HWQ1R0BC
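If you want to check beforehand what the migration will carry over, something like this should show it (a sketch; it assumes the default `signoz_logs.logs` table name):

```sql
-- The TTL appears in the CREATE statement of the old table:
SHOW CREATE TABLE signoz_logs.logs;

-- List materialized columns, if you added any for custom attributes:
SELECT name, type
FROM system.columns
WHERE database = 'signoz_logs' AND table = 'logs'
  AND default_kind = 'MATERIALIZED';
```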
h
I see. And how does it handle log search across both the old and new log tables?
n
If you are using ClickHouse queries, you will have to update them.
h
No, we are just using the log search UI that SigNoz provides.
n
The new tables will make searches with resource attributes way faster than the old table, due to the new fingerprinting system we have.
h
If we set `--use-logs-new-schema=false`, will the SigNoz query service then try to search logs in both the old and new tables?
n
No, then it will run the query on the old table.
If it's set to true, it will run it on the new table.
When the config in the otel collector is set to false (https://github.com/SigNoz/signoz/blob/87499d1ead6042047ddc7111811813f907330ebf/deploy/docker/clickhouse-setup/otel-collector-config.yaml#L157), it writes to both tables. Once you set it to true, it will write only to the new table.
h
I see, that makes sense.
So the log volume stored in ClickHouse will be doubled while it writes to both tables, right?
n
Yeah, for that duration the storage will be doubled. We will try to release a new script that will help users with very long retention move their data from the old table to the new one, but it's still in progress.
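If you want to keep an eye on the overhead while both tables are being written, a query like this shows per-table disk usage (a sketch; it assumes the default `signoz_logs` database):

```sql
-- Per-table disk usage in the logs database, to watch the temporary
-- doubling while the old and new tables are both written to:
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'signoz_logs' AND active
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```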
h
I see. I think I need that script before I upgrade, as we have a very tight disk storage limit and don't want to expand it too much. Is there an estimate of when that script will be ready?
n
Expect it by next week at worst, though it should be released sooner.
h
Thanks! Another question: we set up ClickHouse with 2 replicas so they can fail over to each other. In this case, I guess I have to run the migration scripts on both replicas, right?
n
No, you don't need to run the migration script on both replicas; ClickHouse internally takes care of replication when you are using a replicated merge tree.
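You can verify the replication state from ClickHouse's `system.replicas` table; a sketch:

```sql
-- Confirm both replicas are registered and caught up; DDL run on one
-- node should show up on the other with an empty replication queue:
SELECT database, table, replica_name, queue_size, absolute_delay
FROM system.replicas
WHERE database = 'signoz_logs';
```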
h
OK, so I just need to run the migration on one of the CH instances and the other one will automatically pick up the change?
n
correct.
h
What's the difference between the helm schema migrator and running this migrator manually via kubectl? Are they doing the same thing or not?
n
No, the helm schema migrator is for adding the schemas and will always be present. This migrator is something you run just once, if required. (They have different purposes.)
h
I see, thanks for clarifying. I'll just wait for the new script to move the old logs to the new table before upgrading. I'm more confident about this now.
Hi @nitya-signoz, I want to check with you whether the new script to help us migrate old data is ready or not 👀
n
The docs for it are in progress; you can have a look here: https://github.com/SigNoz/signoz-web/pull/849. Once that is merged, this page will be updated: https://signoz.io/docs/operate/migration/upgrade-0.55
Also, just to clarify: you won't be able to avoid the double-storage issue, as the script copies the data from the old table to the new one.
h
Thanks! And since the table is partitioned by day, how about copying 1 day of data to the new table, then deleting that day's data from the old table? That way the double-storage issue would be limited to 1 day of log data.
n
Yeah, you can try that. Here is the current code: https://github.com/SigNoz/signoz-db-migrations/blob/main/migration-0.55-data/main.go; you can tweak it for the above logic. We avoided it because we wanted to keep the duplicate data regardless, so we didn't go with that approach.
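If you go that route, the per-day cleanup on the old table could look roughly like this (a sketch, not part of the script; it assumes the old table's partition IDs are dates, as they appear elsewhere in this thread, and that you only drop a day after verifying its copy):

```sql
-- After one day's data has been copied into the new tables, drop that
-- day's partition from the old table to cap the overhead at ~1 day:
ALTER TABLE signoz_logs.logs DROP PARTITION ID '20240706';

-- Partitions still left on the old table:
SELECT DISTINCT partition_id
FROM system.parts
WHERE database = 'signoz_logs' AND table = 'logs' AND active
ORDER BY partition_id;
```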
h
@nitya-signoz we get this schema upgrade error. Can you suggest how to solve it?
{"level":"info","timestamp":"2024-10-08T06:36:09.449Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:36:09.449Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:36:09.449Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":true}
{"level":"info","timestamp":"2024-10-08T06:36:09.460Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-10-08T06:36:09.460Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
{"level":"error","timestamp":"2024-10-08T06:39:10.463Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signoz-clickhouse/task_queue/ddl/query-0000000554 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate|github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.13/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-10-08T06:39:10.463Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signoz-clickhouse/task_queue/ddl/query-0000000554 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.13/x64/src/runtime/proc.go:267"}
{"level":"info","timestamp":"2024-10-08T06:39:25.333Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:39:25.333Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:39:25.333Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":true}
{"level":"info","timestamp":"2024-10-08T06:39:25.349Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-10-08T06:39:25.350Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
n
Run this query and check if something is stuck:
```sql
select * from system.distributed_ddl_queue where status != 'Finished';
```
h
┌─entry────────────┬─entry_version─┬─initiator_host───────────────────────────────────────────────────────────────────────────────────┬─initiator_port─┬─cluster─┬─query─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─settings───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───query_create_time─┬─host──────────────────────────────┬─port─┬─status───┬─exception_code─┬─exception_text─┬─query_finish_time─┬─query_duration_ms─┐
│ query-0000000559 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID '0b19e66e-4dbd-4aef-b1d8-c499c315658b' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:41:30 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000559 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID '0b19e66e-4dbd-4aef-b1d8-c499c315658b' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:41:30 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000558 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'dfdf0738-30dc-4ec0-915b-718c9f819c4c' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:40:00 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000558 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'dfdf0738-30dc-4ec0-915b-718c9f819c4c' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:40:00 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000557 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_logs UUID '4e6f68a5-a258-426d-9ce9-7c202d4f5d46' ON CLUSTER cluster                  │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:39:25 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000557 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_logs UUID '4e6f68a5-a258-426d-9ce9-7c202d4f5d46' ON CLUSTER cluster                  │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:39:25 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000556 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'e8a2d4c8-a7e6-4c6d-97e2-4b6ec9d153dc' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:38:30 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000556 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'e8a2d4c8-a7e6-4c6d-97e2-4b6ec9d153dc' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:38:30 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000555 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'ec3d8cfd-c44c-49eb-90fa-c00f51fbfb74' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:37:00 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
Above is the full result.
We already have the signoz_logs database, so why is the migration still trying to create one?
n
It has an `IF NOT EXISTS` clause, so that won't cause the issue. If you look at the entries, the recent calls to modify retention (TTL) are Active; let them finish?
│ query-0000000549 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(604800) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:27:32 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000549 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(604800) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:27:32 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000548 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(864000) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:09:14 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Active   │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000548 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(864000) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:09:14 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Active   │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
h
OK, but why would the TTL change task block the migration?
Also, the error log mentions query-0000000554, which is the create database statement. Does the UUID matter?
│ query-0000000554 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_logs UUID '5a5ce4f2-ec65-4a4f-9793-9850f59def3a' ON CLUSTER cluster                  │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:36:09 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
n
Not totally sure here. @Srikanth Chekuri any idea why the create db might be blocked?
h
We previously changed the DB schema manually to the replicated one, so we have two signoz_logs databases with two different UUIDs on the two instances. Is that related?
n
What did you manually change?
h
-- show create table logs to get the original schema and create a replicated_logs table from it
CREATE TABLE signoz_logs.replicated_logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    INDEX body_idx body TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/logs', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(432000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

-- Prepare for attach all shards
SELECT DISTINCT concat('ALTER TABLE replicated_logs ATTACH PARTITION ID \'', partition_id, '\' FROM logs;')
FROM system.parts
WHERE (table = 'logs') AND active;

-- Result
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240706' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240707' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240708' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240709' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240710' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240711' FROM logs;

-- Check count for the source table
SELECT count(1) FROM logs;

-- Check count for the destination table
SELECT count(1) FROM replicated_logs;

-- Rename and redirect traffic
RENAME TABLE logs TO logs_old, replicated_logs TO logs;

-- Cleanup
DROP TABLE logs_old;
n
Nah, it doesn't seem related, as the error is on the DB creation statement.
h
OK. I'm curious where the UUID comes from during the migration. I checked, and the database UUIDs do not match what shows up in the task queue.
n
Try waiting for the TTL change to complete and then run the migrator again? Apart from that I am out of ideas; let's wait to see if Srikanth has something to add.
h
Also, we jumped the signoz/signoz-schema-migrator image version from 0.102.2 to 0.102.10. Is that related?
@nitya-signoz I found the issue is related to `ON CLUSTER cluster`. If I manually run the create database query with ON CLUSTER, it just hangs, but if I remove the ON CLUSTER it finishes immediately.
Fixed the issue. I found there were some pending mutations in this list:
select * from system.mutations;
The mutations were `materialize ttl` on some old log partitions. Not sure why they were stuck for such a long time. I manually killed them with:
KILL MUTATION WHERE database = 'signoz_logs' AND table = 'logs' AND mutation_id IN ('<id1>', '<id2>');
After that,
select * from system.distributed_ddl_queue where status != 'Finished';
is empty. Then I ran the migration and everything worked as expected. cc @nitya-signoz
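For anyone hitting the same thing, a slightly more targeted version of that mutation check (a sketch; the column names come from ClickHouse's `system.mutations` table):

```sql
-- Unfinished mutations, with the latest failure reason if any; the
-- stuck "materialize ttl" entries showed up in this list:
SELECT mutation_id, command, parts_to_do, latest_fail_reason, create_time
FROM system.mutations
WHERE database = 'signoz_logs' AND NOT is_done;
```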
n
Ahh, the materialize TTL was the culprit.
h
Yeah. It would be worth adding a note about this to the upgrade guide page to help others as well 📣
@nitya-signoz Hi, after we run the data migration script, which tables are safe to truncate manually? We want to reclaim disk space immediately. Here is our table list:
┌─name────────────────────────────┐
│ attribute_keys_bool_final_mv    │
│ attribute_keys_float64_final_mv │
│ attribute_keys_string_final_mv  │
│ distributed_logs                │
│ distributed_logs_attribute_keys │
│ distributed_logs_resource_keys  │
│ distributed_logs_v2             │
│ distributed_logs_v2_resource    │
│ distributed_tag_attributes      │
│ distributed_usage               │
│ logs                            │
│ logs_attribute_keys             │
│ logs_resource_keys              │
│ logs_v2                         │
│ logs_v2_resource                │
│ resource_keys_string_final_mv   │
│ schema_migrations               │
│ tag_attributes                  │
│ usage                           │
└─────────────────────────────────┘
Also, we hit another problem after running the data migration script: we are still not able to search for logs from the backfilled time window. Is there any additional step needed to make the backfilled data visible?
The earliest timestamp in the distributed_logs_v2 table is 1727734329218044625, which is 10/1, but we still can't see logs in the SigNoz UI in the time window 10/2 to 10/7.
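For context, this is roughly how I checked the time range (assuming the timestamp column is in nanoseconds, as in the other queries here; run the same against distributed_logs to compare coverage):

```sql
-- Earliest and latest log timestamps in the new table:
SELECT
    min(fromUnixTimestamp64Nano(timestamp)) AS earliest,
    max(fromUnixTimestamp64Nano(timestamp)) AS latest
FROM signoz_logs.distributed_logs_v2;
```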
n
You can truncate the `logs` table.
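If you want to reclaim the space on all replicas in one go, something like this should work (a sketch; it assumes your cluster is named `cluster`, as elsewhere in this thread, and that the migration has been verified first):

```sql
-- Empty the old table on every replica in one statement; the
-- distributed_logs table holds no data itself, only logs needs this:
TRUNCATE TABLE signoz_logs.logs ON CLUSTER cluster;
```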
Did you make this change for query-service?
```yaml
query-service:
  command:
    [
      "-config=/root/config/prometheus.yml",
      "--use-logs-new-schema=true"
    ]
```
h
Yes. The strange thing is that we can see logs from one of our clusters, but not from the rest. We have multiple EKS clusters that send logs to SigNoz, and currently the issue I found is that the distributed_logs_v2_resource table is missing most of our clusters' logs and only contains logs from one cluster. Not sure what's causing this.
Can I just re-run the data migration script?
> Did you make this change for query-service?
Yes, I did.
n
Re-running the migration script will cause duplicate data. First you will have to remove the data for that time range manually from the logs_v2 table in each replica, and then re-run the migration script.
h
OK. Which command can I use to drop the specific date partitions?
n
Compare the counts first, to see if there is actually a difference:
```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), toIntervalHour(1)) AS ts,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs_v2
WHERE (timestamp >= toInt64(now64() - toIntervalHour(70)) * 1e9)
GROUP BY ts
ORDER BY value DESC;

SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), toIntervalHour(1)) AS ts,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE (timestamp >= toInt64(now64() - toIntervalHour(70)) * 1e9)
GROUP BY ts
ORDER BY value DESC;
```
Then you can follow this guide: https://clickhouse.com/docs/en/sql-reference/statements/alter/partition, or delete using a WHERE clause: https://clickhouse.com/docs/en/sql-reference/statements/delete
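As a sketch of the partition route (check each table's partition IDs with the first query before dropping anything, since logs_v2_resource may be partitioned differently from logs_v2; the date shown is just an example):

```sql
-- First list the partitions covering the broken time range:
SELECT DISTINCT partition_id
FROM system.parts
WHERE database = 'signoz_logs' AND table = 'logs_v2' AND active
ORDER BY partition_id;

-- Then drop them, repeating for logs_v2_resource and on each replica:
ALTER TABLE signoz_logs.logs_v2 DROP PARTITION ID '20241002';
```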
h
I compared these 2; the results don't match at all.
n
Do check the underlying (non-distributed) tables in each replica as well, so that you'll know if there is any issue with your setup.
h
Let me try deleting the partitions and re-running the data migration to see how it goes.
n
Okay. Until then, just keep the collector config set to false; this ensures that data is written to both tables:
```yaml
exporters:
  clickhouselogsexporter:
    use_new_schema: false
```
h
I think I only need to delete the old partitions, and it won't affect new logs being written to the new table, right?
n
No, it won't affect that. I suggested it just as an extra precaution.
h
👌🏻 I changed to the new schema yesterday and new logs seem to work as expected. I may not need to send logs to the old table again.
n
Great 👍
h
I need to delete partitions on these 4 new tables, right?
```
distributed_logs_v2
logs_v2
logs_v2_resource
distributed_logs_v2_resource
```
n
Only logs_v2 and logs_v2_resource are needed, but do it on all the replicas. The distributed tables don't hold any data; they just point to the underlying tables.
Were you able to migrate the data?
h
Yes. It's still running, as we have lots of logs to copy.