# support
h
Hi team, I saw that the new version 0.55, which includes the log migration, has been released, and I have read the migration guide. Before I'm confident enough to upgrade, can someone explain in more detail what the log change is that breaks the previous log table, and how the migration works, so that we can upgrade smoothly without losing any existing logs?
n
So we are moving to new tables for logs. The migration moves the TTL settings and materialized columns (if any) from the old tables to the new tables. In case you are using custom retention settings, you are required to add some flags; see https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1727323822909249?thread_ts=1727299733.995679&cid=C01HWQ1R0BC
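If you want to check beforehand what the migration will carry over, something like this should show it (a sketch; it assumes the default `signoz_logs.logs` table name):

```sql
-- The TTL appears in the CREATE statement of the old table:
SHOW CREATE TABLE signoz_logs.logs;

-- List materialized columns, if you added any for custom attributes:
SELECT name, type
FROM system.columns
WHERE database = 'signoz_logs' AND table = 'logs'
  AND default_kind = 'MATERIALIZED';
```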
h
I see. And how does it handle log search across both the old and new log tables?
n
If you are using ClickHouse queries, you will have to update them.
h
No, we are just using the log search UI that SigNoz provides.
n
The new tables will make searches with resource attributes way faster than the old table, due to the new fingerprinting system we have.
h
If we set `--use-logs-new-schema=false`, will the SigNoz query service then try to search logs in both the old and new tables?
n
No, then it will run the query on the old table.
If it's set to true, it will run it on the new table.
When the config in the otel collector is set to false (https://github.com/SigNoz/signoz/blob/87499d1ead6042047ddc7111811813f907330ebf/deploy/docker/clickhouse-setup/otel-collector-config.yaml#L157), it writes to both tables. Once you set it to true, it will write only to the new table.
h
I see, that makes sense.
So the log volume stored in ClickHouse will be doubled while it writes to both tables, right?
n
Yeah, for that duration the storage will be doubled. We will try to release a new script that will help users with very long retention move their data from the old table to the new one, but it's still in progress.
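If you want to keep an eye on the overhead while both tables are being written, a query like this shows per-table disk usage (a sketch; it assumes the default `signoz_logs` database):

```sql
-- Per-table disk usage in the logs database, to watch the temporary
-- doubling while the old and new tables are both written to:
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'signoz_logs' AND active
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```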
h
I see. I think I need that script before I upgrade, as we have a very tight disk storage limit and don't want to expand it too much. Is there an estimate of when that script will be ready?
n
Expect it by next week at worst, though it should be released sooner.
h
Thanks! Another question: we set up ClickHouse with 2 replicas so they can fail over to each other. In this case, I guess I have to run the migration scripts on both replicas, right?
n
No, you don't need to run the migration script on both replicas; ClickHouse internally takes care of replication when you are using a replicated merge tree.
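You can verify the replication state from ClickHouse's `system.replicas` table; a sketch:

```sql
-- Confirm both replicas are registered and caught up; DDL run on one
-- node should show up on the other with an empty replication queue:
SELECT database, table, replica_name, queue_size, absolute_delay
FROM system.replicas
WHERE database = 'signoz_logs';
```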
h
OK, so I just need to run the migration on one of the CH instances and the other one will automatically pick up the change?
n
correct.
h
What's the difference between the helm schema migrator and running this migrator manually via kubectl? Are they doing the same thing or not?
n
No, the helm schema migrator is for adding the schemas and will always be present. This migrator is something you run just once, if required. (They have different purposes.)
h
I see, thanks for clarifying. I'll just wait for the new script to move the old logs to the new table before upgrading. I'm more confident about this now.
Hi @nitya-signoz, I want to check with you whether the new script to help us migrate old data is ready or not 👀
n
The docs for it are in progress; you can have a look here: https://github.com/SigNoz/signoz-web/pull/849. Once that is merged, this page will be updated: https://signoz.io/docs/operate/migration/upgrade-0.55
Also, just to clarify: you won't be able to avoid the double-storage issue, as the script copies the data from the old table to the new one.
h
Thanks! And since the table is partitioned by day, how about copying 1 day of data to the new table, then deleting that day's data from the old table? That way the double-storage issue would be limited to 1 day of log data.
n
Yeah, you can try that. Here is the current code: https://github.com/SigNoz/signoz-db-migrations/blob/main/migration-0.55-data/main.go; you can tweak it for the above logic. We avoided it because we wanted to keep the duplicate data regardless, so we didn't go with that approach.
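If you go that route, the per-day cleanup on the old table could look roughly like this (a sketch, not part of the script; it assumes the old table's partition IDs are dates, as they appear elsewhere in this thread, and that you only drop a day after verifying its copy):

```sql
-- After one day's data has been copied into the new tables, drop that
-- day's partition from the old table to cap the overhead at ~1 day:
ALTER TABLE signoz_logs.logs DROP PARTITION ID '20240706';

-- Partitions still left on the old table:
SELECT DISTINCT partition_id
FROM system.parts
WHERE database = 'signoz_logs' AND table = 'logs' AND active
ORDER BY partition_id;
```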
h
@nitya-signoz we get this schema upgrade error. Can you suggest how to solve it?
{"level":"info","timestamp":"2024-10-08T06:36:09.449Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:36:09.449Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:36:09.449Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":true}
{"level":"info","timestamp":"2024-10-08T06:36:09.460Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-10-08T06:36:09.460Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
{"level":"error","timestamp":"2024-10-08T06:39:10.463Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signoz-clickhouse/task_queue/ddl/query-0000000554 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate|github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.13/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-10-08T06:39:10.463Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signoz-clickhouse/task_queue/ddl/query-0000000554 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.13/x64/src/runtime/proc.go:267"}
{"level":"info","timestamp":"2024-10-08T06:39:25.333Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:39:25.333Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-10-08T06:39:25.333Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":true}
{"level":"info","timestamp":"2024-10-08T06:39:25.349Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-10-08T06:39:25.350Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
n
Run this query and check if something is stuck:
```sql
select * from system.distributed_ddl_queue where status != 'Finished';
```
h
┌─entry────────────┬─entry_version─┬─initiator_host───────────────────────────────────────────────────────────────────────────────────┬─initiator_port─┬─cluster─┬─query─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─settings───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───query_create_time─┬─host──────────────────────────────┬─port─┬─status───┬─exception_code─┬─exception_text─┬─query_finish_time─┬─query_duration_ms─┐
│ query-0000000559 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID '0b19e66e-4dbd-4aef-b1d8-c499c315658b' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:41:30 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000559 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID '0b19e66e-4dbd-4aef-b1d8-c499c315658b' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:41:30 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000558 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'dfdf0738-30dc-4ec0-915b-718c9f819c4c' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:40:00 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000558 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'dfdf0738-30dc-4ec0-915b-718c9f819c4c' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:40:00 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000557 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_logs UUID '4e6f68a5-a258-426d-9ce9-7c202d4f5d46' ON CLUSTER cluster                  │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:39:25 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000557 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_logs UUID '4e6f68a5-a258-426d-9ce9-7c202d4f5d46' ON CLUSTER cluster                  │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:39:25 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000556 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'e8a2d4c8-a7e6-4c6d-97e2-4b6ec9d153dc' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:38:30 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000556 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'e8a2d4c8-a7e6-4c6d-97e2-4b6ec9d153dc' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:38:30 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000555 │             5 │ chi-signoz-clickhouse-cluster-0-1-0.chi-signoz-clickhouse-cluster-0-1.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_traces UUID 'ec3d8cfd-c44c-49eb-90fa-c00f51fbfb74' ON CLUSTER cluster                │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:37:00 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
Above is the full result.
We already have the signoz_logs database, so why is the migration still trying to create one?
n
It has an `IF NOT EXISTS` clause, so that won't cause the issue. If you look at the entries, the recent calls to modify retention (TTL) are Active; let them finish?
│ query-0000000549 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(604800) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:27:32 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000549 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(604800) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:27:32 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000548 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(864000) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:09:14 │ chi-signoz-clickhouse-cluster-0-0 │ 9000 │ Active   │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
│ query-0000000548 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ ALTER TABLE signoz_logs.logs ON CLUSTER cluster MODIFY TTL  toDateTime(timestamp / 1000000000) + toIntervalSecond(864000) │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','distributed_ddl_task_timeout':'-1','max_bytes_to_read':'2000000000000','max_execution_time':'600','max_execution_time_leaf':'600','timeout_before_checking_execution_speed':'5','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:09:14 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Active   │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
h
OK, but why would the TTL change task block the migration?
Also, the error log mentions query-0000000554, which is the create database statement. Does the UUID matter?
│ query-0000000554 │             5 │ chi-signoz-clickhouse-cluster-0-0-0.chi-signoz-clickhouse-cluster-0-0.platform.svc.cluster.local │           9000 │ cluster │ CREATE DATABASE IF NOT EXISTS signoz_logs UUID '5a5ce4f2-ec65-4a4f-9793-9850f59def3a' ON CLUSTER cluster                  │ {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','max_table_size_to_drop':'0','max_partition_size_to_drop':'0','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'} │ 2024-10-08 06:36:09 │ chi-signoz-clickhouse-cluster-0-1 │ 9000 │ Inactive │           ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ           │              ᴺᵁᴸᴸ │              ᴺᵁᴸᴸ │
n
Not totally sure here. @Srikanth Chekuri any idea why the create db might be blocked?
h
We previously changed the DB schema manually to the replicated one, so we have two signoz_logs databases with two different UUIDs on the two instances. Is that related?
n
What did you manually change?
h
-- show create table logs to get the original schema and create a replicated_logs table from it
CREATE TABLE signoz_logs.replicated_logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    INDEX body_idx body TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/logs', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(432000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

-- Prepare for attach all shards
SELECT DISTINCT concat('ALTER TABLE replicated_logs ATTACH PARTITION ID \'', partition_id, '\' FROM logs;')
FROM system.parts
WHERE (table = 'logs') AND active;

-- Result
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240706' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240707' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240708' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240709' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240710' FROM logs;
ALTER TABLE replicated_logs ATTACH PARTITION ID '20240711' FROM logs;

-- Check count for the source table
SELECT count(1) FROM logs;

-- Check count for the destination table
SELECT count(1) FROM replicated_logs;

-- Rename and redirect traffic
RENAME TABLE logs TO logs_old, replicated_logs TO logs;

-- Cleanup
DROP TABLE logs_old;
n
Nah, it doesn't seem related, as the error is on the DB creation statement.
h
OK. I'm curious where the UUID comes from during the migration. I checked, and the database UUIDs do not match what shows up in the task queue.
n
Try waiting for the TTL change to complete and then run the migrator again? Apart from that I am out of ideas; let's wait to see if Srikanth has something to add.
h
Also, we jumped the signoz/signoz-schema-migrator image version from 0.102.2 to 0.102.10. Is that related?
@nitya-signoz I found the issue is related to `ON CLUSTER cluster`. If I manually run the create database query with ON CLUSTER, it just hangs, but if I remove the ON CLUSTER it finishes immediately.
Fixed the issue. I found there were some pending mutations in this list:
select * from system.mutations;
The mutations were `materialize ttl` on some old log partitions. Not sure why they were stuck for such a long time. I manually killed them with:
KILL MUTATION WHERE database = 'signoz_logs' AND table = 'logs' AND mutation_id IN ('<id1>', '<id2>');
After that,
select * from system.distributed_ddl_queue where status != 'Finished';
is empty. Then I ran the migration and everything worked as expected. cc @nitya-signoz
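For anyone hitting the same thing, a slightly more targeted version of that mutation check (a sketch; the column names come from ClickHouse's `system.mutations` table):

```sql
-- Unfinished mutations, with the latest failure reason if any; the
-- stuck "materialize ttl" entries showed up in this list:
SELECT mutation_id, command, parts_to_do, latest_fail_reason, create_time
FROM system.mutations
WHERE database = 'signoz_logs' AND NOT is_done;
```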
n
Ahh, the materialize TTL was the culprit.
h
Yeah. It would be worth adding a note about this to the upgrade guide page to help others as well 📣
@nitya-signoz Hi, after we run the data migration script, which tables are safe to truncate manually? We want to reclaim disk space immediately. Here is our table list:
┌─name────────────────────────────┐
│ attribute_keys_bool_final_mv    │
│ attribute_keys_float64_final_mv │
│ attribute_keys_string_final_mv  │
│ distributed_logs                │
│ distributed_logs_attribute_keys │
│ distributed_logs_resource_keys  │
│ distributed_logs_v2             │
│ distributed_logs_v2_resource    │
│ distributed_tag_attributes      │
│ distributed_usage               │
│ logs                            │
│ logs_attribute_keys             │
│ logs_resource_keys              │
│ logs_v2                         │
│ logs_v2_resource                │
│ resource_keys_string_final_mv   │
│ schema_migrations               │
│ tag_attributes                  │
│ usage                           │
└─────────────────────────────────┘
Also, we hit another problem after running the data migration script: we are still not able to search for logs from the backfilled time window. Is there any additional step needed to make the backfilled data visible?
The earliest timestamp in the distributed_logs_v2 table is 1727734329218044625, which is 10/1, but we still can't see logs in the SigNoz UI in the time window 10/2 to 10/7.
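For context, this is roughly how I checked the time range (assuming the timestamp column is in nanoseconds, as in the other queries here; run the same against distributed_logs to compare coverage):

```sql
-- Earliest and latest log timestamps in the new table:
SELECT
    min(fromUnixTimestamp64Nano(timestamp)) AS earliest,
    max(fromUnixTimestamp64Nano(timestamp)) AS latest
FROM signoz_logs.distributed_logs_v2;
```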
n
You can truncate the `logs` table.
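If you want to reclaim the space on all replicas in one go, something like this should work (a sketch; it assumes your cluster is named `cluster`, as elsewhere in this thread, and that the migration has been verified first):

```sql
-- Empty the old table on every replica in one statement; the
-- distributed_logs table holds no data itself, only logs needs this:
TRUNCATE TABLE signoz_logs.logs ON CLUSTER cluster;
```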
Did you make this change for query-service?
```yaml
query-service:
  command:
    [
      "-config=/root/config/prometheus.yml",
      "--use-logs-new-schema=true"
    ]
```
h
Yes. The strange thing is that we can see logs from one of our clusters, but not from the rest. We have multiple EKS clusters that send logs to SigNoz, and currently the issue I found is that the distributed_logs_v2_resource table is missing most of our clusters' logs and only contains logs from one cluster. Not sure what's causing this.
Can I just re-run the data migration script?
> Did you make this change for query-service?
Yes, I did.
n
Re-running the migration script will cause duplicate data. First you will have to remove the data for that time range manually from the logs_v2 table in each replica, and then re-run the migration script.
h
OK. Which command can I use to drop the specific date partitions?
n
Compare the counts first, to see if there is actually a difference:
```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), toIntervalHour(1)) AS ts,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs_v2
WHERE (timestamp >= toInt64(now64() - toIntervalHour(70)) * 1e9)
GROUP BY ts
ORDER BY value DESC;

SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), toIntervalHour(1)) AS ts,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE (timestamp >= toInt64(now64() - toIntervalHour(70)) * 1e9)
GROUP BY ts
ORDER BY value DESC;
```
Then you can follow this guide: https://clickhouse.com/docs/en/sql-reference/statements/alter/partition, or delete using a WHERE clause: https://clickhouse.com/docs/en/sql-reference/statements/delete
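As a sketch of the partition route (check each table's partition IDs with the first query before dropping anything, since logs_v2_resource may be partitioned differently from logs_v2; the date shown is just an example):

```sql
-- First list the partitions covering the broken time range:
SELECT DISTINCT partition_id
FROM system.parts
WHERE database = 'signoz_logs' AND table = 'logs_v2' AND active
ORDER BY partition_id;

-- Then drop them, repeating for logs_v2_resource and on each replica:
ALTER TABLE signoz_logs.logs_v2 DROP PARTITION ID '20241002';
```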
h
I compared these 2; the results don't match at all.
n
Do check the underlying (non-distributed) tables in each replica as well, so that you'll know if there is any issue with your setup.
h
Let me try deleting the partitions and re-running the data migration to see how it goes.
n
Okay. Until then, just keep the collector config set to false; this ensures that data is written to both tables:
```yaml
exporters:
  clickhouselogsexporter:
    use_new_schema: false
```
h
I think I only need to delete the old partitions, and it won't affect new logs being written to the new table, right?
n
No, it won't affect that. I suggested it just as an extra precaution.
h
👌🏻 I changed to the new schema yesterday and new logs seem to work as expected. I may not need to send logs to the old table again.
n
Great 👍
h
I need to delete partitions on these 4 new tables, right?
```
distributed_logs_v2
logs_v2
logs_v2_resource
distributed_logs_v2_resource
```
n
Only logs_v2 and logs_v2_resource are needed, but do it on all the replicas. The distributed tables don't hold any data; they just point to the underlying tables.
Were you able to migrate the data?
h
Yes. It's still running, as we have lots of logs to copy.