# support
p
I upgraded to 0.49.0 today, but the migration failed, and I see these logs in ClickHouse:
2024.07.04 00:52:35.251054 [ 3193 ] {57501799-65c5-4d3f-8f1b-b9d1ba632b6c} <Error> TCPHandler: Code: 16. DB::Exception: No such column instrumentation_scope in table signoz_logs.distributed_logs (3d4317c5-4815-4082-8697-61476827280e). (NO_SUCH_COLUMN_IN_TABLE), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c800f1b in /usr/bin/clickhouse
1. DB::Exception::Exception<String const&, String>(int, FormatStringHelperImpl<std::type_identity<String const&>::type, std::type_identity<String>::type>, String const&, String&&) @ 0x0000000007c0c81d in /usr/bin/clickhouse
2. DB::InterpreterInsertQuery::getSampleBlock(std::vector<String, std::allocator<String>> const&, std::shared_ptr<DB::IStorage> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&) const @ 0x00000000114bc517 in /usr/bin/clickhouse
3. DB::InterpreterInsertQuery::getSampleBlock(DB::ASTInsertQuery const&, std::shared_ptr<DB::IStorage> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&) const @ 0x00000000114bbb77 in /usr/bin/clickhouse
4. DB::InterpreterInsertQuery::execute() @ 0x00000000114bf81c in /usr/bin/clickhouse
5. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x0000000011904974 in /usr/bin/clickhouse
6. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x00000000118ff77a in /usr/bin/clickhouse
7. DB::TCPHandler::runImpl() @ 0x000000001291be29 in /usr/bin/clickhouse
8. DB::TCPHandler::run() @ 0x0000000012933eb9 in /usr/bin/clickhouse
9. Poco::Net::TCPServerConnection::start() @ 0x00000000153a5a72 in /usr/bin/clickhouse
10. Poco::Net::TCPServerDispatcher::run() @ 0x00000000153a6871 in /usr/bin/clickhouse
11. Poco::PooledThread::run() @ 0x000000001549f047 in /usr/bin/clickhouse
12. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001549d67d in /usr/bin/clickhouse
13. ? @ 0x00007feeebbf9609
14. ? @ 0x00007feeebb1e353
2024-07-04T00:52:35.251273798Z
s
If this isn't fixed already:
1. Exec into ClickHouse and run
DROP TABLE signoz_logs.schema_migrations ON CLUSTER cluster
2. Restart the migrator and the collector.
p
Sure, I'll try it and let you know.
How do you connect to the ClickHouse cluster using clickhouse-client?
We are using Kubernetes.
s
Yes, exec into the ClickHouse pod and run:
clickhouse client
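For repeatable checks, you can run single queries non-interactively instead of opening a session each time. This is a minimal sketch. It assumes the pod name is the first label of the hostnames in the prompts below (e.g. `chi-signozagent-clickhouse-cluster-1-0-0`) and that the namespace is `signoz`, which is inferred from `.signoz.svc.cluster.local`. The helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List

def clickhouse_exec_argv(pod: str, query: str, namespace: str = "signoz") -> List[str]:
    """Build argv that runs one query with clickhouse client inside a pod (hypothetical helper)."""
    # Everything after "--" is executed inside the container unchanged,
    # so the query is passed as a single argument and needs no shell quoting.
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "clickhouse", "client", "--query", query,
    ]

def run_query(pod: str, query: str, namespace: str = "signoz") -> str:
    """Run the query via kubectl and return stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        clickhouse_exec_argv(pod, query, namespace),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout

if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of executing it.
    argv = clickhouse_exec_argv(
        "chi-signozagent-clickhouse-cluster-1-0-0",  # assumed pod name
        "SELECT * FROM signoz_logs.schema_migrations",
    )
    print(shlex.join(argv))
```

Passing an argv list rather than a shell string avoids quoting problems with the single quotes that appear in queries such as the `KILL MUTATION ... WHERE mutation_id = '...'` later in this thread.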
p
chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local :) SELECT * FROM signoz_logs.schema_migration

SELECT *
FROM signoz_logs.schema_migration

Query id: 090576e9-635c-480e-bcc3-781ed7cfb354


Elapsed: 0.030 sec. 

Received exception from server (version 24.1.2):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table signoz_logs.schema_migration does not exist. Maybe you meant signoz_logs.schema_migrations?. (UNKNOWN_TABLE)

chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local :)
I tried reading it, and the table doesn't exist.
I did run the DROP command above before this.
s
You have a typo: you queried schema_migration, but the table is schema_migrations. Please use the command I shared.
OK, I see.
Now try upgrading.
p
chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local :) SELECT * FROM signoz_logs.schema_migrations

SELECT *
FROM signoz_logs.schema_migrations

Query id: 7f720bc7-2b66-447c-8663-6ee22283f6e1

Ok.

0 rows in set. Elapsed: 0.002 sec. 

chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local :)
s
Please try upgrading to the latest version, 0.49.1.
p
s
yes
p
But the version in the commit is 0.45.1?
s
That is the chart version. The main SigNoz release is 0.49.1.
p
{"level":"info","timestamp":"2024-07-05T07:17:44.804Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T07:17:44.804Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T07:17:44.805Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":false}
{"level":"info","timestamp":"2024-07-05T07:17:45.025Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-07-05T07:17:45.025Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
{"level":"error","timestamp":"2024-07-05T07:20:45.245Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000533 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 3 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-07-05T07:20:45.245Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000533 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 3 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"}
{"level":"info","timestamp":"2024-07-05T07:21:11.808Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T07:21:11.808Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T07:21:11.808Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":false}
{"level":"info","timestamp":"2024-07-05T07:21:11.935Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-07-05T07:21:11.935Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
{"level":"error","timestamp":"2024-07-05T07:24:12.342Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000534 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 3 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-07-05T07:24:12.342Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000534 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 3 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"}
{"level":"info","timestamp":"2024-07-05T07:24:52.787Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T07:24:52.787Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T07:24:52.787Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":false}
{"level":"info","timestamp":"2024-07-05T07:24:52.836Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-07-05T07:24:52.836Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
The schema migrator failed and restarted.
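Every failed run above carries ClickHouse error code 159, a DDL task path, and an unfinished-host count. This is a small sketch that extracts those fields from the migrator's JSON log lines. It skips lines that don't parse, such as a paste whose opening brace was cut off. The helper name `ddl_timeouts` is hypothetical, and the regex only assumes the message wording shown in these logs.

```python
import json
import re
import sys
from typing import Iterable, Iterator, Tuple

# Wording of the code-159 DDL timeout as it appears in the migrator's "error" field.
DDL_TIMEOUT = re.compile(
    r"task (?P<task>\S+) is executing longer than distributed_ddl_task_timeout "
    r"\(=(?P<timeout>\d+)\) seconds\. There are (?P<unfinished>\d+) unfinished hosts"
)

def ddl_timeouts(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (timestamp, task_path, timeout_s, unfinished_hosts) per error-level timeout."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        # Only "error" entries: the following "fatal" line repeats the same message.
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        match = DDL_TIMEOUT.search(entry.get("error", ""))
        if match:
            yield (
                entry.get("timestamp", ""),
                match["task"],
                int(match["timeout"]),
                int(match["unfinished"]),
            )

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ddl_timeouts.py migrator.log  (file of pasted log lines)
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for timestamp, task, timeout, unfinished in ddl_timeouts(log_file):
            print(f"{timestamp} {task} timeout={timeout}s unfinished_hosts={unfinished}")
```

Across these restarts the task number keeps increasing (query-0000000533, then 534). This suggests each retry enqueues a fresh `CREATE DATABASE ... ON CLUSTER` entry instead of waiting on the previous one, which matches the DDL queue output further down.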
s
Can you share the output of:
SELECT * FROM system.mutations WHERE is_done = 0
p
chi-signozagent-clickhouse-cluster-0-1-0.chi-signozagent-clickhouse-cluster-0-1.signoz.svc.cluster.local :) SELECT * from system.mutations WHERE is_done= 0

SELECT *
FROM system.mutations
WHERE is_done = 0

Query id: afeb9ed7-6cab-4edd-b764-0e069cee1ede

Ok.

0 rows in set. Elapsed: 0.010 sec. 

chi-signozagent-clickhouse-cluster-0-1-0.chi-signozagent-clickhouse-cluster-0-1.signoz.svc.cluster.local :)
s
What is the current resource usage on the nodes where ClickHouse is running?
p
Screenshot 2024-07-05 at 1.15.26 PM.png
s
Can you share the output of:
SELECT *
FROM system.distributed_ddl_queue
WHERE cluster = 'cluster'
FORMAT Vertical
p
There are 404 rows.
Do you need all of them?
s
No, wait. Let me share another one.
Please use this:
SELECT *
FROM system.distributed_ddl_queue
WHERE cluster = 'cluster' AND status != 'Finished'
FORMAT Vertical
p
Row 1:
──────
entry:             query-0000000538
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-0-0-0.chi-signozagent-clickhouse-cluster-0-0.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID 'd11d6fa4-3c98-4bfa-9b82-c7bf40c1f869' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:44:03
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 2:
──────
entry:             query-0000000537
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID '959ea22e-5fe0-4062-92d6-29ddef344b6f' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:35:02
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 3:
──────
entry:             query-0000000536
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-1-1-0.chi-signozagent-clickhouse-cluster-1-1.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID '3ea7c647-57bd-4b63-87b7-686e74b4d6a5' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:29:16
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 4:
──────
entry:             query-0000000535
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID '960df61b-7317-4b6c-8297-c405ccab8e7c' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:24:52
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 5:
──────
entry:             query-0000000534
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-0-0-0.chi-signozagent-clickhouse-cluster-0-0.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID '78d95fa6-451a-43a7-91d9-a33cef27aea6' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:21:11
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 6:
──────
entry:             query-0000000533
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-0-1-0.chi-signozagent-clickhouse-cluster-0-1.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID '4a750053-3886-4d94-97fb-fd3f52ab86a5' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:17:45
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 7:
──────
entry:             query-0000000532
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-1-0-0.chi-signozagent-clickhouse-cluster-1-0.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID '5b876555-bf91-4bc8-b19a-7d45d7eae5ed' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:14:31
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 8:
──────
entry:             query-0000000531
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-1-1-0.chi-signozagent-clickhouse-cluster-1-1.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             ALTER TABLE signoz_logs.logs ON CLUSTER cluster DROP INDEX IF EXISTS body_idx
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:11:29
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Active
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

Row 9:
───────
entry:             query-0000000539
entry_version:     5
initiator_host:    chi-signozagent-clickhouse-cluster-1-1-0.chi-signozagent-clickhouse-cluster-1-1.signoz.svc.cluster.local
initiator_port:    9000
cluster:           cluster
query:             CREATE DATABASE IF NOT EXISTS signoz_logs UUID 'e749fedb-e48d-4002-9376-8366dbb82cb0' ON CLUSTER cluster
settings:          {'connect_timeout_with_failover_ms':'1000','distributed_aggregation_memory_efficient':'1','log_queries':'1','parallel_view_processing':'1','allow_nondeterministic_mutations':'1','allow_experimental_window_functions':'1','default_database_engine':'Ordinary'}
query_create_time: 2024-07-05 07:47:04
host:              chi-signozagent-clickhouse-cluster-1-1
port:              9000
status:            Inactive
exception_code:    ᴺᵁᴸᴸ
exception_text:    ᴺᵁᴸᴸ
query_finish_time: ᴺᵁᴸᴸ
query_duration_ms: ᴺᵁᴸᴸ

9 rows in set. Elapsed: 2.245 sec.
s
One of your replicas doesn't seem to be running the DDL queries. Can you exec into this one,
chi-signozagent-clickhouse-cluster-1-1,
and check:
SELECT *
FROM system.mutations
WHERE is_done = 0
p
┌─database────┬─table─┬─mutation_id─────────┬─command───────────────────────┬─────────create_time─┬─block_numbers.partition_id─┬─block_numbers.number─┬─parts_to_do_names──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─parts_to_do─┬─is_done─┬─is_killed─┬─latest_failed_part─┬────latest_fail_time─┬─latest_fail_reason─┐
│ signoz_logs │ logs  │ mutation_950358.txt │ DROP INDEX IF EXISTS body_idx │ 2024-07-05 07:11:29 │ ['']                       │ [950358]             │ ['20240605_1_11851_2524_949715','20240605_11852_12372_104_949715','20240605_128828_128828_0_949715','20240606_12373_33754_3739_949715','20240606_33755_36418_1357_949715','20240606_128829_128829_0_949715','20240607_36419_128830_2729_949715','20240608_77005_105614_3022_949715','20240608_128831_128831_0_949715','20240609_105615_135720_3996_949715','20240609_135721_135727_2_949715','20240610_135728_175266_2641_949715','20240610_175267_175267_0_949715','20240611_175268_194541_3114_949715','20240611_194542_204207_3558_949715','20240611_204208_430758_2_949715','20240612_204209_247532_2539_949715','20240612_247533_430759_2_949715','20240613_247534_388257_1598_949715','20240613_388307_430760_2_949715','20240614_295609_321249_425_949715','20240614_321250_430761_925_949715','20240615_344131_388631_864_949715','20240615_425326_430762_1_949715','20240616_374857_394891_612_949715','20240616_394892_402286_373_949715','20240616_402287_430763_3_949715','20240617_402291_518486_563_949715','20240618_437469_458705_665_949715','20240618_458706_518487_1528_949715','20240619_485909_510428_1249_949715','20240619_510429_523897_803_949715','20240619_523898_714310_113_949715','20240620_518963_712833_40_949715','20240620_713526_713526_0_949715','20240621_518959_533100_1792_949715','20240621_533101_712355_113_949715','20240621_712370_734561_12_949715','20240622_533108_549161_2532_949715','20240622_549162_734553_123_949715','20240622_734562_734562_0_949715','20240623_549170_734500_2426_949715','20240623_734539_734563_1_949715','20240624_566786_734564_2426_949715','20240625_593134_705454_1587_949715','20240625_705460_734556_108_949715','20240625_734565_734565_0_949715','20240626_617658_713722_1813_949715','20240626_713730_734566_13_949715','20240627_651339_734536_1362_949715','20240627_734543_734647_13_949715','20240628_734648_748379_419_949715','
20240628_748380_769347_255_949715','20240628_769348_769357_2_949715','20240629_769358_783613_399_949715','20240629_783614_795385_352_949715','20240629_795386_795394_2_949715','20240630_795395_814443_1282_949715','20240630_814444_819086_358_949715','20240630_819087_819089_1_949715','20240701_819090_850571_664_949715','20240701_850572_862447_548_949715','20240701_862448_864326_100_949715','20240702_864327_889164_554_949715','20240702_889165_949932_574_949715','20240703_902496_932497_329_949715','20240703_932498_949931_380_949715','20240704_949371_950356_62_949715','20240704_950357_950357_0'] │          69 │       0 │         0 │                    │ 1970-01-01 00:00:00 │                    │
└─────────────┴───────┴─────────────────────┴───────────────────────────────┴─────────────────────┴────────────────────────────┴──────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┴─────────┴───────────┴────────────────────┴─────────────────────┴────────────────────┘
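The `parts_to_do_names` column above lists the 69 parts still waiting for mutation_950358.txt. MergeTree part names follow the pattern `<partition_id>_<min_block>_<max_block>_<level>[_<mutation_version>]`. A part still needs a mutation when its data version is below the mutation's block number, here 950358. The data version is the mutation suffix if present, otherwise `min_block`. This sketch assumes partition IDs contain no underscores, which holds for the date partitions here. The names are hypothetical.

```python
from typing import NamedTuple, Optional

class PartInfo(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int
    mutation: Optional[int]  # None when the part was never rewritten by a mutation

def parse_part_name(name: str) -> PartInfo:
    """Split a MergeTree part name; assumes the partition ID has no underscores."""
    fields = name.split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected part name: {name!r}")
    mutation = int(fields[4]) if len(fields) == 5 else None
    return PartInfo(fields[0], int(fields[1]), int(fields[2]), int(fields[3]), mutation)

def data_version(part: PartInfo) -> int:
    # Without a mutation suffix, a part's data version is its min block number.
    return part.mutation if part.mutation is not None else part.min_block

def needs_mutation(name: str, mutation_block: int) -> bool:
    """True if the part predates the mutation and still has to be rewritten."""
    return data_version(parse_part_name(name)) < mutation_block
```

For example, `20240605_1_11851_2524_949715` was last rewritten at version 949715, so it still needs 950358. The part `20240704_950357_950357_0` was never mutated, and its min block 950357 is also below 950358.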
s
Thanks. There are still some parts left. Can you share the progress using the following?
SELECT * from system.merges WHERE is_mutation = 1;
p
chi-signozagent-clickhouse-cluster-1-1-0.chi-signozagent-clickhouse-cluster-1-1.signoz.svc.cluster.local :) SELECT * from system.merges WHERE is_mutation = 1;

SELECT *
FROM system.merges
WHERE is_mutation = 1

Query id: d81e66fb-d29f-4eb7-a6da-52395b17c6ee

Ok.

0 rows in set. Elapsed: 0.015 sec. 

chi-signozagent-clickhouse-cluster-1-1-0.chi-signozagent-clickhouse-cluster-1-1.signoz.svc.cluster.local :)
s
What does this return?
SELECT *
FROM system.merges
p
Query id: f3ac14a8-36ea-4641-8ca6-7d9fa5026dbb

┌─database───────┬─table───────────────┬──────elapsed─┬────────────progress─┬─num_parts─┬─source_part_names─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─result_part_name─────────────┬─source_part_paths──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─result_part_path─────────────────────────────────────────────────────────────────────────────────┬─partition_id─┬─partition──┬─is_mutation─┬─total_size_bytes_compressed─┬─total_size_bytes_uncompressed─┬─total_size_marks─┬─bytes_read_uncompressed─┬─rows_read─┬─bytes_written_uncompressed─┬─rows_written─┬─columns_written─┬─memory_usage─┬─thread_id─┬─merge_type─┬─merge_algorithm─┐
│ signoz_metrics │ time_series_v4      │ 61.446713146 │  0.8565841112762417 │         5 │ ['20240705_1286208_1299914_212','20240705_1299921_1299987_5','20240705_1299990_1299990_0','20240705_1299997_1299997_0','20240705_1300001_1300001_0']                                                          │ 20240705_1286208_1300001_213 │ ['/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240705_1286208_1299914_212/','/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240705_1299921_1299987_5/','/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240705_1299990_1299990_0/','/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240705_1299997_1299997_0/','/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240705_1300001_1300001_0/'] │ /var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240705_1286208_1300001_213/ │ 20240705     │ 2024-07-05 │           0 │                     1837439 │                     128823662 │               36 │               112664622 │    192507 │                  162622480 │       163840 │               0 │     76357947 │       587 │ Regular    │ Horizontal      │
│ signoz_metrics │ time_series_v4      │ 16.610112349 │ 0.12360333357802841 │         2 │ ['20240622_747450_1299518_1357','20240622_1299568_1299841_3']                                                                                                                                                 │ 20240622_747450_1299841_1358 │ ['/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240622_747450_1299518_1357/','/var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240622_1299568_1299841_3/']                                                      │ /var/lib/clickhouse/store/964/96476d5e-a525-49fa-8af3-d0089b883b13/20240622_747450_1299841_1358/ │ 20240622     │ 2024-06-22 │           0 │                     2577481 │                     421508120 │               91 │                52407241 │     89240 │                   49520976 │        49152 │               0 │     87664711 │       589 │ Regular    │ Horizontal      │
│ signoz_metrics │ time_series_v4_6hrs │  6.981257262 │                   1 │         6 │ ['20240629_1298336_1299016_8','20240629_1299038_1299038_0','20240629_1299043_1299043_0','20240629_1299056_1299056_0','20240629_1299076_1299076_0','20240629_1299077_1299077_0']                               │ 20240629_1298336_1299077_9   │ ['/var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1298336_1299016_8/','/var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1299038_1299038_0/','/var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1299043_1299043_0/','/var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1299056_1299056_0/','/var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1299076_1299076_0/','/var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1299077_1299077_0/'] │ /var/lib/clickhouse/store/20c/20cd3af7-056b-4b0e-9e2d-0eaec6f549c4/20240629_1298336_1299077_9/   │ 20240629     │ 2024-06-29 │           0 │                     1106423 │                      26776392 │               16 │                26884291 │     48055 │                   18368640 │        16384 │               0 │     74896698 │       640 │ Regular    │ Horizontal      │
│ signoz_metrics │ time_series_v4_1day │  0.414117643 │ 0.28677634998730706 │         7 │ ['20240627_1295980_1299235_36','20240627_1299242_1299242_0','20240627_1299274_1299274_0','20240627_1299283_1299283_0','20240627_1299296_1299296_0','20240627_1299313_1299313_0','20240627_1299329_1299329_0'] │ 20240627_1295980_1299329_37  │ ['/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1295980_1299235_36/','/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1299242_1299242_0/','/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1299274_1299274_0/','/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1299283_1299283_0/','/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1299296_1299296_0/','/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1299313_1299313_0/','/var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1299329_1299329_0/'] │ /var/lib/clickhouse/store/ebc/ebc5bb89-bab1-4573-baf0-b315925214d0/20240627_1295980_1299329_37/  │ 20240627     │ 2024-06-27 │           0 │                     2221957 │                      46656004 │               23 │                14302753 │     23723 │                          0 │            0 │               0 │     61856600 │       640 │ Regular    │ Horizontal      │
└────────────────┴─────────────────────┴──────────────┴─────────────────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────┴─────────────┴─────────────────────────────┴───────────────────────────────┴──────────────────┴─────────────────────────┴───────────┴────────────────────────────┴──────────────┴─────────────────┴──────────────┴───────────┴────────────┴─────────────────┘

4 rows in set. Elapsed: 0.023 sec. 

chi-signozagent-clickhouse-cluster-1-1-0.chi-signozagent-clickhouse-cluster-1-1.signoz.svc.cluster.local :)
FYI, signozagent-schema-migrator keeps restarting:
{"level":"info","timestamp":"2024-07-05T08:01:57.748Z","caller":"signozschemamigrator/migrate.go:89","msg":"Setting env var SIGNOZ_CLUSTER","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T08:01:57.748Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-07-05T08:01:57.748Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":false}
{"level":"info","timestamp":"2024-07-05T08:01:57.816Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-07-05T08:01:57.816Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
{"level":"error","timestamp":"2024-07-05T08:04:58.305Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000543 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-07-05T08:04:58.305Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000543 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"}
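The "error" field in these entries already says which distributed DDL task hung and how many hosts never finished it. A minimal Python sketch, assuming one JSON object per log line as pasted here, that pulls out the ClickHouse error code, the task path, the timeout and the unfinished-host count. `summarize_ddl_timeout` is a hypothetical helper, not part of the migrator; the task name it extracts (e.g. `query-0000000543`) is what you could then look for in ClickHouse's `system.distributed_ddl_queue` table.

```python
import json
import re
from typing import Optional

# Matches the ClickHouse TIMEOUT_EXCEEDED (code 159) text embedded in the
# migrator's "error" field, worded as in the logs above.
DDL_TIMEOUT_RE = re.compile(
    r"code: (?P<code>\d+), message: Watching task (?P<task>\S+) is executing "
    r"longer than distributed_ddl_task_timeout \(=(?P<timeout>\d+)\) seconds\. "
    r"There are (?P<unfinished>\d+) unfinished hosts"
)


def summarize_ddl_timeout(log_line: str) -> Optional[dict]:
    """Hypothetical helper: key facts from one JSON log entry, or None if it is not a DDL timeout."""
    entry = json.loads(log_line)
    match = DDL_TIMEOUT_RE.search(entry.get("error", ""))
    if match is None:
        return None
    task_path = match.group("task")
    return {
        "level": entry.get("level"),
        "code": int(match.group("code")),
        "task_path": task_path,
        # Last path component, e.g. "query-0000000543".
        "task": task_path.rsplit("/", 1)[-1],
        "timeout_s": int(match.group("timeout")),
        "unfinished_hosts": int(match.group("unfinished")),
    }
```

Entries that are not DDL timeouts (such as the "info" lines) return `None`, so the helper can be run over the whole log.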
s
For some reason the mutation is not progressing.
p
What can be done?
Also, the schema-migrator job is no longer there after the retries.
s
We could try killing the mutation and retrying:
KILL MUTATION ON CLUSTER 'cluster' WHERE mutation_id = 'mutation_950358.txt'
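Before killing anything, it can help to confirm which mutations are still pending in `system.mutations` (rows with `is_done = 0`). The sketch below is a hypothetical Python helper that turns such rows, fetched however you like (for example with clickhouse-client), into `KILL MUTATION` statements. Unlike the one-off command above, it also filters on `database` and `table`, because mutation IDs are only unique within a table; the cluster name `cluster` is taken from this thread.

```python
from typing import Iterable, List, Mapping


def quote(value: str) -> str:
    """Render a ClickHouse single-quoted string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def kill_statements(mutations: Iterable[Mapping], cluster: str = "cluster") -> List[str]:
    """Hypothetical helper: one KILL MUTATION per unfinished row of system.mutations."""
    statements = []
    for row in mutations:
        if row["is_done"]:
            continue  # already applied, nothing to kill
        statements.append(
            f"KILL MUTATION ON CLUSTER {cluster} "
            f"WHERE database = {quote(row['database'])} "
            f"AND table = {quote(row['table'])} "
            f"AND mutation_id = {quote(row['mutation_id'])}"
        )
    return statements
```

Running the generated statements one at a time keeps each kill scoped to exactly one mutation.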
p
On any ClickHouse replica, right?
s
Yes, it should propagate to all shards.
p
Does it take some time to run?
s
Yes, but not too long.
p
Still not done; it's stuck at 74%:
KILL MUTATION ON CLUSTER cluster WHERE mutation_id = 'mutation_950358.txt' ASYNC

Query id: afefcfb1-3569-4e25-868a-43b1c2923035

┌─host───────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chi-signozagent-clickhouse-cluster-0-0 │ 9000 │      0 │       │                   3 │                1 │
│ chi-signozagent-clickhouse-cluster-0-1 │ 9000 │      0 │       │                   2 │                1 │
└────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chi-signozagent-clickhouse-cluster-1-0 │ 9000 │      0 │       │                   1 │                0 │
└────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
↓ Progress: 3.00 rows, 246.00 B (0.10 rows/s., 8.44 B/s.)  74%
↖ Progress: 3.00 rows, 246.00 B (0.02 rows/s., 2.05 B/s.)  74%
s
This also shows that your chi-signozagent-clickhouse-cluster-1-1 has some problems: it is the only host missing from the output above.
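The reasoning here is a set difference: the cluster's hosts minus the hosts that answered the `ON CLUSTER` query. A minimal Python sketch that reads clickhouse-client's pretty-printed result and reports the silent hosts. Both function names are hypothetical, and the expected four-host layout (shards 0 and 1, replicas 0 and 1) is an assumption taken from the host names that appear later in this thread.

```python
from typing import Iterable, List


def hosts_in_output(pretty_output: str) -> List[str]:
    """Collect the `host` column from clickhouse-client's box-drawing table output."""
    hosts = []
    for raw in pretty_output.splitlines():
        line = raw.strip()
        # Data rows start with "│"; borders start with "┌" or "└", progress lines with arrows.
        if not line.startswith("│"):
            continue
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        hosts.append(cells[0])
    return hosts


def missing_hosts(expected: Iterable[str], pretty_output: str) -> List[str]:
    """Hosts in the expected layout that did not report back for the ON CLUSTER query."""
    return sorted(set(expected) - set(hosts_in_output(pretty_output)))


# Assumed layout: two shards x two replicas.
EXPECTED = [
    f"chi-signozagent-clickhouse-cluster-{shard}-{replica}"
    for shard in (0, 1)
    for replica in (0, 1)
]
```

Given the 74% output above, `missing_hosts(EXPECTED, output)` returns only `chi-signozagent-clickhouse-cluster-1-1`.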
p
Received exception from server (version 24.1.2):
Code: 159. DB::Exception: Received from localhost:9000. DB::Exception: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000545 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background. (TIMEOUT_EXCEEDED)
Is it better to have shards or replicas?
The log above is from the query.
s
On "is it better to have shards or replicas?": both are fine and serve different purposes. Can you try restarting chi-signozagent-clickhouse-cluster-1-1?
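For context on the shards-versus-replicas question: shards split a table's rows across hosts, while replicas hold copies of the same shard. Assuming the host names follow the clickhouse-operator pattern `chi-<installation>-<cluster>-<shard>-<replica>`, this hypothetical Python sketch recovers the layout, which for the four hosts in this thread is two shards with two replicas each.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: host names end in "-<shard>-<replica>", as in the clickhouse-operator
# pattern chi-<installation>-<cluster>-<shard>-<replica>.
SHARD_REPLICA_RE = re.compile(r"-(\d+)-(\d+)$")


def shard_and_replica(host: str) -> Tuple[int, int]:
    """Hypothetical helper: (shard index, replica index) parsed from a host name."""
    match = SHARD_REPLICA_RE.search(host)
    if match is None:
        raise ValueError(f"host name does not end in -<shard>-<replica>: {host}")
    return int(match.group(1)), int(match.group(2))


def layout(hosts: List[str]) -> Dict[int, List[int]]:
    """Map each shard to the replicas holding a copy of its rows."""
    shards: Dict[int, List[int]] = defaultdict(list)
    for host in hosts:
        shard, replica = shard_and_replica(host)
        shards[shard].append(replica)
    return {shard: sorted(replicas) for shard, replicas in sorted(shards.items())}
```

Because `ON CLUSTER` DDL waits for every host in the cluster, replicas included, a single unhealthy replica such as `1-1` can hold up a migration even though its shard still has a healthy copy on `1-0`.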
p
Doing it now. Should I run it again afterwards?
s
Yes
p
KILL MUTATION ON CLUSTER cluster WHERE mutation_id = 'mutation_950358.txt' ASYNC

Query id: b5b1c9ec-956a-4d9e-9f63-1149ea8905b4

┌─host───────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chi-signozagent-clickhouse-cluster-1-1 │ 9000 │      0 │       │                   3 │                1 │
└────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chi-signozagent-clickhouse-cluster-0-0 │ 9000 │      0 │       │                   2 │                1 │
└────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chi-signozagent-clickhouse-cluster-0-1 │ 9000 │      0 │       │                   1 │                0 │
│ chi-signozagent-clickhouse-cluster-1-0 │ 9000 │      0 │       │                   0 │                0 │
└────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.310 sec. 
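This time every host, including `1-1`, appears in the output, each `status` is 0, and `num_hosts_remaining` reaches 0. A small hypothetical Python check of those three conditions, with the rows transcribed by hand as `(host, status, num_hosts_remaining)` tuples and the four-host count assumed from this cluster:

```python
from typing import List, Tuple

# (host, status, num_hosts_remaining), transcribed from the ON CLUSTER result.
Row = Tuple[str, int, int]


def ddl_finished_everywhere(rows: List[Row], expected_hosts: int) -> bool:
    """Hypothetical check that an ON CLUSTER statement completed on every host."""
    if not rows or len(rows) != expected_hosts:
        return False  # at least one host never answered
    if any(status != 0 for _, status, _ in rows):
        return False  # a non-zero status means that host reported an error
    # The last host to answer reports num_hosts_remaining = 0.
    return min(remaining for _, _, remaining in rows) == 0


second_run: List[Row] = [
    ("chi-signozagent-clickhouse-cluster-1-1", 0, 3),
    ("chi-signozagent-clickhouse-cluster-0-0", 0, 2),
    ("chi-signozagent-clickhouse-cluster-0-1", 0, 1),
    ("chi-signozagent-clickhouse-cluster-1-0", 0, 0),
]
```

The earlier, stuck run fails the first condition: only three of the four hosts answered.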

s
Now, can you try running the collector and the migrator again?
p
OK.
The migrations are still running, then failing and restarting.
s
Do you see a dirty migration error in the logs?
If so, run this again:
DROP TABLE signoz_logs.schema_migrations ON CLUSTER cluster
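Answering that question only takes a text search of the migrator output. This hypothetical Python sketch assumes the migrator surfaces golang-migrate's wording, `Dirty database version <N>. Fix and force version.`; that wording is an assumption and may differ in your version. The `DROP TABLE` string is the command from this thread.

```python
import re
from typing import Optional

# Assumption: a dirty migration is reported with golang-migrate's wording,
# "Dirty database version <N>. Fix and force version."
DIRTY_RE = re.compile(r"Dirty database version (\d+)")

# The fix suggested in this thread.
DROP_MIGRATIONS_TABLE = "DROP TABLE signoz_logs.schema_migrations ON CLUSTER cluster"


def dirty_version(log_text: str) -> Optional[int]:
    """Hypothetical helper: the version flagged as dirty, or None if none is reported."""
    match = DIRTY_RE.search(log_text)
    return int(match.group(1)) if match else None


def suggested_fix(log_text: str) -> Optional[str]:
    """Only suggest dropping the migrations table when the logs show a dirty state."""
    return DROP_MIGRATIONS_TABLE if dirty_version(log_text) is not None else None
```

Gating the drop on an actual dirty-state message avoids discarding the migration history when the failure is something else, such as a DDL timeout.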
p
Will check. It restarted and there are lots of logs:
{
  "level": "fatal",
  "timestamp": "2024-07-05T08:35:35.355Z",
  "caller": "signozschemamigrator/migrate.go:128",
  "msg": "Failed to run migrations",
  "component": "migrate cli",
  "error": "failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000586 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background",
  "stacktrace": "main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"
}
{
  "level": "error",
  "timestamp": "2024-07-05T08:35:35.355Z",
  "caller": "migrationmanager/manager.go:81",
  "msg": "Failed to run migrations for migrator",
  "component": "migrationmanager",
  "migrator": "logs",
  "error": "failed to create database, err: code: 159, message: Watching task /clickhouse/signozagent-clickhouse/task_queue/ddl/query-0000000586 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently executing the task), they are going to execute the query in background",
  "stacktrace": "github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.11/x64/src/runtime/proc.go:267"
}