# support
a
Hi everyone, I've run into a problem with my SigNoz deployment after upgrading to 0.22.0. Everything is installed using Helm.
REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION                            
11              Wed Jul  5 18:52:30 2023        superseded      signoz-0.17.0   0.21.0          Upgrade complete
12              Tue Jul 11 15:28:08 2023        deployed        signoz-0.18.1   0.22.0          Upgrade complete
I'm suddenly receiving:
<Error> TCPHandler: Code: 170. DB::Exception: Requested cluster 'cluster' not found.
Looking at clickhouse:
:) select cluster from system.clusters

SELECT cluster
FROM system.clusters

Query id: d822fc0f-67b6-4157-97a4-d7dde022c6cd

┌─cluster─────────────────────────────────────────┐
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_localhost               │
│ test_cluster_two_shards_localhost               │
│ test_shard_localhost                            │
│ test_shard_localhost_secure                     │
│ test_unavailable_shard                          │
│ test_unavailable_shard                          │
└─────────────────────────────────────────────────┘
The PVC is mounted:
Filesystem                Size      Used Available Use% Mounted on
/dev/sdd                503.8G     96.2G    407.6G  19% /var/lib/clickhouse
Attaching a file with log snippets. I'd appreciate some help, as I'm unsure how to recover from this. Thanks!
p
are you using external clickhouse with SigNoz?
There should be a cluster named 'cluster', which seems to be missing in your case.
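The check described here (whether a cluster named 'cluster' is defined) can be scripted. The following is a minimal sketch, not something posted in the thread: `has_cluster` is a hypothetical helper name, and the `platform` namespace and pod placeholder in the usage comment are assumptions about the deployment.

```shell
# has_cluster NAME
# Reads cluster names from stdin, one per line, and succeeds only if
# a line matches NAME exactly (so 'test_cluster_two_shards' does not
# count as 'cluster').
has_cluster() {
  grep -Fxq -- "$1"
}

# Possible usage (namespace and pod name are assumptions):
#   kubectl -n platform exec <clickhouse-pod> -- \
#     clickhouse-client --query "SELECT DISTINCT cluster FROM system.clusters" \
#     | has_cluster cluster && echo "cluster found" || echo "cluster missing"
```

In non-interactive mode `clickhouse-client` prints one value per line rather than the boxed table shown above, which is why an exact whole-line match is enough.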
a
Hi @Prashant Shahi, I am not using an external ClickHouse. I am using the ClickHouse instance deployed with SigNoz. It was working with version 0.21:
┌─cluster─────────────────────────────────────────┐
│ all-replicated                                  │
│ all-sharded                                     │
│ cluster                                         │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_localhost               │
│ test_cluster_two_shards_localhost               │
│ test_shard_localhost                            │
│ test_shard_localhost_secure                     │
│ test_unavailable_shard                          │
│ test_unavailable_shard                          │
└─────────────────────────────────────────────────┘
But after upgrading to v0.22, only the test_* clusters are available.
Running helm rollback signoz to the previous revision (v0.21.0) restored ClickHouse to working condition. I'm not sure what happened during the upgrade. Perhaps I can attempt upgrading to v0.22.0 again and see if it was a transient issue.
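For anyone repeating the rollback, the target revision can be read from the release history. This is a rough sketch under assumptions: `revision_for_chart` is a hypothetical helper, it expects the plain-text table that `helm history` prints, and the `platform` namespace in the usage comment is an assumption.

```shell
# revision_for_chart CHART
# Reads `helm history` table output from stdin and prints the most
# recent revision whose CHART column equals CHART. Exits non-zero if
# no revision used that chart.
revision_for_chart() {
  awk -v chart="$1" '
    {
      # The UPDATED column contains spaces, so scan every field
      # for an exact match on the chart name.
      for (i = 2; i <= NF; i++) if ($i == chart) rev = $1
    }
    END { if (rev != "") print rev; else exit 1 }
  '
}

# Possible usage (namespace is an assumption):
#   rev=$(helm -n platform history signoz | revision_for_chart signoz-0.17.0)
#   helm -n platform rollback signoz "$rev"
```

Passing the revision explicitly makes it clear which chart the release returns to, instead of relying on the default of rolling back to the previous release.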
p
I tested with the latest chart and v0.22:
SELECT cluster
FROM system.clusters

Query id: 70c515db-ac46-4738-8518-3bf84757e645

┌─cluster─────────────────────────────────────────┐
│ all-replicated                                  │
│ all-sharded                                     │
│ cluster                                         │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_one_shard_three_replicas_localhost │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards                         │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_internal_replication    │
│ test_cluster_two_shards_localhost               │
│ test_cluster_two_shards_localhost               │
│ test_shard_localhost                            │
│ test_shard_localhost_secure                     │
│ test_unavailable_shard                          │
│ test_unavailable_shard                          │
└─────────────────────────────────────────────────┘
not able to reproduce the issue
did you update the helm repository prior to upgrading?
helm repo update
a
Yes. At the time of the upgrade, chart version 0.18.1 was the latest:
signoz/signoz           0.18.1          0.22.0          SigNoz Observability Platform Helm Chart
Resulting in:
1. You have just deployed SigNoz cluster:

- frontend version: '0.22.0'
- query-service version: '0.22.0'
- alertmanager version: '0.23.1'
- otel-collector version: '0.79.2'
- otel-collector-metrics version: '0.79.2'
I'll attempt upgrade again and report back.
p
@Al yes, that would be much appreciated.
a
@Prashant Shahi I just attempted upgrading to chart 0.18.2 and the same issue occurred: ClickHouse was unable to load 'cluster'.
REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION
14              Tue Jul 18 19:10:43 2023        superseded      signoz-0.18.2   0.22.0          Upgrade complete
The following log entry seems relevant:
2023.07.18 19:19:59.669624 [ 240 ] {} <Error> DDLWorker: Cannot parse DDL task query-0000025438: Cannot parse query or obtain cluster info. Will try to send error status: 371
Code: 371. DB::Exception: DDL task query-0000025438 contains current host chi-signoz-clickhouse-cluster-0-0:9000 in cluster cluster, but there is no such cluster here. (INCONSISTENT_CLUSTER_DEFINITION) (version 22.8.8.3 (official build))
2023.07.18 19:19:59.681107 [ 244 ] {} <Information> DDLWorker: Task query-0000025438 is outdated, deleting it
2023.07.18 19:19:59.684970 [ 240 ] {} <Error> DDLWorker: Cannot parse DDL task query-0000025439: Cannot parse query or obtain cluster info. Will try to send error status: 371
I have rolled back to signoz-0.17.0 (app version 0.21.0), and the SigNoz deployment is functional again.
p
@Al The latest version is signoz-0.19.1. Also, you would need to run the migration steps: https://signoz.io/docs/operate/migration/upgrade-0.23/
a
Hi @Prashant Shahi, the upgrade to signoz-0.19.1 completed successfully!