SigNoz Community #support

I0705 06:56:45.088037 1 poller.go:245] po...

Dhruv garg

07/05/2024, 8:19 AM

I0705 06:56:45.088037       1 poller.go:245] pollStatefulSet():platform/chi-apm-clickhouse-cluster-0-0:%s/%s - TIMEOUT reached
E0705 06:56:45.088053       1 creator.go:118] updateStatefulSet():StatefulSet update wait failed. err: waitStatefulSet(platform/chi-apm-clickhouse-cluster-0-0) - wait timeout
I0705 06:56:45.088067       1 creator.go:237] onStatefulSetUpdateFailed():going to ROLLBACK FAILED StatefulSet platform/chi-apm-clickhouse-cluster-0-0
W0705 06:56:45.126433       1 warnings.go:70] spec.template.spec.containers[0].ports[3]: duplicate port definition with spec.template.spec.containers[0].ports[1]
I0705 06:56:45.126799       1 deleter.go:97] Delete Pod platform/chi-apm-clickhouse-cluster-0-0-0
E0705 06:56:45.144447       1 deleter.go:105] statefulSetDeletePod():FAIL delete Pod platform/chi-apm-clickhouse-cluster-0-0-0 err:pods "chi-apm-clickhouse-cluster-0-0-0" is forbidden: User "system:serviceaccount:platform:apm-clickhouse-operator" cannot delete resource "pods" in API group "" in the namespace "platform"
I0705 06:56:45.144471       1 worker.go:1458] Got abort. Abort
E0705 06:56:45.144586       1 worker-reconciler.go:735] reconcileStatefulSet():FAILED to reconcile StatefulSet: chi-apm-clickhouse-cluster-0-0 CHI: apm-clickhouse 
I0705 06:56:45.238676       1 creator.go:58] CreateServiceCHI():platform/apm-clickhouse/45fae902-f819-4d37-9016-a9013c4bcf8f:platform/apm-clickhouse
I0705 06:56:45.251515       1 worker.go:1201] updateService():platform/apm-clickhouse/45fae902-f819-4d37-9016-a9013c4bcf8f:Update Service platform/apm-clickhouse
E0705 06:56:45.323817       1 worker-reconciler.go:91] reconcileCHI():platform/apm-clickhouse/45fae902-f819-4d37-9016-a9013c4bcf8f:FAILED to update err: crud error - should abort
I0705 06:56:45.418894       1 worker.go:657] markReconcileComplete():platform/apm-clickhouse/45fae902-f819-4d37-9016-a9013c4bcf8f:reconcile completed unsuccessfully, task id: 45fae902-f819-4d37-9016-a9013c4bcf8f

We are suddenly getting this error in our production SigNoz. Is there any quick way to resolve this? @nitya-signoz @Prashant Shahi
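When the operator log is long, the warning and error lines (the W and E prefixes in the excerpt above) can be isolated with a small filter. This is a minimal sketch, assuming the log keeps the same line prefix as the excerpt; the function name filter_klog_problems is made up for illustration.

```shell
#!/bin/sh
# Keep only warning (W) and error (E) lines from operator output whose
# lines start like "E0705 06:56:45.088053" (severity, date, time).
filter_klog_problems() {
  grep -E '^[WE][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ '
}

# Example with two lines copied from the excerpt above;
# only the E line passes the filter.
printf '%s\n' \
  'I0705 06:56:45.126799       1 deleter.go:97] Delete Pod platform/chi-apm-clickhouse-cluster-0-0-0' \
  'E0705 06:56:45.144586       1 worker-reconciler.go:735] reconcileStatefulSet():FAILED to reconcile StatefulSet: chi-apm-clickhouse-cluster-0-0 CHI: apm-clickhouse' \
  | filter_klog_problems
```

Against a live cluster the same filter could be fed from kubectl logs for the operator pod. Since that pod reports 2/2 containers, a container may need to be selected with -c.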

Dhruv garg

07/05/2024, 8:20 AM

apm-clickhouse-operator-676658c454-vjxxr            2/2     Running     0             22d
apm-k8s-infra-otel-agent-24gqj                      1/1     Running     0             7d21h
apm-k8s-infra-otel-agent-5ggrm                      1/1     Running     0             7d21h
apm-k8s-infra-otel-agent-f5glt                      1/1     Running     0             7d21h
apm-k8s-infra-otel-agent-hxp87                      1/1     Running     0             7d21h
apm-k8s-infra-otel-agent-n79k7                      1/1     Running     0             7d21h
apm-k8s-infra-otel-agent-nfv6j                      1/1     Running     0             7d21h
apm-k8s-infra-otel-deployment-dfb9b77bf-lp5dj       1/1     Running     0             95m
apm-signoz-alertmanager-0                           1/1     Running     0             22d
apm-signoz-frontend-7b4dd6989c-cg2d9                1/1     Running     0             95m
apm-signoz-otel-collector-789cf6c675-fmxkc          1/1     Running     0             7d21h
apm-signoz-otel-collector-789cf6c675-zkndn          1/1     Running     3 (90m ago)   95m
apm-signoz-otel-collector-metrics-f69ff5867-wxgpt   1/1     Running     0             95m
apm-signoz-query-service-0                          0/1     Running     0             7d21h
apm-signoz-schema-migrator-upgrade-dfjwj            0/1     Completed   0             89m
apm-zookeeper-0                                     1/1     Running     0             22d

There is also no ClickHouse pod here right now.
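Pods that are Running but not Ready are easy to miss in a listing like this one. The sketch below picks them out from rows in the same column layout (NAME, READY, STATUS, ...); the function name not_ready_pods is hypothetical, and the sample rows are copied from the listing above.

```shell
#!/bin/sh
# Print the names of pods whose STATUS is Running but whose READY
# column (ready/total) is not full. Reads pod rows from stdin.
not_ready_pods() {
  awk '$3 == "Running" { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example with three rows from the listing above.
printf '%s\n' \
  'apm-signoz-otel-collector-789cf6c675-zkndn          1/1     Running     3 (90m ago)   95m' \
  'apm-signoz-query-service-0                          0/1     Running     0             7d21h' \
  'apm-signoz-schema-migrator-upgrade-dfjwj            0/1     Completed   0             89m' \
  | not_ready_pods
# prints: apm-signoz-query-service-0
```

The Completed migrator job is skipped on purpose, because 0/1 is the normal end state for a finished job pod.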

Nitish

07/07/2024, 1:07 PM

The error logs indicate permission problems: the operator's service account cannot delete pods in the platform namespace. Can you try the steps below?

• First, check if the ClickHouse custom resource is present and correctly defined:

kubectl get clickhouseinstallations -n platform

• Look at the logs of the ClickHouse operator pod for more detailed error messages:

kubectl logs -n platform apm-clickhouse-operator-676658c454-vjxxr

• It appears that the ClickHouse operator doesn't have the necessary permissions. Review and possibly update the RBAC rules for the ClickHouse operator:

kubectl get clusterrole clickhouse-operator-cluster-role -o yaml
kubectl get role -n platform clickhouse-operator-role -o yaml

• If the ClickHouse pod is missing, you might need to recreate it. However, be cautious as this might lead to data loss if not done correctly. First, check if the StatefulSet exists:

kubectl get statefulset -n platform

If it doesn't exist, you might need to reapply your SigNoz Helm chart.

• Ensure that the Persistent Volumes for ClickHouse are still intact:

kubectl get pvc -n platform
kubectl get pv

• Try restarting the ClickHouse operator:

kubectl rollout restart deployment apm-clickhouse-operator -n platform

• Ensure all other SigNoz components are running correctly. The query-service pod seems to be in a running state but not ready (0/1). Check its logs:

kubectl logs -n platform apm-signoz-query-service-0
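The permission error in the operator log can also be checked directly, without reading the role definitions by hand. The sketch below asks the API server whether the service account named in the forbidden message may act on pods. The namespace and service account are copied from that log line; the function name can_operator is hypothetical, and the caller is assumed to have impersonation rights, which the --as flag requires.

```shell
#!/bin/sh
# Values copied from the "forbidden" line in the operator log.
NS="platform"
SA="apm-clickhouse-operator"

# Ask the API server whether the operator's service account may perform
# the given verb on pods in $NS. kubectl answers "yes" or "no".
can_operator() {
  kubectl auth can-i "$1" pods \
    --as="system:serviceaccount:${NS}:${SA}" \
    -n "$NS"
}
```

Running can_operator delete would be expected to answer no while the RBAC problem persists, and yes once the operator's role allows deleting pods again.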
