# support
d
❯ k get -n platform pods                       
NAME                                                 READY   STATUS     RESTARTS   AGE
apm-clickhouse-operator-676658c454-292fl             2/2     Running    0          4m8s
apm-k8s-infra-otel-agent-5pjdr                       1/1     Running    0          4m8s
apm-k8s-infra-otel-agent-78qzq                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-cbmhk                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-f9bns                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-knp42                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-m6xrr                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-nd95t                       1/1     Running    0          4m9s
apm-k8s-infra-otel-deployment-dfb9b77bf-xvtmh        1/1     Running    0          4m8s
apm-signoz-alertmanager-0                            0/1     Init:0/1   0          4m7s
apm-signoz-frontend-7b4dd6989c-hb88f                 0/1     Init:0/1   0          4m8s
apm-signoz-otel-collector-7d6cc8f4bc-nxkk6           0/1     Init:0/1   0          4m7s
apm-signoz-otel-collector-7d6cc8f4bc-wxtb2           0/1     Init:0/1   0          4m7s
apm-signoz-otel-collector-metrics-58c687fc49-lt7cw   0/1     Init:0/1   0          4m8s
apm-signoz-query-service-0                           0/1     Init:0/1   0          4m7s
apm-signoz-schema-migrator-init-sp222                0/1     Init:0/2   0          4m7s
apm-zookeeper-0                                      1/1     Running    0          4m7s
For some reason, the ClickHouse pod is not coming up at all when recreating SigNoz using Helm.
n
Check the logs of the ClickHouse pod when it tries to start? Also check the logs of the ClickHouse operator. cc @Prashant Shahi
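For reference, a minimal sketch of those checks (the operator deployment, its container name, and the CHI name are taken from the pod listing and logs in this thread; adjust if yours differ):
# operator logs (container name "clickhouse-operator" is an assumption)
kubectl -n platform logs deploy/apm-clickhouse-operator -c clickhouse-operator --tail=100
# status and events of the ClickHouseInstallation resource ("chi" is the CRD short name)
kubectl -n platform describe chi apm-clickhouse
# recent namespace events, useful when the ClickHouse pod never gets created at all
kubectl -n platform get events --sort-by=.lastTimestamp | tail -n 30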
i
Execute this command to check if you have the ClickHouse CRDs:
kubectl get customresourcedefinition -A | grep clickhouse
You should have:
clickhouseinstallations.clickhouse.altinity.com            2024-07-01T07:09:16Z
clickhouseinstallationtemplates.clickhouse.altinity.com    2024-07-01T07:09:16Z
clickhouseoperatorconfigurations.clickhouse.altinity.com   2024-07-01T07:09:16Z
Otherwise, install the ClickHouse CRDs from this link: https://artifacthub.io/packages/helm/altinity-clickhouse-operator/altinity-clickhouse-operator
d
The ClickHouse pod was not coming up, so I was not able to check its logs.
❯ k get crd | grep click                                 
clickhouseinstallations.clickhouse.altinity.com            2024-05-13T12:51:18Z
clickhouseinstallationtemplates.clickhouse.altinity.com    2024-05-13T12:51:18Z
clickhouseoperatorconfigurations.clickhouse.altinity.com   2024-05-13T12:51:19Z
I already have the ClickHouse CRDs @nitya-signoz @Prashant Shahi
I0629 12:56:59.381884       1 clickhouse_operator.go:146] Run():Starting CHI controller
I0629 12:56:59.381937       1 controller.go:464] Starting ClickHouseInstallation controller
I0629 12:56:59.381951       1 controller.go:949] waitForCacheSync():Syncing caches for ClickHouseInstallation controller
I0629 12:56:59.415013       1 controller.go:565] ENQUEUE new ReconcileCHI cmd=add for platform/apm-clickhouse
I0629 12:56:59.482109       1 controller.go:954] waitForCacheSync():Caches are synced for ClickHouseInstallation controller
I0629 12:56:59.482243       1 labeler.go:81] OPERATOR_POD_NAMESPACE=platform OPERATOR_POD_NAME=apm-clickhouse-operator-676658c454-292fl
I0629 12:56:59.665814       1 controller.go:496] Run():ClickHouseInstallation controller: starting workers number: 11
I0629 12:56:59.665838       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 1 out of 11
I0629 12:56:59.665969       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 2 out of 11
I0629 12:56:59.666033       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 3 out of 11
I0629 12:56:59.666055       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 4 out of 11
I0629 12:56:59.666068       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 5 out of 11
I0629 12:56:59.666081       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 6 out of 11
I0629 12:56:59.666111       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 7 out of 11
I0629 12:56:59.666159       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 8 out of 11
I0629 12:56:59.666209       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 9 out of 11
I0629 12:56:59.666280       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 10 out of 11
I0629 12:56:59.666356       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 11 out of 11
I0629 12:56:59.666424       1 controller.go:508] Run():ClickHouseInstallation controller: workers started
I0629 12:57:09.666599       1 worker.go:379] worker.go:379:updateCHI():start:platform/apm-clickhouse
E0629 12:57:09.670308       1 worker-deleter.go:581] deleteCHI():platform/apm-clickhouse:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:platform:apm-clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
I0629 12:57:09.670356       1 worker-deleter.go:582] deleteCHI():platform/apm-clickhouse:will delete chi platform/apm-clickhouse
I0629 12:57:09.684284       1 worker-deleter.go:285] deleteCHIProtocol():platform/apm-clickhouse/725f9d06-651f-4041-9b65-394cb32998f3:Delete CHI started
I0629 12:57:09.766432       1 deleter.go:305] deleteServiceCHI():platform/apm-clickhouse/725f9d06-651f-4041-9b65-394cb32998f3:platform/apm-clickhouse
I0629 12:57:09.767198       1 controller.go:624] OK delete watch (platform/apm-clickhouse)
I0629 12:57:09.775952       1 cluster.go:84] Run query on: chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local of [chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local]
E0629 12:57:09.786612       1 connection.go:98] connect():FAILED Ping(http://clickhouse_operator:***@chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local:8123/). Err: dial tcp: lookup chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local on 172.20.0.10:53: no such host
E0629 12:57:09.786716       1 connection.go:126] QueryContext():FAILED connect(http://clickhouse_operator:***@chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local:8123/) for SQL: SELECT
I am getting this error message
i
You have a problem with your CRD permissions:
E0629 12:57:09.670308 1 worker-deleter.go:581] deleteCHI():platform/apm-clickhouse:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:platform:apm-clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
I'm not very familiar with RBAC (Role-Based Access Control), but you might want to consider adding a ClusterRole to grant the ClickHouse operator access to CRDs. Before applying this configuration, it's crucial to understand the implications: a ClusterRole can introduce significant security risks to your cluster if not used cautiously. Here's an example ClusterRole that should work for granting view access to CRDs. I don't have this problem in my cluster, so I can't guarantee that this will work perfectly.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: clickhouse-operator-crd-viewer
rules:
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "watch", "list"]
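A ClusterRole alone does nothing until it is bound; a minimal sketch of a matching ClusterRoleBinding, assuming the role name above and the service account platform/apm-clickhouse-operator reported in the error message:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: clickhouse-operator-crd-viewer
subjects:
# the operator's service account, as named in the "forbidden" error
- kind: ServiceAccount
  name: apm-clickhouse-operator
  namespace: platform
roleRef:
  kind: ClusterRole
  name: clickhouse-operator-crd-viewer
  apiGroup: rbac.authorization.k8s.io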
d
okay, let me go through these
So I checked my production cluster, which has a working SigNoz. All the Roles, ClusterRoles, RoleBindings and ClusterRoleBindings are the same there.
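Comparing the role objects can miss a binding or aggregation difference; one sketch for checking the effective permission directly on each cluster, impersonating the service account named in the error above:
# returns "yes" or "no" depending on what RBAC actually grants
kubectl auth can-i get customresourcedefinitions \
  --as=system:serviceaccount:platform:apm-clickhouse-operator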
i
I don't know if you've tried adding my ClusterRole; if it's on your dev cluster, it might be worth a shot. I think the problem might be on your end: if it works in the production cluster, it should also work in the development cluster. In your place, I would try to erase the delta between the prod cluster and the dev one by restarting from scratch.
1. Uninstall the Helm release:
helm uninstall <release-name> -n <namespace>
2. Delete the CRDs:
kubectl delete -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallations.clickhouse.altinity.com.yaml
kubectl delete -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallationtemplates.clickhouse.altinity.com.yaml
kubectl delete -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseoperatorconfigurations.clickhouse.altinity.com.yaml
3. Apply the CRDs:
kubectl apply -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallations.clickhouse.altinity.com.yaml
kubectl apply -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallationtemplates.clickhouse.altinity.com.yaml
kubectl apply -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseoperatorconfigurations.clickhouse.altinity.com.yaml
4. Install the SigNoz Helm chart again:
helm --namespace <my-namespace> install <my-release> signoz/signoz
d
okay let me try that
Thanks bro, I was able to get it working
Can you also tell me how I can check if S3 cold storage is working and configured properly?
I added these and everything is working fine with no error logs in the ClickHouse pods, but I can't see any data in S3 so far.
clickhouse:
  persistence:
    size: 100Gi
  coldStorage:
    enabled: true
    # Set free space size on default disk in bytes
    defaultKeepFreeSpaceBytes: "10485760" # 10MiB
    type: s3
    endpoint: https://<bucket-name>.s3.amazonaws.com/data/
    accessKey: <access_key_id>
    secretAccess: <secret_access_key>
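For reference, a sketch of how these overrides are typically applied (the placeholders follow the ones used earlier in this thread; the values file name is an assumption):
# re-render the release with the coldStorage overrides
helm --namespace <my-namespace> upgrade --install <my-release> signoz/signoz -f override-values.yaml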
i
Execute this command to show all of your persistent volumes:
kubectl get pv -A
You should try to find a persistent volume with 100 Gi and correlate it with one of your persistent volumes in S3.
d
I can only see these PVs, all for EBS:
pvc-24791e9f-1365-42ea-a3f8-daec2086dff9   1Gi        RWO            Delete           Bound    platform/storage-apm-signoz-alertmanager-0                            gp3-resizable   <unset>                          49d
pvc-436c87ac-9fb1-41b9-936d-f97d0f683a49   100Gi      RWO            Delete           Bound    platform/data-volumeclaim-template-chi-apm-clickhouse-cluster-0-0-0   gp3-resizable   <unset>                          49d
pvc-9df65e8f-ac2e-443b-8a99-9aa2a2a86928   1Gi        RWO            Delete           Bound    platform/signoz-db-apm-signoz-query-service-0                         gp3-resizable   <unset>                          49d
pvc-d9ac9846-35aa-4329-befd-78a2008f18f7   8Gi        RWO            Delete           Bound    platform/data-apm-zookeeper-0                                         gp3-resizable   <unset>                          49d
can’t see anything for s3
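One quick check from the S3 side, assuming the AWS CLI is configured and that <bucket-name>/data/ matches the endpoint in the values above:
# list whatever ClickHouse has written under the cold-storage prefix so far
aws s3 ls s3://<bucket-name>/data/ --recursive --human-readable | head -n 20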
i
It should be this one: pvc-436c87ac-9fb1-41b9-936d-f97d0f683a49. Check if in S3 you have cold storage with this ID. It is aged 49 days, so that's a little weird... Maybe try removing these two lines:
clickhouse:
  #persistence:
    #size: 100Gi
  coldStorage:
    enabled: true
    # Set free space size on default disk in bytes
    defaultKeepFreeSpaceBytes: "10485760" # 10MiB
    type: s3
    endpoint: https://<bucket-name>.s3.amazonaws.com/data/
    accessKey: <access_key_id>
    secretAccess: <secret_access_key>
d
No, cold storage is used when no space is left on the original EBS volume, so there will be both EBS and S3 at any given point in time; that's what I want to do. I was following this document: https://signoz.io/docs/userguide/retention-period/
i
Perfect!
d
I meant, I am still not sure if it is working or not. But I don't think the PVC has anything to do with it.
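A way to verify cold storage from inside ClickHouse itself, independent of PVCs (a sketch; the pod name is inferred from the PVC listed above, and client credentials or flags may be needed depending on your user setup):
# the S3 disk and a storage policy that uses it should be registered
kubectl -n platform exec chi-apm-clickhouse-cluster-0-0-0 -- \
  clickhouse-client -q "SELECT name, path FROM system.disks"
kubectl -n platform exec chi-apm-clickhouse-cluster-0-0-0 -- \
  clickhouse-client -q "SELECT policy_name, volume_name, disks FROM system.storage_policies"
# shows which disk active data parts currently live on; parts only appear on the
# S3 disk once the move condition from the retention settings is met
kubectl -n platform exec chi-apm-clickhouse-cluster-0-0-0 -- \
  clickhouse-client -q "SELECT disk_name, count() FROM system.parts WHERE active GROUP BY disk_name"
If the disk and policy look right but nothing lands on the S3 disk, it may simply be that no data has aged past the move threshold yet.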