# support
d
❯ k get -n platform pods                       
NAME                                                 READY   STATUS     RESTARTS   AGE
apm-clickhouse-operator-676658c454-292fl             2/2     Running    0          4m8s
apm-k8s-infra-otel-agent-5pjdr                       1/1     Running    0          4m8s
apm-k8s-infra-otel-agent-78qzq                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-cbmhk                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-f9bns                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-knp42                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-m6xrr                       1/1     Running    0          4m9s
apm-k8s-infra-otel-agent-nd95t                       1/1     Running    0          4m9s
apm-k8s-infra-otel-deployment-dfb9b77bf-xvtmh        1/1     Running    0          4m8s
apm-signoz-alertmanager-0                            0/1     Init:0/1   0          4m7s
apm-signoz-frontend-7b4dd6989c-hb88f                 0/1     Init:0/1   0          4m8s
apm-signoz-otel-collector-7d6cc8f4bc-nxkk6           0/1     Init:0/1   0          4m7s
apm-signoz-otel-collector-7d6cc8f4bc-wxtb2           0/1     Init:0/1   0          4m7s
apm-signoz-otel-collector-metrics-58c687fc49-lt7cw   0/1     Init:0/1   0          4m8s
apm-signoz-query-service-0                           0/1     Init:0/1   0          4m7s
apm-signoz-schema-migrator-init-sp222                0/1     Init:0/2   0          4m7s
apm-zookeeper-0                                      1/1     Running    0          4m7s
For some reason, the ClickHouse pod is not coming up at all when recreating SigNoz using Helm.
n
Check the logs of the ClickHouse pod when it tries to start? Also check the logs of the ClickHouse operator. cc @Prashant Shahi
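For reference, a minimal sketch of those checks (the operator deployment, its container name, and the CHI name are taken from the pod listing and logs in this thread; adjust if yours differ):
# operator logs (container name "clickhouse-operator" is an assumption)
kubectl -n platform logs deploy/apm-clickhouse-operator -c clickhouse-operator --tail=100
# status and events of the ClickHouseInstallation resource ("chi" is the CRD short name)
kubectl -n platform describe chi apm-clickhouse
# recent namespace events, useful when the ClickHouse pod never gets created at all
kubectl -n platform get events --sort-by=.lastTimestamp | tail -n 30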
i
Execute this command to check if you have the ClickHouse CRDs:
kubectl get customresourcedefinition -A | grep clickhouse
You should have:
clickhouseinstallations.clickhouse.altinity.com            2024-07-01T07:09:16Z
clickhouseinstallationtemplates.clickhouse.altinity.com    2024-07-01T07:09:16Z
clickhouseoperatorconfigurations.clickhouse.altinity.com   2024-07-01T07:09:16Z
Otherwise, install the ClickHouse CRDs from this link: https://artifacthub.io/packages/helm/altinity-clickhouse-operator/altinity-clickhouse-operator
d
The ClickHouse pod was not coming up, so I was not able to check its logs.
❯ k get crd | grep click                                 
clickhouseinstallations.clickhouse.altinity.com            2024-05-13T12:51:18Z
clickhouseinstallationtemplates.clickhouse.altinity.com    2024-05-13T12:51:18Z
clickhouseoperatorconfigurations.clickhouse.altinity.com   2024-05-13T12:51:19Z
I already have the ClickHouse CRDs @nitya-signoz @Prashant Shahi
I0629 12:56:59.381884       1 clickhouse_operator.go:146] Run():Starting CHI controller
I0629 12:56:59.381937       1 controller.go:464] Starting ClickHouseInstallation controller
I0629 12:56:59.381951       1 controller.go:949] waitForCacheSync():Syncing caches for ClickHouseInstallation controller
I0629 12:56:59.415013       1 controller.go:565] ENQUEUE new ReconcileCHI cmd=add for platform/apm-clickhouse
I0629 12:56:59.482109       1 controller.go:954] waitForCacheSync():Caches are synced for ClickHouseInstallation controller
I0629 12:56:59.482243       1 labeler.go:81] OPERATOR_POD_NAMESPACE=platform OPERATOR_POD_NAME=apm-clickhouse-operator-676658c454-292fl
I0629 12:56:59.665814       1 controller.go:496] Run():ClickHouseInstallation controller: starting workers number: 11
I0629 12:56:59.665838       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 1 out of 11
I0629 12:56:59.665969       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 2 out of 11
I0629 12:56:59.666033       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 3 out of 11
I0629 12:56:59.666055       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 4 out of 11
I0629 12:56:59.666068       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 5 out of 11
I0629 12:56:59.666081       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 6 out of 11
I0629 12:56:59.666111       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 7 out of 11
I0629 12:56:59.666159       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 8 out of 11
I0629 12:56:59.666209       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 9 out of 11
I0629 12:56:59.666280       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 10 out of 11
I0629 12:56:59.666356       1 controller.go:498] Run():ClickHouseInstallation controller: starting worker 11 out of 11
I0629 12:56:59.666424       1 controller.go:508] Run():ClickHouseInstallation controller: workers started
I0629 12:57:09.666599       1 worker.go:379] worker.go:379:updateCHI():start:platform/apm-clickhouse
E0629 12:57:09.670308       1 worker-deleter.go:581] deleteCHI():platform/apm-clickhouse:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:platform:apm-clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
I0629 12:57:09.670356       1 worker-deleter.go:582] deleteCHI():platform/apm-clickhouse:will delete chi platform/apm-clickhouse
I0629 12:57:09.684284       1 worker-deleter.go:285] deleteCHIProtocol():platform/apm-clickhouse/725f9d06-651f-4041-9b65-394cb32998f3:Delete CHI started
I0629 12:57:09.766432       1 deleter.go:305] deleteServiceCHI():platform/apm-clickhouse/725f9d06-651f-4041-9b65-394cb32998f3:platform/apm-clickhouse
I0629 12:57:09.767198       1 controller.go:624] OK delete watch (platform/apm-clickhouse)
I0629 12:57:09.775952       1 cluster.go:84] Run query on: chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local of [chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local]
E0629 12:57:09.786612       1 connection.go:98] connect():FAILED Ping(http://clickhouse_operator:***@chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local:8123/). Err: dial tcp: lookup chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local on 172.20.0.10:53: no such host
E0629 12:57:09.786716       1 connection.go:126] QueryContext():FAILED connect(http://clickhouse_operator:***@chi-apm-clickhouse-cluster-0-0.platform.svc.cluster.local:8123/) for SQL: SELECT
I am getting this error message
i
You have a problem with your CRD permissions:
E0629 12:57:09.670308 1 worker-deleter.go:581] deleteCHI():platform/apm-clickhouse:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:platform:apm-clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
I'm not very familiar with RBAC (Role-Based Access Control), but you might want to consider adding a ClusterRole to grant the ClickHouse operator access to CRDs. Before applying this configuration, it's crucial to understand the implications: a ClusterRole can introduce significant security risks to your cluster if not used cautiously. Here's an example ClusterRole that should work for granting view access to CRDs. I don't have this problem in my cluster, so I can't guarantee that this will work perfectly.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: clickhouse-operator-crd-viewer
rules:
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "watch", "list"]
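A ClusterRole alone does nothing until it is bound; a minimal sketch of a matching ClusterRoleBinding, assuming the role name above and the service account platform/apm-clickhouse-operator reported in the error message:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: clickhouse-operator-crd-viewer
subjects:
# the operator's service account, as named in the "forbidden" error
- kind: ServiceAccount
  name: apm-clickhouse-operator
  namespace: platform
roleRef:
  kind: ClusterRole
  name: clickhouse-operator-crd-viewer
  apiGroup: rbac.authorization.k8s.io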
d
okay, let me go through these
So I checked my production cluster, which has a working SigNoz. All the Roles, ClusterRoles, RoleBindings and ClusterRoleBindings are the same there.
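Comparing the role objects can miss a binding or aggregation difference; one sketch for checking the effective permission directly on each cluster, impersonating the service account named in the error above:
# returns "yes" or "no" depending on what RBAC actually grants
kubectl auth can-i get customresourcedefinitions \
  --as=system:serviceaccount:platform:apm-clickhouse-operator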
i
I don't know if you've tried adding my ClusterRole; if it's on your dev cluster, it might be worth a shot. I think the problem might be on your end: if it works in the production cluster, it should also work in the development cluster. In your place, I would try to erase the delta between the prod cluster and the dev one by restarting from scratch.
1. Uninstall the Helm release:
helm uninstall <release-name> -n <namespace>
2. Delete the CRDs:
kubectl delete -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallations.clickhouse.altinity.com.yaml
kubectl delete -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallationtemplates.clickhouse.altinity.com.yaml
kubectl delete -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseoperatorconfigurations.clickhouse.altinity.com.yaml
3. Apply the CRDs:
kubectl apply -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallations.clickhouse.altinity.com.yaml
kubectl apply -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseinstallationtemplates.clickhouse.altinity.com.yaml
kubectl apply -f https://github.com/Altinity/clickhouse-operator/raw/master/deploy/helm/clickhouse-operator/crds/CustomResourceDefinition-clickhouseoperatorconfigurations.clickhouse.altinity.com.yaml
4. Install the SigNoz Helm chart again:
helm --namespace <my-namespace> install <my-release> signoz/signoz
d
okay let me try that
Thanks bro, I was able to get it working
Can you also tell me how I can check if S3 cold storage is working and configured properly?
I added these and everything is working fine with no error logs in the ClickHouse pods, but I can't see any data in S3 so far.
clickhouse:
  persistence:
    size: 100Gi
  coldStorage:
    enabled: true
    # Set free space size on default disk in bytes
    defaultKeepFreeSpaceBytes: "10485760" # 10MiB
    type: s3
    endpoint: https://<bucket-name>.s3.amazonaws.com/data/
    accessKey: <access_key_id>
    secretAccess: <secret_access_key>
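For reference, a sketch of how these overrides are typically applied (the placeholders follow the ones used earlier in this thread; the values file name is an assumption):
# re-render the release with the coldStorage overrides
helm --namespace <my-namespace> upgrade --install <my-release> signoz/signoz -f override-values.yaml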
i
Execute this command to show all of your persistent volumes:
kubectl get pv -A
You should try to find a persistent volume with 100 Gi and correlate it with one of your persistent volumes in S3.
d
I can only see these PVs, all for EBS:
pvc-24791e9f-1365-42ea-a3f8-daec2086dff9   1Gi        RWO            Delete           Bound    platform/storage-apm-signoz-alertmanager-0                            gp3-resizable   <unset>                          49d
pvc-436c87ac-9fb1-41b9-936d-f97d0f683a49   100Gi      RWO            Delete           Bound    platform/data-volumeclaim-template-chi-apm-clickhouse-cluster-0-0-0   gp3-resizable   <unset>                          49d
pvc-9df65e8f-ac2e-443b-8a99-9aa2a2a86928   1Gi        RWO            Delete           Bound    platform/signoz-db-apm-signoz-query-service-0                         gp3-resizable   <unset>                          49d
pvc-d9ac9846-35aa-4329-befd-78a2008f18f7   8Gi        RWO            Delete           Bound    platform/data-apm-zookeeper-0                                         gp3-resizable   <unset>                          49d
can’t see anything for s3
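One quick check from the S3 side, assuming the AWS CLI is configured and that <bucket-name>/data/ matches the endpoint in the values above:
# list whatever ClickHouse has written under the cold-storage prefix so far
aws s3 ls s3://<bucket-name>/data/ --recursive --human-readable | head -n 20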
i
It should be this one: pvc-436c87ac-9fb1-41b9-936d-f97d0f683a49. Check if in S3 you have cold storage with this ID. It is aged 49 days, so that's a little weird... Maybe try removing these two lines:
clickhouse:
  #persistence:
    #size: 100Gi
  coldStorage:
    enabled: true
    # Set free space size on default disk in bytes
    defaultKeepFreeSpaceBytes: "10485760" # 10MiB
    type: s3
    endpoint: https://<bucket-name>.s3.amazonaws.com/data/
    accessKey: <access_key_id>
    secretAccess: <secret_access_key>
d
No, cold storage is used when no space is left on the original EBS volume, so there will be both EBS and S3 at any given point in time; that's what I want to do. I was following this document: https://signoz.io/docs/userguide/retention-period/
i
Perfect!
d
I meant, I am still not sure if it is working or not. But I don't think the PVC has anything to do with it.
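A way to verify cold storage from inside ClickHouse itself, independent of PVCs (a sketch; the pod name is inferred from the PVC listed above, and client credentials or flags may be needed depending on your user setup):
# the S3 disk and a storage policy that uses it should be registered
kubectl -n platform exec chi-apm-clickhouse-cluster-0-0-0 -- \
  clickhouse-client -q "SELECT name, path FROM system.disks"
kubectl -n platform exec chi-apm-clickhouse-cluster-0-0-0 -- \
  clickhouse-client -q "SELECT policy_name, volume_name, disks FROM system.storage_policies"
# shows which disk active data parts currently live on; parts only appear on the
# S3 disk once the move condition from the retention settings is met
kubectl -n platform exec chi-apm-clickhouse-cluster-0-0-0 -- \
  clickhouse-client -q "SELECT disk_name, count() FROM system.parts WHERE active GROUP BY disk_name"
If the disk and policy look right but nothing lands on the S3 disk, it may simply be that no data has aged past the move threshold yet.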