# general
k
after helm install pods stuck in init state and some in crash loop:
NAME                                                READY   STATUS             RESTARTS      AGE
signoz-alertmanager-0                               0/1     Init:0/1           0             16m
signoz-frontend-8f8bfc6-j9cfh                       0/1     Init:0/1           0             16m
signoz-k8s-infra-otel-agent-drlv2                   0/1     CrashLoopBackOff   7 (47s ago)   16m
signoz-otel-collector-67949fc956-5tjx2              0/1     CrashLoopBackOff   6 (36s ago)   16m
signoz-query-service-0                              0/1     Pending            0             16m
followed instructions here: https://signoz.io/docs/install/kubernetes/gcp/#gke-autopilot
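(For context, the install those docs walk through is roughly the sequence below; the release and namespace names here are only illustrative, and any GKE-specific overrides are in the linked guide.)
# sketch of the install being referred to, not the exact doc commands
helm repo add signoz https://charts.signoz.io
helm repo update
helm install signoz signoz/signoz -n signoz --create-namespace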
m
Could you describe them to see the error?
Are you deploying it in GKE?
k
yes, gke autopilot cluster
m
Just describe the pods and look for the error
k
which one?
m
The one that's crashing
k
ok
and what part of the describe is interesting?
m
Does your cluster have the necessary IAM permissions for accessing the storage?
The part at the end where the events are described is fine
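(For reference, something along these lines pulls that tail end of the describe output; the pod name and namespace are taken from the listing above.)
kubectl describe pod signoz-k8s-infra-otel-agent-drlv2 -n signoz
# or just the events for that pod
kubectl get events -n signoz --field-selector involvedObject.name=signoz-k8s-infra-otel-agent-drlv2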
k
ok, for example:
Events:
Type     Reason     Age                From                                   Message
  ----     ------     ----               ----                                   -------
  Normal   Scheduled  20m                gke.io/optimize-utilization-scheduler  Successfully assigned signoz/signoz-k8s-infra-otel-agent-drlv2 to gk3-gke-europe-west6-pool-3-2b2246ae-6n2b
  Warning  Unhealthy  19m                kubelet                                Readiness probe failed: Get "http://10.0.65.147:13133/": read tcp 10.0.65.129:47512->10.0.65.147:13133: read: connection reset by peer
  Warning  Unhealthy  19m                kubelet                                Liveness probe failed: Get "http://10.0.65.147:13133/": read tcp 10.0.65.129:47500->10.0.65.147:13133: read: connection reset by peer
  Normal   Pulled     17m (x4 over 20m)  kubelet                                Container image "docker.io/otel/opentelemetry-collector-contrib:0.79.0" already present on machine
  Normal   Created    17m (x4 over 20m)  kubelet                                Created container signoz-k8s-infra-otel-agent
  Normal   Started    17m (x4 over 20m)  kubelet                                Started container signoz-k8s-infra-otel-agent
  Warning  Unhealthy  17m                kubelet                                Readiness probe failed: Get "http://10.0.65.147:13133/": read tcp 10.0.65.129:33194->10.0.65.147:13133: read: connection reset by peer
  Warning  Unhealthy  17m                kubelet                                Liveness probe failed: Get "http://10.0.65.147:13133/": read tcp 10.0.65.129:33186->10.0.65.147:13133: read: connection reset by peer
  Warning  BackOff    3s (x79 over 19m)  kubelet                                Back-off restarting failed container signoz-k8s-infra-otel-agent in pod signoz-k8s-infra-otel-agent-drlv2_signoz(0b38256e-d87e-4475-82b5-cc822da1eb7a
m
I don't see any error here. Any clues from the logs?
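(A sketch of pulling those logs, assuming the signoz namespace used above:)
kubectl logs signoz-k8s-infra-otel-agent-drlv2 -n signoz
# and the previous, crashed instance
kubectl logs signoz-k8s-infra-otel-agent-drlv2 -n signoz --previous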
k
necessary IAM permissions for accessing the storage?
how can i check this?
{
  "level": "error",
  "timestamp": "2023-10-31T10:52:36.882Z",
  "caller": "client/wsclient.go:170",
  "msg": "Connection failed (dial tcp 10.0.31.235:4320: i/o timeout), will retry.",
  "component": "opamp-server-client",
  "stacktrace": "<http://github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected|github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected>\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:170\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:202\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"
}
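(If it helps, one way to map that dial target back to a pod or service, since the IP alone doesn't say much:)
# find which pod, if any, currently holds that IP
kubectl get pods -n signoz -o wide | grep 10.0.31.235
# and check whether the services have ready endpoints behind them
kubectl get endpoints -n signoz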
m
I'm not familiar with GCP, so I don't know how. Maybe you can check with your admin
k
by storage you mean gcp storage classes?
m
Yes
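(One way to sanity-check the storage side from kubectl alone, without touching IAM:)
# list storage classes and see which one is the default
kubectl get storageclass
# check whether the SigNoz PVCs actually got bound
kubectl get pvc -n signoz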
k
should be ok, i have other apps installed that use storage and they work fine
i have a cockroachdb cluster with premium-rwo that runs fine
probably the root cause of the issue is that clickhouse is not running?
❯ kubectl get statefulset
NAME                                READY   AGE
chi-signoz-clickhouse-cluster-0-0   0/1     23m
signoz-alertmanager                 0/1     25m
signoz-query-service                0/1     25m
signoz-zookeeper                    1/1     25m
m
oh yes, it should be running
Why isn't clickhouse running?
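(A sketch for digging into that; the pod name is assumed to be the statefulset name from the listing above plus the -0 ordinal.)
kubectl describe pod chi-signoz-clickhouse-cluster-0-0-0 -n signoz
kubectl logs chi-signoz-clickhouse-cluster-0-0-0 -n signoz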
k
good question
2023.10.31 10:58:32.965776 [ 194 ] {} <Error> MergeTreeBackgroundExecutor: Exception while executing background task {bec2cc52-3957-4964-a30a-4e8ee0cc582b::202310_1_95_19}: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 501.56 MiB (attempt to allocate chunk of 4582439 bytes), maximum: 460.80 MiB. OvercommitTracker decision: Memory overcommit isn't used. Waiting time or overcommit denominator are set to zero. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
could be this?
m
What's the memory of your cluster nodes?
k
it’s autopilot
❯ kubectl top nodes
NAME                                              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gk3-gke-europe-west6-nap-bkvssbza-d20a0b49-vwj4   67m          1%     1853Mi          14%
gk3-gke-europe-west6-nap-qvub459m-12f48f1a-rp6g   148m         3%     2161Mi          16%
gk3-gke-europe-west6-nap-qvub459m-95c42732-nkgq   181m         4%     5119Mi          38%
gk3-gke-europe-west6-pool-3-2b2246ae-6n2b         202m         5%     3581Mi          27%
gk3-gke-europe-west6-pool-3-7b2a27f1-gk2p         215m         5%     2079Mi          15%
gk3-gke-europe-west6-pool-3-7b2a27f1-n6v4         201m         5%     2567Mi          19%
probably the resource requests for clickhouse in the helm values are not enough
but there is no limit set in the yaml, so it should be fine. i don't know.
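(To double-check what the clickhouse pod actually ended up with for requests and limits, something like the following; the pod name is assumed as above:)
kubectl get pod chi-signoz-clickhouse-cluster-0-0-0 -n signoz \
  -o jsonpath='{.spec.containers[*].resources}'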
m
https://github.com/SigNoz/charts/blob/main/charts/signoz/values.yaml#L157 Maybe try removing the limits or scaling up your nodes
Limits are set, i have shared the link
k
limits are commented out, no?
m
Oh yeah, sorry
k
either way, i'll try to set higher resource requests and will see..
n
Thanks for tagging in @Mayur B, let me know if I can send you some SigNoz stickers!
s
I fixed this by increasing the memory of clickhouse to 2 GB
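(For reference, a memory bump like that would typically be applied as a values override plus a helm upgrade; the key path below is an assumption based on the chart's clickhouse block linked earlier, so check it against the actual values.yaml.)
# sketch only: key path assumed, adjust to the chart's real clickhouse section
cat > clickhouse-resources.yaml <<'EOF'
clickhouse:
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
EOF
helm upgrade signoz signoz/signoz -n signoz -f clickhouse-resources.yaml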