< Duncan Boag> Does it restart with OOM or different reason SigNoz Community #support

Srikanth Chekuri

08/02/2022, 12:48 AM

@Duncan Boag Does it restart with OOM or different reason? How much data gets ingested and what are the resource limits for the pod?

Duncan Boag

08/03/2022, 6:16 AM

Hi Srikanth. Sorry for lack of reply. I just need to find time to look at this again to get back to you.

Srikanth Chekuri

08/03/2022, 6:18 AM

No worries. I want to understand the reason for the crash, to know whether it's a new bug or something we have already seen.

Prashant Shahi

08/03/2022, 6:48 PM

The 504 in the frontend is likely caused by query-service crashing, which leaves the frontend Nginx unable to resolve the query-service address. Logs and/or the state of the query-service pod would be very helpful for finding the cause.

Duncan Boag

08/11/2022, 8:32 AM

Sorry for the delayed reply - I've been quite busy and also took some leave.

Duncan Boag

08/11/2022, 8:34 AM

However, I've just noticed that your documentation states that version 1.21 of Kubernetes is a prerequisite, and our version is 1.20. Could this be my problem?

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.10", GitCommit:"a7a32748b5c60445c4c7ee904caf01b91f2dbb71", GitTreeState:"clean", BuildDate:"2022-02-16T11:24:04Z", GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:23:01Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Srikanth Chekuri

08/11/2022, 9:05 AM

Can you get the exit status code for the container when it crashed?
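One way to read the exit status is from the pod's last termination record. A minimal sketch, assuming kubectl access to the cluster and that query-service is the first container in the pod; the helper name last_termination is hypothetical:

```shell
# Print the last termination record (reason, exit code, timestamps)
# of the first container in a pod.
# Usage: last_termination <namespace> <pod>
last_termination() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
}

# Example (needs cluster access):
#   last_termination platform my-release-signoz-query-service-0
```

The same information also appears under "Last State" in the output of kubectl describe pod.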

Srikanth Chekuri

08/11/2022, 9:27 AM

Last exitCode was 137, but I'm not sure if this was me forcing a container restart. I'll have to wait and see if the container crashes again and report back.

This is most likely due to forcing the container restart.

Note, when I try to log in I do see a Login message in the logs for the Query Service pod - it just seems to not return anything to the browser.

Is login not working?

Duncan Boag

08/11/2022, 9:34 AM

Frontend logs show:

2022/08/11 09:29:54 [error] 8#8: *16749 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.204.124.0, server: _, request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", host: "10.204.246.239:3301", referrer: "http://10.204.246.239:3301/login"
2022/08/11 09:29:54 [error] 8#8: *16749 open() "/usr/share/nginx/html/50x.html" failed (2: No such file or directory), client: 10.204.124.0, server: _, request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", host: "10.204.246.239:3301", referrer: "http://10.204.246.239:3301/login"
10.204.124.0 - - [11/Aug/2022:09:29:54 +0000] "POST /api/v1/login HTTP/1.1" 404 185 "http://10.204.246.239:3301/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "-"

Prashant Shahi

08/11/2022, 9:42 AM

@Duncan Boag can you share the output of the following?

kubectl logs -n platform -f pod/my-release-signoz-query-service-0 --previous

Duncan Boag

08/11/2022, 9:43 AM

2022-08-10T10:43:13.733Z        INFO    version/version.go:43

SigNoz version   : v0.10.1
Commit SHA-1     : 04cf1b2
Commit timestamp : 2022-08-07T09:58:45Z
Branch           : HEAD
Go version       : go1.17.13

For SigNoz Official Documentation,  visit https://signoz.io/docs
For SigNoz Community Slack,         visit http://signoz.io/slack
For discussions about SigNoz,       visit https://community.signoz.io

Licensed under the MIT License.
Copyright 2022 SigNoz


2022-08-10T10:43:13.733Z        WARN    query-service/main.go:61        No JWT secret key is specified.
main.main
        /go/src/github.com/signoz/signoz/pkg/query-service/main.go:61
runtime.main
        /usr/local/go/src/runtime/proc.go:255

Prashant Shahi

08/11/2022, 9:44 AM

That's the usual warning, which does not affect anything.

Prashant Shahi

08/11/2022, 9:45 AM

Exit code 137 is common when the container is OOMKilled.
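Exit codes above 128 follow the convention 128 + signal number, so 137 corresponds to signal 9 (SIGKILL). That is the signal the kernel OOM killer sends, but also what a forced kill produces, so 137 alone does not prove an OOM. A small sketch of the arithmetic; the helper name decode_exit_code is hypothetical:

```shell
# Decode a container exit code: values above 128 mean the process
# was terminated by signal (code - 128); otherwise it exited normally.
decode_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo "signal $((code - 128))"
  else
    echo "exit $code"
  fi
}

decode_exit_code 137   # prints "signal 9" (SIGKILL)
```

To tell the two cases apart, check whether the termination reason recorded on the pod is OOMKilled.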

Prashant Shahi

08/11/2022, 9:45 AM

What do the K8s cluster resources look like? Node type and size?

Duncan Boag

08/11/2022, 9:48 AM

The pods are spread over 3 nodes. Is there a specific pod you're interested in for this?

Duncan Boag

08/11/2022, 9:49 AM

NB: all those nodes have 4 CPUs and 8GB memory.

Prashant Shahi

08/11/2022, 9:55 AM

That should be able to handle a good load. How many services are instrumented, and at what RPS? What about metrics or alerts?

Prashant Shahi

08/11/2022, 9:58 AM

Also, are you using the default Helm chart configuration? If not, could you please share your override-values.yaml?

Duncan Boag

08/11/2022, 10:42 AM

I'll try to get the above info to you soon. Interestingly, when I'm having the "login problem" and I try to POST to the login API with no data, it returns immediately:

# curl -X POST http://10.204.151.248:8080/api/v1/login
{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":400,"msg":"EOF"}]}

But if I call it with login info, it does not return:

# curl -X POST -d '{"email":"duncan.boag@telviva.co.za","password":"password_here"}' -H "Content-Type: application/json" http://10.204.151.248:8080/api/v1/login

(no response from above call).

Prashant Shahi

08/11/2022, 12:26 PM

@Duncan Boag The above response is expected when query-service is down.

Prashant Shahi

08/11/2022, 1:02 PM

@Duncan Boag you could increase the memory limit of query-service for now to resolve it. In override-values.yaml:

queryService:
  resources:
    limits:
      cpu: 750m
      memory: 2000Mi
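
The overrides take effect once the Helm release is upgraded with that file. A sketch under stated assumptions: the release name my-release and namespace platform are inferred from the pod name earlier in the thread, the chart reference signoz/signoz assumes the SigNoz Helm repository was added under the name signoz, and the helper name apply_overrides is hypothetical:

```shell
# Apply a values override file to an existing SigNoz Helm release.
# Usage: apply_overrides <release> <namespace> <values-file>
apply_overrides() {
  helm upgrade "$1" signoz/signoz -n "$2" -f "$3"
}

# Example (needs cluster access and the chart repo configured):
#   apply_overrides my-release platform override-values.yaml
```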
