# support
s
@Duncan Boag Does it restart because of OOM or for a different reason? How much data is being ingested, and what are the resource limits for the pod?
d
Hi Srikanth. Sorry for the lack of reply. I just need to find time to look at this again so I can get back to you.
s
No worries. I want to understand the reason for the crash, to know whether it's a new bug or something we have already seen.
p
The 504 in the frontend is likely caused by query-service crashing, which leaves the frontend Nginx unable to resolve the query-service address. The logs and/or state of the query-service pod would have been very helpful in finding the cause.
d
Sorry for the delayed reply - I've been quite busy and also took some leave.
However, I've just noticed that your documentation states that version 1.21 of Kubernetes is a prerequisite, and our version is 1.20. Could this be my problem?
```
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.10", GitCommit:"a7a32748b5c60445c4c7ee904caf01b91f2dbb71", GitTreeState:"clean", BuildDate:"2022-02-16T11:24:04Z", GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:23:01Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
```
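To check the server against the documented 1.21 minimum without reading the output by eye, a small sketch can pull `GitVersion` out of the `Server Version` line. It assumes the older `version.Info{...}` format printed above (newer kubectl releases default to a shorter format), and the helper names and hard-coded minimum are illustrative, not part of kubectl or SigNoz.

```python
import re

# Minimum Kubernetes version mentioned in the thread as a SigNoz prerequisite.
MIN_VERSION = (1, 21)

def server_version(kubectl_output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from the 'Server Version' line of `kubectl version`."""
    for line in kubectl_output.splitlines():
        if line.startswith("Server Version:"):
            match = re.search(r'GitVersion:"v(\d+)\.(\d+)\.(\d+)', line)
            if match:
                return tuple(int(part) for part in match.groups())
    raise ValueError("no Server Version GitVersion found")

def meets_minimum(version: tuple, minimum: tuple = MIN_VERSION) -> bool:
    # Tuples compare component by component, so (1, 20) < (1, 21).
    return version[: len(minimum)] >= minimum

# Trimmed copy of the Server Version line above.
sample = 'Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55"}'
print(server_version(sample), meets_minimum(server_version(sample)))
```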
s
Can you get the exit status code for the container when it crashed?
Last `exitCode` was 137, but I'm not sure if this was from me forcing a container restart. I'll have to wait and see if the container crashes again and report back.
This is most likely due to forcing the container restart.
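For context on exit code 137: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status minus 128), so 137 is signal 9, `SIGKILL`. Both the kernel OOM killer and a forced kill deliver `SIGKILL`, which is why the code alone can't tell the two apart. A minimal sketch of that decoding (the function name is hypothetical):

```python
import signal

def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with error status {code}"

for code in (0, 1, 137, 143):
    print(code, describe_exit_code(code))
```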
Note: when I try to log in, I do see a Login message in the logs for the query-service pod; it just doesn't seem to return anything to the browser.
Is login not working?
d
Frontend logs show:
```
2022/08/11 09:29:54 [error] 8#8: *16749 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.204.124.0, server: _, request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", host: "10.204.246.239:3301", referrer: "http://10.204.246.239:3301/login"
2022/08/11 09:29:54 [error] 8#8: *16749 open() "/usr/share/nginx/html/50x.html" failed (2: No such file or directory), client: 10.204.124.0, server: _, request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", host: "10.204.246.239:3301", referrer: "http://10.204.246.239:3301/login"
10.204.124.0 - - [11/Aug/2022:09:29:54 +0000] "POST /api/v1/login HTTP/1.1" 404 185 "http://10.204.246.239:3301/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "-"
```
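The first error line says nginx reached the upstream at 10.204.151.248:8080 but timed out waiting for the response headers (nginx's `proxy_read_timeout` defaults to 60s unless the frontend config overrides it). The access log then records 404 rather than 504, plausibly because the custom error page `/usr/share/nginx/html/50x.html` does not exist. A small sketch for pulling these fields out of such lines; the regex is tailored to the lines above and is an assumption, not a general nginx log parser.

```python
import re

# Tailored to the nginx error-log lines quoted above; not a general nginx log parser.
ERROR_LINE = re.compile(
    r'^(?P<time>\S+ \S+) \[(?P<level>\w+)\] .*?: \*\d+ (?P<message>.*?), '
    r'client: (?P<client>[^,]+), .*?request: "(?P<request>[^"]*)"'
    r'(?:, upstream: "(?P<upstream>[^"]*)")?'
)

def parse_error_line(line: str) -> dict:
    """Split one nginx error-log line into time, level, message, client, request, upstream."""
    match = ERROR_LINE.search(line)
    return match.groupdict() if match else {}

line = (
    '2022/08/11 09:29:54 [error] 8#8: *16749 upstream timed out (110: Operation timed out) '
    'while reading response header from upstream, client: 10.204.124.0, server: _, '
    'request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", '
    'host: "10.204.246.239:3301"'
)
for key, value in parse_error_line(line).items():
    print(f"{key:9} {value}")
```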
p
@Duncan Boag can you share the output of the following?
```
kubectl logs -n platform -f pod/my-release-signoz-query-service-0 --previous
```
d
```
2022-08-10T10:43:13.733Z        INFO    version/version.go:43

SigNoz version   : v0.10.1
Commit SHA-1     : 04cf1b2
Commit timestamp : 2022-08-07T09:58:45Z
Branch           : HEAD
Go version       : go1.17.13

For SigNoz Official Documentation,  visit https://signoz.io/docs
For SigNoz Community Slack,         visit http://signoz.io/slack
For discussions about SigNoz,       visit https://community.signoz.io

Licensed under the MIT License.
Copyright 2022 SigNoz


2022-08-10T10:43:13.733Z        WARN    query-service/main.go:61        No JWT secret key is specified.
main.main
        /go/src/github.com/signoz/signoz/pkg/query-service/main.go:61
runtime.main
        /usr/local/go/src/runtime/proc.go:255
```
p
That's the usual warning, which does not affect anything.
Exit code 137 is common when the container is `OOMKilled`.
What do the K8s cluster resources look like? Node type and size?
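Since 137 alone can't separate an OOM kill from a manual restart, the pod status is the place to look: Kubernetes sets `status.containerStatuses[].lastState.terminated.reason` to `OOMKilled` when a container was killed for exceeding its memory limit (`kubectl describe pod` shows the same under *Last State*). A sketch that reads the JSON from `kubectl get pod -o json`; the sample below is a made-up, trimmed example, not output from this cluster, and the container name in it is hypothetical.

```python
import json

def last_terminations(pod: dict) -> list:
    """Return (container name, exitCode, reason) for every container's last termination."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append((status.get("name"), terminated.get("exitCode"), terminated.get("reason")))
    return results

def was_oom_killed(pod: dict) -> bool:
    """True if any container's previous instance was terminated with reason OOMKilled."""
    return any(reason == "OOMKilled" for _, _, reason in last_terminations(pod))

# Made-up, trimmed example of `kubectl get pod ... -o json` output (not from this cluster).
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "signoz-query-service",
        "restartCount": 3,
        "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}
      }
    ]
  }
}
""")

print(last_terminations(sample_pod), was_oom_killed(sample_pod))
```

To check the real pod, save the output of `kubectl get pod -n platform my-release-signoz-query-service-0 -o json` to a file and load it with `json.load` in place of `sample_pod`.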
d
The pods are spread over 3 nodes. Is there a specific pod you're interested in for this?
NB: all those nodes have 4 CPUs and 8GB memory.
p
That should be able to handle a good load. How many services are instrumented, and at what RPS? What about metrics or alerts?
Also, are you using the default Helm chart configuration? If not, could you please share your `override-values.yaml`?
d
I'll try and get the above info to you soon. Interestingly, when I'm having the "login problem" and I POST to the login API with no data, it returns immediately:
```
# curl -X POST http://10.204.151.248:8080/api/v1/login
{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":400,"msg":"EOF"}]}
```
But if I call it with login info, it does not return:
```
# curl -X POST -d '{"email":"duncan.boag@telviva.co.za","password":"password_here"}' -H "Content-Type: application/json"  http://10.204.151.248:8080/api/v1/login
```
(no response from above call).
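When reproducing the hang, it helps to bound the wait so the client doesn't block indefinitely (curl's `--max-time` does this). The `EOF` error for the empty request is probably the JSON decoder hitting end of input before any real work is done, which would explain why that call returns at once while a real login hangs; this is an assumption, not confirmed from the query-service source. A sketch of the same probe in Python with a timeout; the URL and payload mirror the command above, and the helper names and credentials are placeholders.

```python
import json
import socket
import urllib.error
import urllib.request

LOGIN_URL = "http://10.204.151.248:8080/api/v1/login"  # query-service address from the thread

def build_login_request(url: str, email: str, password: str) -> urllib.request.Request:
    """Build the same JSON POST that the curl command above sends."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )

def probe_login(url: str, email: str, password: str, timeout: float = 10.0) -> str:
    """POST credentials and report the outcome instead of blocking forever."""
    request = build_login_request(url, email, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"HTTP {response.status}: {response.read(200).decode('utf-8', 'replace')}"
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}"
    except (socket.timeout, TimeoutError):
        # Raised when the server accepts the connection but sends no response in time.
        return f"no response within {timeout}s"
    except urllib.error.URLError as err:
        return f"connection failed: {err.reason}"

# Example (needs network access to the cluster; credentials are placeholders):
# print(probe_login(LOGIN_URL, "user@example.com", "password_here"))
```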
p
@Duncan Boag The above response is expected when query-service is down.
@Duncan Boag you could increase the memory limit of query-service for now to resolve it, in `override-values.yaml`:
```
queryService:
  resources:
    limits:
      cpu: 750m
      memory: 2000Mi
```
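To sanity-check these limits against the 4 CPU / 8 GB nodes mentioned earlier: in Kubernetes quantities, `m` means thousandths of a CPU and `Mi` means mebibytes (2^20 bytes), so this override allows 0.75 CPU and about 2.1 GB (roughly 1.95 GiB) of memory. A minimal converter covering only a few common suffixes, not the full Kubernetes quantity grammar; the function names are hypothetical.

```python
# Binary (Ki, Mi, Gi) suffixes are listed before decimal (k, M, G) ones so that
# "2000Mi" matches "Mi" rather than falling through to a shorter suffix.
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}

def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '2000Mi' to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: plain bytes

def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '750m' (millicores) to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

print(parse_cpu("750m"), parse_memory("2000Mi"))
```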