#support

Srikanth Chekuri

08/02/2022, 12:48 AM
@Duncan Boag Does it restart because of OOM or for a different reason? How much data is being ingested, and what are the resource limits for the pod?

Duncan Boag

08/03/2022, 6:16 AM
Hi Srikanth. Sorry for lack of reply. I just need to find time to look at this again to get back to you.

Srikanth Chekuri

08/03/2022, 6:18 AM
No worries. I want to understand the reason for the crash so I know whether it's a new bug or something we have already seen.

Prashant Shahi

08/03/2022, 6:48 PM
The 504 in the frontend is likely caused by query-service crashing, which leaves the frontend Nginx unable to resolve the query-service address. Logs and/or the state of the query-service pod would be very helpful in finding the cause.

Duncan Boag

08/11/2022, 8:32 AM
Sorry for the delayed reply - I've been quite busy and also took some leave.
However, I've just noticed that your documentation states that version 1.21 of Kubernetes is a prerequisite, and our version is 1.20. Could this be my problem?
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.10", GitCommit:"a7a32748b5c60445c4c7ee904caf01b91f2dbb71", GitTreeState:"clean", BuildDate:"2022-02-16T11:24:04Z", GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:23:01Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Srikanth Chekuri

08/11/2022, 9:05 AM
Can you get the exit status code for the container when it crashed?
Last exitCode was 137, but I'm not sure if this was me forcing a container restart. I'll have to wait and see if the container crashes again and report back.
This is most likely due to the forced container restart.
Note, when I try to log in I do see a Login message in the logs for the Query Service pod - it just seems to not return anything to the browser.
Is login not working?

Duncan Boag

08/11/2022, 9:34 AM
Frontend logs show:
2022/08/11 09:29:54 [error] 8#8: *16749 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.204.124.0, server: _, request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", host: "10.204.246.239:3301", referrer: "http://10.204.246.239:3301/login"
2022/08/11 09:29:54 [error] 8#8: *16749 open() "/usr/share/nginx/html/50x.html" failed (2: No such file or directory), client: 10.204.124.0, server: _, request: "POST /api/v1/login HTTP/1.1", upstream: "http://10.204.151.248:8080/api/v1/login", host: "10.204.246.239:3301", referrer: "http://10.204.246.239:3301/login"
10.204.124.0 - - [11/Aug/2022:09:29:54 +0000] "POST /api/v1/login HTTP/1.1" 404 185 "http://10.204.246.239:3301/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "-"

Prashant Shahi

08/11/2022, 9:42 AM
@Duncan Boag can you share output of the following?
kubectl logs -n platform -f pod/my-release-signoz-query-service-0 --previous

Duncan Boag

08/11/2022, 9:43 AM
2022-08-10T10:43:13.733Z        INFO    version/version.go:43

SigNoz version   : v0.10.1
Commit SHA-1     : 04cf1b2
Commit timestamp : 2022-08-07T09:58:45Z
Branch           : HEAD
Go version       : go1.17.13

For SigNoz Official Documentation,  visit https://signoz.io/docs
For SigNoz Community Slack,         visit http://signoz.io/slack
For discussions about SigNoz,       visit https://community.signoz.io

Licensed under the MIT License.
Copyright 2022 SigNoz


2022-08-10T10:43:13.733Z        WARN    query-service/main.go:61        No JWT secret key is specified.
main.main
        /go/src/github.com/signoz/signoz/pkg/query-service/main.go:61
runtime.main
        /usr/local/go/src/runtime/proc.go:255

Prashant Shahi

08/11/2022, 9:44 AM
That's the usual warning, which does not affect anything.
Exit code 137 is common when the container is OOMKilled. What do the K8s cluster resources look like? Node type and size?
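For reference, exit codes above 128 conventionally mean the process was terminated by signal number (code - 128), so 137 corresponds to signal 9 (SIGKILL), which is what an OOM kill delivers. A rough sketch for checking this on the cluster follows. The pod name and namespace are taken from the logs command earlier in the thread, the container index 0 is an assumption, and the helper names `signal_for_exit_code` and `last_termination` are hypothetical.

```shell
# Decode a container exit code: values above 128 mean
# "terminated by signal (code - 128)"; anything else is a normal exit.
signal_for_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    echo 0   # not a signal-based exit
  fi
}

# Ask Kubernetes why the previous container instance ended.
# Prints the termination reason and exit code of container 0 (assumed
# here to be the query-service container).
last_termination() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
}

# Usage: last_termination my-release-signoz-query-service-0 platform
```

A reason of OOMKilled alongside exit code 137 would point at the memory limit rather than a manual restart.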

Duncan Boag

08/11/2022, 9:48 AM
The pods are spread over 3 nodes. Is there a specific pod you're interested in for this?
NB: all those nodes have 4 CPUs and 8GB memory.

Prashant Shahi

08/11/2022, 9:55 AM
That should be able to handle a good load. How many services are instrumented, and at what RPS? What about metrics or alerts?
Also, are you using the default Helm chart configuration? If not, could you please share your override-values.yaml?

Duncan Boag

08/11/2022, 10:42 AM
I'll try and get the above info to you soon. Interestingly, when I'm having the "login problem", and I try to POST to the login API with no data, it returns immediately:
# curl -X POST http://10.204.151.248:8080/api/v1/login
{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":400,"msg":"EOF"}]}
But if I call it with login info, it does not return:
# curl -X POST -d '{"email":"duncan.boag@telviva.co.za","password":"password_here"}' -H "Content-Type: application/json"  http://10.204.151.248:8080/api/v1/login
(no response from above call).
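One way to make a hanging call like this easier to report is to bound it with a timeout and print the status and elapsed time. This is a minimal sketch, assuming the same query-service address as above; `login_payload` and `try_login` are hypothetical helper names and the credentials are placeholders.

```shell
# Build the JSON login body (no escaping is done, so avoid quotes
# or backslashes in the values).
login_payload() {
  printf '{"email":"%s","password":"%s"}' "$1" "$2"
}

# POST the login request but give up after 10 seconds instead of hanging.
# Prints the HTTP status and total time; curl exits with code 28 on timeout.
try_login() {
  curl -sS --max-time 10 -o /dev/null \
    -w '%{http_code} %{time_total}s\n' \
    -X POST -H "Content-Type: application/json" \
    -d "$(login_payload "$1" "$2")" \
    http://10.204.151.248:8080/api/v1/login
}

# Usage: try_login user@example.com password_here
```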

Prashant Shahi

08/11/2022, 12:26 PM
@Duncan Boag The above response is expected when query-service is down.
@Duncan Boag you could increase the memory limit of query-service for now to resolve it. In override-values.yaml:
queryService:
  resources:
    limits:
      cpu: 750m
      memory: 2000Mi
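Applying the override could look roughly like the following. This is a sketch under assumptions: the release name `my-release` and namespace `platform` are inferred from the pod name used earlier in the thread, `signoz/signoz` is assumed to be the chart reference, and `apply_override` and `show_memory_limit` are hypothetical helper names.

```shell
# Write the override file with the limits suggested above.
cat > override-values.yaml <<'EOF'
queryService:
  resources:
    limits:
      cpu: 750m
      memory: 2000Mi
EOF

# Upgrade the existing release with the override file.
apply_override() {
  helm upgrade my-release signoz/signoz -n platform -f override-values.yaml
}

# Once the pod has restarted, print the memory limit on its first container
# (assumed to be the query-service container) to confirm the change.
show_memory_limit() {
  kubectl get pod my-release-signoz-query-service-0 -n platform \
    -o jsonpath='{.spec.containers[0].resources.limits.memory}{"\n"}'
}

# Usage: apply_override, then show_memory_limit after the restart
```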