sudhanshu dev
09/20/2022, 11:14 AM
Srikanth Chekuri
09/20/2022, 11:34 AM
sudhanshu dev
09/20/2022, 11:34 AM
nitya-signoz
09/20/2022, 11:54 AM
sudhanshu dev
09/20/2022, 11:54 AM
Srikanth Chekuri
09/20/2022, 12:02 PM
sudhanshu dev
09/20/2022, 12:03 PM
Name: signoz-release-query-service-0
Namespace: platform
Priority: 0
Node: ip-10-107-65-6.ap-south-1.compute.internal/10.107.65.6
Start Time: Tue, 20 Sep 2022 19:51:45 +0530
Labels: app.kubernetes.io/component=query-service
app.kubernetes.io/instance=signoz-release
app.kubernetes.io/name=signoz
controller-revision-hash=signoz-release-query-service-6b74848544
statefulset.kubernetes.io/pod-name=signoz-release-query-service-0
Annotations: checksum/config: 04f4266ae5775a09aa16c23105aec568e83d8e15a04f4d5588eeac26b5bc74e4
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.107.93.107
IPs:
IP: 10.107.93.107
Controlled By: StatefulSet/signoz-release-query-service
Init Containers:
signoz-release-query-service-init:
Container ID: docker://ecf7c957bcc41dc3de24949a9a40e46b1b9284460b924cef9f4a467a1398003d
Image: docker.io/busybox:1.35
Image ID: docker-pullable://busybox@sha256:09439c073bd3eb029a91c72eff2c0d9f12ab9c84f66bdef360fcf3f91a81bf7c
Port: <none>
Host Port: <none>
Command:
sh
-c
until wget --spider -q signoz-release-clickhouse:8123/ping; do echo -e "waiting for clickhouseDB"; sleep 5; done; echo -e "clickhouse ready, starting query service now";
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 20 Sep 2022 19:51:55 +0530
Finished: Tue, 20 Sep 2022 19:51:55 +0530
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hk7nl (ro)
Containers:
signoz-release-query-service:
Container ID: docker://898705709a98a98b39673b364418bec50048e0a454514fab1f7f6b1031559e6e
Image: docker.io/signoz/query-service:0.11.0
Image ID: docker-pullable://signoz/query-service@sha256:fbaba7b20e60dfa2cc55a456afdf13bcc94f17b99afab94969250cf1b35bc6dd
Port: 8080/TCP
Host Port: 0/TCP
Args:
-config=/root/config/prometheus.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Tue, 20 Sep 2022 19:57:03 +0530
Finished: Tue, 20 Sep 2022 19:57:28 +0530
Ready: False
Restart Count: 5
Limits:
cpu: 750m
memory: 1000Mi
Requests:
cpu: 200m
memory: 300Mi
Liveness: http-get http://:http/api/v1/version delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/api/v1/version delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
STORAGE: clickhouse
ClickHouseUrl: tcp://signoz-release-clickhouse:9000?database=signoz_traces&username=admin&password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9
ALERTMANAGER_API_PREFIX: http://signoz-release-alertmanager:9093/api/
GODEBUG: netdns=go
TELEMETRY_ENABLED: true
DEPLOYMENT_TYPE: kubernetes-helm
Mounts:
/root/config from prometheus (rw)
/root/config/dashboards from dashboards (rw)
/var/lib/signoz/ from signoz-db (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hk7nl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
signoz-db:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: signoz-db-signoz-release-query-service-0
ReadOnly: false
prometheus:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: signoz-release-query-service
Optional: false
dashboards:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-hk7nl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m59s default-scheduler Successfully assigned platform/signoz-release-query-service-0 to ip-10-107-65-6.ap-south-1.compute.internal
Normal Pulled 5m49s kubelet Container image "docker.io/busybox:1.35" already present on machine
Normal Created 5m49s kubelet Created container signoz-release-query-service-init
Normal Started 5m49s kubelet Started container signoz-release-query-service-init
Warning Unhealthy 4m10s kubelet Liveness probe failed: Get "http://10.107.93.107:8080/api/v1/version": read tcp 10.107.65.6:45490->10.107.93.107:8080: read: connection reset by peer
Warning Unhealthy 4m10s kubelet Readiness probe failed: Get "http://10.107.93.107:8080/api/v1/version": read tcp 10.107.65.6:45488->10.107.93.107:8080: read: connection reset by peer
Warning BackOff 4m (x6 over 4m53s) kubelet Back-off restarting failed container
Normal Created 3m46s (x4 over 5m49s) kubelet Created container signoz-release-query-service
Normal Started 3m46s (x4 over 5m49s) kubelet Started container signoz-release-query-service
Warning Unhealthy 3m45s (x2 over 4m36s) kubelet Readiness probe failed: Get "http://10.107.93.107:8080/api/v1/version": dial tcp 10.107.93.107:8080: connect: connection refused
Normal Pulled 41s (x6 over 5m49s) kubelet Container image "docker.io/signoz/query-service:0.11.0" already present on machine
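The describe output above reports Last State: Terminated with Reason: OOMKilled and Exit Code: 137, against a memory limit of 1000Mi. A minimal sketch for pulling just those fields, assuming kubectl access to the platform namespace. The function names are hypothetical helpers, not part of kubectl or SigNoz. By shell convention an exit code above 128 means the process was killed by signal (code - 128), so 137 maps to signal 9 (SIGKILL), which is the signal the kernel OOM killer sends.

```shell
#!/bin/sh
# Hypothetical helpers; only the kubectl invocations themselves are real CLI usage.

# Print the reason and exit code of the previous termination of the
# first container in a pod. $1 = namespace, $2 = pod name.
last_termination() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
}

# Print the memory limit configured on the first container in a pod.
# $1 = namespace, $2 = pod name.
memory_limit() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{.spec.containers[0].resources.limits.memory}{"\n"}'
}

# Map an exit code to the signal number that killed the process,
# or 0 if the code does not indicate death by signal.
exit_code_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}
```

Usage for the pod in this thread would be last_termination platform signoz-release-query-service-0 and memory_limit platform signoz-release-query-service-0. exit_code_signal 137 prints 9.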
2022-09-20T14:27:03.866Z INFO version/version.go:43
SigNoz version : v0.11.0
Commit SHA-1 : 73b00f4
Commit timestamp : 2022-08-24T13:32:19Z
Branch : HEAD
Go version : go1.17.13
For SigNoz Official Documentation, visit https://signoz.io/docs
For SigNoz Community Slack, visit http://signoz.io/slack
For discussions about SigNoz, visit https://community.signoz.io
Licensed under the MIT License.
Copyright 2022 SigNoz
2022-09-20T14:27:03.867Z WARN query-service/main.go:61 No JWT secret key is specified.
main.main
/go/src/github.com/signoz/signoz/pkg/query-service/main.go:61
runtime.main
/usr/local/go/src/runtime/proc.go:255
2022-09-20T14:27:04.532Z INFO app/server.go:84 Using ClickHouse as datastore ...
ts=2022-09-20T14:27:04.540122131Z caller=log.go:168 level=info msg="Loading configuration file" filename=/root/config/prometheus.yml
2022-09-20T14:27:04.543Z INFO alertManager/notifier.go:94 Starting notifier with alert manager:[http://signoz-release-alertmanager:9093/api/]
2022-09-20T14:27:04.543Z INFO app/server.go:396 rules manager is ready
2022-09-20T14:27:04.551Z DEBUG rules/apiParams.go:83 postable rule(parsed):%!(EXTRA *rules.PostableRule=&{index cpu utilisation threshold_rule 300000000000 0 {"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"container_cpu_utilization","tagFilters":{"op":"AND","items":[{"key":"k8s_namespace_name","value":["orange"],"op":"LIKE"}]},"aggregateOperator":5,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"k","disabled":false}},"panelType":0,"queryType":1},"op":"1","target":0.001,"matchType":"1"} map[severity:warning] map[description:A new alert] false https://observability-dash-14e0a46b923382883464f0a5c53159a8.fnpaas.com/alerts/edit?ruleId=1 [] })
2022-09-20T14:27:04.551Z DEBUG rules/apiParams.go:124 postable rule:%!(EXTRA *rules.PostableRule=&{index cpu utilisation threshold_rule 300000000000 60000000000 {"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"container_cpu_utilization","tagFilters":{"op":"AND","items":[{"key":"k8s_namespace_name","value":["orange"],"op":"LIKE"}]},"aggregateOperator":5,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"k","disabled":false}},"panelType":0,"queryType":1},"op":"1","target":0.001,"matchType":"1"} map[severity:warning] map[description:A new alert] false https://observability-dash-14e0a46b923382883464f0a5c53159a8.fnpaas.com/alerts/edit?ruleId=1 [] }, string= condition, string={"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"container_cpu_utilization","tagFilters":{"op":"AND","items":[{"key":"k8s_namespace_name","value":["orange"],"op":"LIKE"}]},"aggregateOperator":5,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"k","disabled":false}},"panelType":0,"queryType":1},"op":"1","target":0.001,"matchType":"1"})
2022-09-20T14:27:04.551Z DEBUG rules/manager.go:344 msg:%!(EXTRA string=adding a new rule task, string= task name:, string=1-groupname)
2022-09-20T14:27:04.552Z INFO rules/thresholdRule.go:91 msg:creating new alerting rule name:index cpu utilisation condition:{"compositeMetricQuery":{"builderQueries":{"A":{"queryName":"A","metricName":"container_cpu_utilization","tagFilters":{"op":"AND","items":[{"key":"k8s_namespace_name","value":["orange"],"op":"LIKE"}]},"aggregateOperator":5,"expression":"A","disabled":false}},"promQueries":{"A":{"query":"k","disabled":false}},"panelType":0,"queryType":1},"op":"1","target":0.001,"matchType":"1"} generatorURL:https://observability-dash-14e0a46b923382883464f0a5c53159a8.fnpaas.com/alerts/edit?ruleId=1
2022-09-20T14:27:04.552Z INFO rules/ruleTask.go:44 msg:initiating a new rule task name:1-groupname frequency:1m0s
2022-09-20T14:27:04.552Z INFO app/server.go:273 Query server started listening on 0.0.0.0:8080...
starting private http
2022-09-20T14:27:04.552Z INFO app/server.go:286 Query server started listening on private port 0.0.0.0:8085...
2022-09-20T14:27:04.553Z INFO alertManager/notifier.go:126 msg: Initiating alert notifier...
2022-09-20T14:27:04.553Z INFO app/server.go:312 Starting HTTP server{port 11 8080 <nil>} {addr 15 0 0.0.0.0:8080 <nil>}
2022-09-20T14:27:04.553Z INFO app/server.go:324 Starting pprof server{addr 15 0 0.0.0.0:6060 <nil>}
2022-09-20T14:27:04.553Z INFO app/server.go:338 Starting Private HTTP server{port 11 8085 <nil>} {addr 15 0 0.0.0.0:8085 <nil>}
2022-09-20T14:27:04.553Z DEBUG rules/ruleTask.go:93 group:%!(EXTRA string=1-groupname, string= group run to begin at: , time.Time=2022-09-20 14:27:23.708306749 +0000 UTC)
ts=2022-09-20T14:27:04.556631051Z caller=log.go:168 level=info msg="Completed loading of configuration file" filename=/root/config/prometheus.yml
2022-09-20T14:27:04.703Z INFO app/server.go:189 /api/v1/version timeTaken: 17.861µs
2022-09-20T14:27:04.726Z INFO app/server.go:189 /api/v1/version timeTaken: 16.56µs
2022-09-20T14:27:14.694Z INFO app/server.go:189 /api/v1/version timeTaken: 16.661µs
2022-09-20T14:27:14.714Z INFO app/server.go:189 /api/v1/version timeTaken: 15.18µs
2022-09-20T14:27:23.714Z DEBUG rules/ruleTask.go:296 msg:%!(EXTRA string=rule task eval started, string= name:, string=1-groupname, string= start time:, time.Time=2022-09-20 14:27:23.708306749 +0000 UTC)
2022-09-20T14:27:23.714Z DEBUG rules/thresholdRule.go:515 ruleid:%!(EXTRA string=1, string= runQueries:, map[string]string=map[A:SELECT toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 30 SECOND) as ts, avg(value) as value FROM signoz_metrics.samples_v2 INNER JOIN (SELECT fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'container_cpu_utilization' AND like(labels_object.k8s_namespace_name, 'orange')) as filtered_time_series USING fingerprint WHERE metric_name = 'container_cpu_utilization' AND timestamp_ms >= 1663683743708 AND timestamp_ms <= 1663684043708 GROUP BY ts ORDER BY ts])
2022-09-20T14:27:23.714Z DEBUG rules/thresholdRule.go:533 ruleId: %!(EXTRA string=1, string= result query label:, string=A)
2022-09-20T14:27:23.804Z DEBUG rules/thresholdRule.go:488 ruleid:%!(EXTRA string=1, string= resultmap(potential alerts):, int=1)
2022-09-20T14:27:23.804Z DEBUG rules/thresholdRule.go:497 ruleid:%!(EXTRA string=1, string= result (found alerts):, int=1)
2022-09-20T14:27:23.804Z INFO rules/thresholdRule.go:630 rule:index cpu utilisation alerts found: 1
2022-09-20T14:27:23.804Z INFO rules/thresholdRule.go:291 msg:sending alerts rule:index cpu utilisation
2022-09-20T14:27:24.694Z INFO app/server.go:189 /api/v1/version timeTaken: 24.41µs
2022-09-20T14:27:24.695Z INFO app/server.go:189 /api/v1/version timeTaken: 9.95µs
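The log excerpt above starts at 14:27:03Z, which matches the Started time of the last terminated run in the describe output (19:57:03 +0530). Once the container has restarted, logs from a crashed run like this one are available through the --previous flag of kubectl logs. A small sketch, assuming kubectl access to the cluster; the function name is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print logs from the previous (terminated) instance
# of a container, with timestamps. $1 = namespace, $2 = pod, $3 = container.
previous_logs() {
  kubectl -n "$1" logs "$2" -c "$3" --previous --timestamps
}
```

For the pod in this thread the call would be previous_logs platform signoz-release-query-service-0 signoz-release-query-service.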
nitya-signoz
09/20/2022, 3:47 PM
OOMKilled as the reason for termination, we will have to increase the limits. cc @Ankit Nayan @Srikanth Chekuri
Srikanth Chekuri
09/20/2022, 4:59 PM
select count() from signoz_metrics.time_series_v2;
so we can give some rough estimate of how much RAM is needed for query service to not crash?
sudhanshu dev
09/21/2022, 4:30 AM
Ankit Nayan
09/21/2022, 5:06 AM
sudhanshu dev
09/21/2022, 5:07 AM
Ankit Nayan
09/21/2022, 5:07 AM
sudhanshu dev
09/21/2022, 5:07 AM
Ankit Nayan
09/21/2022, 5:10 AM
0.11.0, we have raised a fix to reduce memory usage of query-service in v0.11.1. Can you give it a try?
sudhanshu dev
09/21/2022, 5:10 AM
Ankit Nayan
09/21/2022, 5:10 AM
sudhanshu dev
09/21/2022, 5:10 AM
Ankit Nayan
09/21/2022, 5:18 AM
sudhanshu dev
09/21/2022, 5:20 AM
Prashant Shahi
09/21/2022, 5:25 AM
6060 of query-service container:
kubectl -n platform port-forward pod/my-release-signoz-query-service-0 6060:6060
In another terminal, run the following to obtain pprof data:
• CPU Profile
curl "http://localhost:6060/debug/pprof/profile?seconds=30" -o query-service.pprof -v
• Heap Profile
curl "http://localhost:6060/debug/pprof/heap" -o query-service-heap.pprof -v
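Once the two files from the curl commands above are downloaded, they can be inspected locally with go tool pprof. A rough sketch, assuming a Go toolchain is installed on the workstation; the function names are hypothetical wrappers. Note that the pod in this thread is named signoz-release-query-service-0 (per the describe output earlier), so the port-forward target may need adjusting from the my-release-... example name.

```shell
#!/bin/sh
# Hypothetical wrappers around `go tool pprof`; assumes Go is installed locally.

# List the functions holding the most live heap memory.
# $1 = heap profile file, e.g. query-service-heap.pprof
heap_top() {
  go tool pprof -top -sample_index=inuse_space "$1"
}

# List the functions that used the most CPU time during the sampled window.
# $1 = CPU profile file, e.g. query-service.pprof
cpu_top() {
  go tool pprof -top "$1"
}
```

For an out-of-memory investigation like this one, heap_top query-service-heap.pprof is the more relevant view, since inuse_space ranks functions by memory still held at the time of the snapshot.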
sudhanshu dev
09/21/2022, 5:26 AM