Travis Chambers
03/27/2023, 5:55 PM
Ankit Nayan
03/27/2023, 6:10 PM
Travis Chambers
03/27/2023, 7:16 PM
signoz-otel-collector-init wget: can't connect to remote host (172.20.64.8): Connection refused
signoz-otel-collector-init waiting for clickhouseDB
stream logs failed container "signoz-otel-collector" in pod "signoz-otel-collector-76dd66c56c-98nk5" is waiting to start: PodInitializing for signoz/signoz-otel-collector-76dd66c56c-98nk5 (signoz-otel-collector)
Ankit Nayan
03/27/2023, 7:17 PM
Travis Chambers
03/27/2023, 10:04 PM
signoz-otel-collector pod?
signoz-otel-collector 2023-03-27T23:40:17.465Z error exporterhelper/queued_retry.go:310 Dropping data because sending_queue is full. Try increasing queue_size. {"kind": "exporter", "data_type": "lo
signoz-otel-collector go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
signoz-otel-collector /go/pkg/mod/go.opentelemetry.io/collector@v0.66.0/exporter/exporterhelper/queued_retry.go:310
signoz-otel-collector go.opentelemetry.io/collector/exporter/exporterhelper.NewLogsExporter.func2
signoz-otel-collector /go/pkg/mod/go.opentelemetry.io/collector@v0.66.0/exporter/exporterhelper/logs.go:114
signoz-otel-collector go.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs
signoz-otel-collector /go/pkg/mod/go.opentelemetry.io/collector/consumer@v0.66.0/logs.go:36
signoz-otel-collector go.opentelemetry.io/collector/processor/batchprocessor.(*batchLogs).export
signoz-otel-collector /go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.66.0/batch_processor.go:339
signoz-otel-collector go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems
signoz-otel-collector /go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.66.0/batch_processor.go:176
signoz-otel-collector go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle
signoz-otel-collector /go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.66.0/batch_processor.go:144
signoz-otel-collector 2023-03-27T23:40:17.465Z warn batchprocessor@v0.66.0/batch_processor.go:178 Sender failed {"kind": "processor", "name": "batch", "pipeline": "logs", "error": "sending_queue is
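The error above reports that the exporter's sending queue was full, so new batches were dropped. As a rough illustration of that failure mode only (this is not the collector's Go implementation, and `BoundedSender` with its fields is a hypothetical name), a fixed-capacity queue that rejects new items once it is full behaves like this:

```python
import queue


class BoundedSender:
    """Toy stand-in for an exporter with a fixed-size sending queue (hypothetical)."""

    def __init__(self, queue_size: int) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue(maxsize=queue_size)
        self.dropped = 0

    def send(self, batch: str) -> bool:
        """Enqueue a batch without blocking; count it as dropped if the queue is full."""
        try:
            self._queue.put_nowait(batch)
            return True
        except queue.Full:
            self.dropped += 1
            return False


# Nothing consumes from the queue here, which mimics a backend that is down.
sender = BoundedSender(queue_size=2)
results = [sender.send(f"batch-{i}") for i in range(5)]
print(results, "dropped:", sender.dropped)
```

In this toy model a larger `queue_size` only delays the first drop while nothing is draining the queue, so the state of the backend the exporter writes to is worth checking as well.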
Ankit Nayan
03/29/2023, 4:06 AM
Srikanth Chekuri
03/29/2023, 4:37 AM
sent_log_records and failed_log_records?
Travis Chambers
03/29/2023, 3:23 PM
Srikanth Chekuri
03/29/2023, 3:26 PM
SUM_RATE of accepted_log_records and SUM_RATE of sent_log_records in different panels and share the result screenshots?
Travis Chambers
03/29/2023, 4:02 PM
signoz-query-service-init waiting for clickhouseDB
Srikanth Chekuri
03/29/2023, 4:09 PM
Travis Chambers
03/29/2023, 4:12 PM
chi-signoz-clickhouse-cluster-0-0-0.
clickhouse 2023.03.29 16:12:03.236724 [ 7 ] {} <Information> Application: Setting max_server_memory_usage was set to 3.60 GiB (4.00 GiB available * 0.90 max_server_memory_usage_to_ram_ratio)
clickhouse 2023.03.29 16:12:03.248365 [ 7 ] {} <Information> CertificateReloader: One of paths is empty. Cannot apply new configuration for certificates. Fill all paths and try again.
clickhouse 2023.03.29 16:12:03.278497 [ 7 ] {} <Information> Application: Uncompressed cache policy name
clickhouse 2023.03.29 16:12:03.278524 [ 7 ] {} <Information> Application: Uncompressed cache size was lowered to 2.00 GiB because the system has low amount of memory
clickhouse 2023.03.29 16:12:03.279636 [ 7 ] {} <Information> Context: Initialized background executor for merges and mutations with num_threads=16, num_tasks=32
clickhouse 2023.03.29 16:12:03.279972 [ 7 ] {} <Information> Context: Initialized background executor for move operations with num_threads=8, num_tasks=8
clickhouse 2023.03.29 16:12:03.280512 [ 7 ] {} <Information> Context: Initialized background executor for fetches with num_threads=8, num_tasks=8
clickhouse 2023.03.29 16:12:03.280890 [ 7 ] {} <Information> Context: Initialized background executor for common operations (e.g. clearing old parts) with num_threads=8, num_tasks=8
clickhouse 2023.03.29 16:12:03.281002 [ 7 ] {} <Information> Application: Mark cache size was lowered to 2.00 GiB because the system has low amount of memory
clickhouse 2023.03.29 16:12:03.281075 [ 7 ] {} <Information> Application: Loading user defined objects from /var/lib/clickhouse/
clickhouse 2023.03.29 16:12:03.282445 [ 7 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
clickhouse 2023.03.29 16:12:03.310147 [ 7 ] {} <Information> DatabaseAtomic (system): Metadata processed, database system has 6 tables and 0 dictionaries in total.
clickhouse 2023.03.29 16:12:03.310171 [ 7 ] {} <Information> TablesLoader: Parsed metadata of 6 tables in 1 databases in 0.012396625 sec
clickhouse 2023.03.29 16:12:03.310199 [ 7 ] {} <Information> TablesLoader: Loading 6 tables with 0 dependency level
clickhouse 2023.03.29 16:12:18.565650 [ 58 ] {} <Information> TablesLoader: 16.666666666666668%
clickhouse 2023.03.29 16:13:21.737596 [ 58 ] {} <Information> TablesLoader: 33.333333333333336%
clickhouse 2023.03.29 16:13:31.576439 [ 8 ] {} <Information> Application: Received termination signal (Terminated)
signoz-clickhouse-init + chmod +x /var/lib/clickhouse/user_scripts/histogramQuantile
Stream closed EOF for signoz/chi-signoz-clickhouse-cluster-0-0-0 (signoz-clickhouse-init)
Stream closed EOF for signoz/chi-signoz-clickhouse-cluster-0-0-0 (clickhouse)
Srikanth Chekuri
03/29/2023, 4:37 PM
Prashant Shahi
03/29/2023, 4:44 PM
Travis Chambers
03/29/2023, 4:47 PM
chi-signoz-clickhouse-cluster-0-0-0, I see that it eventually crashes.
clickhouse 2023.03.29 16:50:17.822883 [ 7 ] {} <Information> Application: Loading user defined objects from /var/lib/clickhouse/
clickhouse 2023.03.29 16:50:17.823295 [ 7 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
clickhouse 2023.03.29 16:50:17.831628 [ 7 ] {} <Information> DatabaseAtomic (system): Metadata processed, database system has 6 tables and 0 dictionaries in total.
clickhouse 2023.03.29 16:50:17.831656 [ 7 ] {} <Information> TablesLoader: Parsed metadata of 6 tables in 1 databases in 0.003232883 sec
clickhouse 2023.03.29 16:50:17.831689 [ 7 ] {} <Information> TablesLoader: Loading 6 tables with 0 dependency level
clickhouse 2023.03.29 16:50:31.297963 [ 59 ] {} <Information> TablesLoader: 16.666666666666668%
Stream closed EOF for signoz/chi-signoz-clickhouse-cluster-0-0-0 (signoz-clickhouse-init)
clickhouse 2023.03.29 16:51:24.583409 [ 59 ] {} <Information> TablesLoader: 50%
clickhouse 2023.03.29 16:51:40.424811 [ 58 ] {} <Information> TablesLoader: 66.66666666666667%
clickhouse 2023.03.29 16:51:46.080805 [ 8 ] {} <Information> Application: Received termination signal (Terminated)
Application: Received termination signal (Terminated)
clickhouse 2023.03.29 17:09:07.664915 [ 58 ] {} <Information> TablesLoader: 16.666666666666668%
clickhouse 2023.03.29 17:09:45.751947 [ 58 ] {} <Information> TablesLoader: 33.333333333333336%
clickhouse 2023.03.29 17:10:11.747418 [ 58 ] {} <Information> TablesLoader: 50%
clickhouse 2023.03.29 17:10:15.038757 [ 8 ] {} <Information> Application: Received termination signal (Terminated)
clickhouse 2023.03.29 17:10:22.257197 [ 60 ] {} <Information> TablesLoader: 66.66666666666667%
Prashant Shahi
03/29/2023, 5:14 PM
Travis Chambers
03/29/2023, 5:16 PM
Prashant Shahi
03/29/2023, 5:17 PM
kubectl describe on the CHI pod?
Travis Chambers
03/29/2023, 5:18 PM
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  4m35s                   default-scheduler  Successfully assigned signoz/chi-signoz-clickhouse-cluster-0-0-0 to ip-10-0-3-214.us-west-2.compute.internal
  Normal   Pulled     4m34s                   kubelet            Container image "docker.io/busybox:1.35" already present on machine
  Normal   Created    4m34s                   kubelet            Created container signoz-clickhouse-init
  Normal   Started    4m34s                   kubelet            Started container signoz-clickhouse-init
  Normal   Pulled     4m33s                   kubelet            Container image "docker.io/clickhouse/clickhouse-server:22.8.8-alpine" already present on machine
  Normal   Created    4m33s                   kubelet            Created container clickhouse
  Normal   Started    4m33s                   kubelet            Started container clickhouse
  Warning  Unhealthy  3m31s (x18 over 4m22s)  kubelet            Readiness probe failed: Get "http://10.0.3.105:8123/ping": dial tcp 10.0.3.105:8123: connect: connection refused
  Warning  Unhealthy  3m31s                   kubelet            Liveness probe failed: Get "http://10.0.3.105:8123/ping": dial tcp 10.0.3.105:8123: connect: connection refused

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 29 Mar 2023 10:13:25 -0700
      Finished:     Wed, 29 Mar 2023 10:13:26 -0700
    Ready:          True
Prashant Shahi
03/29/2023, 5:26 PM
Travis Chambers
03/29/2023, 5:28 PM
chi-signoz-clickhouse-cluster-0-0-0 pod
Containers:
  clickhouse:
    Container ID:  containerd://450b4d4021e759611b2ff88b3bb1d11aa84bcf1c6ffefb7ff507b179f708f3d1
    Image:         docker.io/clickhouse/clickhouse-server:22.8.8-alpine
    Image ID:      docker.io/clickhouse/clickhouse-server@sha256:c93e1e4d06df2d07a5d7cd3aed8b551373c3b2690ea074a10729ee8ba29f3fb1
    Ports:         8123/TCP, 9000/TCP, 9009/TCP, 9000/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -c
      /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 29 Mar 2023 10:25:26 -0700
      Finished:     Wed, 29 Mar 2023 10:27:25 -0700
    Ready:          False
    Restart Count:  6
    Requests:
      cpu:      4
      memory:   8Gi
    Liveness:   http-get http://:http/ping delay=60s timeout=1s period=3s #success=1 #failure=10
    Readiness:  http-get http://:http/ping delay=10s timeout=1s period=3s #success=1 #failure=3
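The Liveness line above (delay=60s, period=3s, #failure=10) bounds how long a container that never answers `/ping` can run before the kubelet restarts it. The sketch below works through that arithmetic and compares it with the first and last ClickHouse log timestamps from the 16:12 run earlier in the thread. It assumes the first probe fires somewhere between `delay` and `delay + period` after container start, and it treats the first log line as the start time, which is only approximate. This is arithmetic on the numbers in the thread, not a diagnosis, and it does not rule out the OOM reading of exit code 137.

```python
from datetime import datetime

# Values copied from the Liveness line in the describe output above.
LIVENESS_DELAY_S = 60
LIVENESS_PERIOD_S = 3
LIVENESS_FAILURE_THRESHOLD = 10


def liveness_restart_window(delay_s: int, period_s: int, failure_threshold: int) -> tuple:
    """Return (earliest, latest) seconds after start at which `failure_threshold`
    consecutive failed probes can have accumulated, assuming the first probe
    fires between delay_s and delay_s + period_s."""
    earliest = delay_s + (failure_threshold - 1) * period_s
    latest = delay_s + failure_threshold * period_s
    return earliest, latest


def seconds_between(start: str, end: str) -> float:
    """Difference between two ClickHouse log timestamps, in seconds."""
    fmt = "%Y.%m.%d %H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


window = liveness_restart_window(
    LIVENESS_DELAY_S, LIVENESS_PERIOD_S, LIVENESS_FAILURE_THRESHOLD
)
# First log line and "Received termination signal" line from the 16:12 run.
observed = seconds_between("2023.03.29 16:12:03.236724", "2023.03.29 16:13:31.576439")
print("restart window (s):", window, "observed run time (s):", round(observed, 1))
```

If the server really does need more than about 90 seconds to load its tables, the liveness settings would be worth reviewing alongside the memory settings; that is a suggestion drawn from the numbers, not something the thread confirms.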
Prashant Shahi
03/29/2023, 5:31 PM
Exit Code: 137. This confirms OOM.
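For readers unfamiliar with the convention behind that reading: exit codes above 128 usually mean the process was terminated by signal number `code - 128`, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal sketch of the decoding follows; `describe_exit_code` is a hypothetical helper name, and the signal names come from the platform Python runs on.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a container exit code to a short description.

    Codes above 128 conventionally mean "terminated by signal (code - 128)".
    """
    if code > 128:
        number = code - 128
        try:
            return f"killed by {signal.Signals(number).name}"
        except ValueError:
            # The platform has no name for this signal number.
            return f"killed by signal {number}"
    return "exited normally" if code == 0 else f"exited with error status {code}"


for code in (0, 137, 143):
    print(code, "->", describe_exit_code(code))
```

SIGKILL by itself does not say who sent it, so this is consistent with an OOM kill rather than proof of one; a `Reason: OOMKilled` in the `kubectl describe` output would be the more direct indicator.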
Travis Chambers
03/29/2023, 5:35 PM
Prashant Shahi
03/29/2023, 5:36 PM
Travis Chambers
03/29/2023, 5:36 PM
clickhouse.resources.requests.memory
resources:
  requests:
    cpu: '4'
    memory: 16Gi
Prashant Shahi
03/29/2023, 5:37 PM
Travis Chambers
03/29/2023, 5:39 PM
signoz-k8s-infra-otel-agent configmap, yeah?
receivers:
  filelog/k8s:
    exclude:
      - /var/log/pods/kube-system_*.log
      - /var/log/pods/*_hotrod*_*/*/*.log
      - /var/log/pods/*_locust*_*/*/*.log
    include:
      - /var/log/pods/*/*/*.log
Prashant Shahi
03/29/2023, 6:22 PM
resources:
  requests:
    cpu: '1'
    memory: 4Gi
  limits:
    cpu: '4'
    memory: 16Gi
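The values above separate `requests` (what the scheduler reserves) from `limits` (the ceiling the container may not exceed). The sketch below converts the memory quantities to bytes and reapplies the 0.90 ratio that the ClickHouse log earlier in the thread reported ("3.60 GiB (4.00 GiB available * 0.90 ...)"). Both helper names are hypothetical, only binary suffixes are handled, and whether ClickHouse would actually see 16 GiB as available under this limit is an assumption, not something the thread shows.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_quantity_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '16Gi' to bytes.

    Sketch only: handles Ki/Mi/Gi/Ti suffixes and plain integers.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def memory_cap_gib(available_gib: float, ratio: float = 0.90) -> float:
    """Apply a max_server_memory_usage_to_ram_ratio style ratio to available memory."""
    return round(available_gib * ratio, 2)


request_bytes = memory_quantity_to_bytes("4Gi")
limit_bytes = memory_quantity_to_bytes("16Gi")
print("request:", request_bytes, "limit:", limit_bytes)

# Reproduces the figure from the log line quoted in the thread.
print("cap with 4 GiB available:", memory_cap_gib(4.0))
# Hypothetical: the cap if 16 GiB were visible to the server.
print("cap with 16 GiB available:", memory_cap_gib(16.0))
```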
Travis Chambers
03/29/2023, 6:31 PM
clickhouse 2023.03.29 18:30:07.892732 [ 216 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 6.395357589 sec., 58683 rows/sec., 458.46 KiB/sec.
clickhouse 2023.03.29 18:30:10.799741 [ 235 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 5.918852143 sec., 63407 rows/sec., 495.37 KiB/sec.
clickhouse 2023.03.29 18:30:13.988077 [ 11 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 6.059014593 sec., 61940 rows/sec., 483.91 KiB/sec.
clickhouse 2023.03.29 18:30:14.038654 [ 10 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 6.09963416 sec., 61528 rows/sec., 480.69 KiB/sec.
clickhouse 2023.03.29 18:30:18.055179 [ 229 ] <Information> executeQuery: Read 5 rows, 282.00 B in 26.759150599 sec., 0 rows/sec., 10.54 B/sec.
clickhouse 2023.03.29 18:30:18.079163 [ 235 ] <Information> executeQuery: Read 375301 rows, 2.86 MiB in 7.233857096 sec., 51881 rows/sec., 405.32 KiB/sec
requests.memory higher it seems
clickhouse client command doesn't work.
$ kubectl exec -n signoz -it chi-signoz-clickhouse-cluster-0-0-0 -- sh
Defaulted container "clickhouse" out of: clickhouse, signoz-clickhouse-init (init)
/ $ clickhouse client
ClickHouse client version 22.8.8.3 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
Srikanth Chekuri
03/30/2023, 6:20 AM
"but clickhouse client command doesn't work."
Try clickhouse-client; ideally, both should work. Make sure you are exec'ing into clickhouse-cluster, not the clickhouse-operator.
Travis Chambers
03/30/2023, 3:20 PM
<<K9s-Shell>> Pod: signoz/chi-signoz-clickhouse-cluster-0-0-0 | Container: clickhouse
bash-5.1$ clickhouse-client
ClickHouse client version 22.8.8.3 (official build).
Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
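"Connection refused (localhost:9000)" means nothing was accepting connections on the native protocol port at that moment, which would fit a server that is still loading tables or has just been restarted. A small sketch for checking whether a TCP port is accepting connections is below; `port_is_open` is a hypothetical helper, and it assumes Python is available wherever the check runs (for example, against a port-forwarded address), which may not be the case inside the ClickHouse container.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9000 is the native port and 8123 the HTTP port listed in the pod's Ports above.
    for port in (9000, 8123):
        print(port, "open" if port_is_open("localhost", port) else "closed")
```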
Srikanth Chekuri
03/30/2023, 3:35 PM
Travis Chambers
03/30/2023, 3:51 PM
/var/lib/clickhouse
👍
/var/lib/clickhouse/data is only 160kb.
/var/lib/clickhouse/store is, because the pod OOMs before du has time to return any info to me and I lose my shell.
/var/lib/clickhouse/store dir altogether?
Srikanth Chekuri
03/30/2023, 4:30 PM
/store contains the part files, but I don't know what else goes in there. Can you delete the whole PV data just to be safe and not leave it in any corrupt state?
Travis Chambers
03/30/2023, 4:51 PM
/var/lib/clickhouse/ dir?
Srikanth Chekuri
03/30/2023, 4:52 PM
Travis Chambers
03/30/2023, 4:53 PM