Slackbot
12/28/2022, 6:06 PM
Ankit Nayan
Bill Cavalieri
12/28/2022, 6:16 PM
Ankit Nayan
v0.13.0 is out which has the above fix. Let us know how it goes https://github.com/SigNoz/signoz/releases/tag/v0.13.0
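For anyone following along, a rough sketch of the usual upgrade path for a Docker Compose install of this era; the repository path and compose layout are assumptions from the default setup, so adjust to your own:
# Assumed default layout; pulls the compose file that pins the v0.13.0 images.
cd signoz/deploy/docker/clickhouse-setup
git pull
docker-compose up -d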
Bill Cavalieri
12/29/2022, 7:51 PM
Bill Cavalieri
12/29/2022, 8:21 PM
Bill Cavalieri
12/29/2022, 10:18 PM
Ankit Nayan
Ankit Nayan
Attribute cardinality from the clickhouse-setup_otel-collector_1 container? https://github.com/SigNoz/signoz-otel-collector/blob/main/processor/signozspanmetricsprocessor/processor.go#L417
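As a rough way to gauge how many series the span-metrics processor is generating, one could count the distinct series ClickHouse has stored for one of its metrics. The database, table, and metric names below (signoz_metrics.time_series_v2, signoz_calls_total) are assumptions from a default install of this period and may differ; the clickhouse container name comes from the docker ps output later in the thread.
# Hypothetical cardinality check; adjust names if your schema differs.
docker exec -it clickhouse clickhouse-client --query \
  "SELECT count(DISTINCT fingerprint) AS series FROM signoz_metrics.time_series_v2 WHERE metric_name = 'signoz_calls_total'"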
Srikanth Chekuri
12/30/2022, 9:19 AM
signoz/signoz-otel-collector image would be v0.66.1) and let us know if you still face the issue.
Bill Cavalieri
12/30/2022, 5:20 PM
Bill Cavalieri
12/30/2022, 5:21 PM
Srikanth Chekuri
12/30/2022, 5:22 PM
Bill Cavalieri
12/30/2022, 5:23 PM
Srikanth Chekuri
12/30/2022, 5:53 PM
github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).writeBatch
/src/exporter/clickhousetracesexporter/writer.go:129
github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).backgroundWriter
/src/exporter/clickhousetracesexporter/writer.go:108
2022-12-30T16:54:32.516Z error clickhousetracesexporter/writer.go:109 Could not write a batch of spans {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "dial tcp 172.27.0.2:9000: connect: connection refused"}
github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).backgroundWriter
What’s surprising is that only the traces exporter had the connection error.
Bill Cavalieri
12/30/2022, 5:56 PM
Vishal Sharma
01/02/2023, 11:04 AM
Ankit Nayan
Srikanth Chekuri
01/02/2023, 11:33 AM
Srikanth Chekuri
01/02/2023, 11:46 AM
Bill Cavalieri
01/03/2023, 3:10 PM
2023-01-02T15:44:59.119Z error prometheusexporter@v0.66.0/log.go:34 error encoding and sending metric family: write tcp 172.27.0.4:8889->172.27.0.7:49386: write: broken pipe
{"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.66.0/log.go:34
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1.2
/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/http.go:187
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/http.go:205
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2084
net/http.(*ServeMux).ServeHTTP
/usr/local/go/src/net/http/server.go:2462
go.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1
/go/pkg/mod/go.opentelemetry.io/collector@v0.66.0/config/confighttp/compression.go:162
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2084
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP
/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.36.4/handler.go:204
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
/go/pkg/mod/go.opentelemetry.io/collector@v0.66.0/config/confighttp/clientinfohandler.go:39
net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:2916
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:1966
Bill Cavalieri
01/03/2023, 3:11 PM
Vishal Sharma
01/03/2023, 3:12 PM
Srikanth Chekuri
01/03/2023, 3:12 PM
Bill Cavalieri
01/03/2023, 3:18 PM
2023-01-02T16:35:54.865Z info exporterhelper/queued_retry.go:426 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "error": "PrepareBatch:read tcp 172.27.0.4:34884->172.27.0.2:9000: i/o timeout", "interval": "6.421334743s"}
Srikanth Chekuri
01/03/2023, 3:21 PM
The 172.27.0.4:8889 write: broken pipe is an issue unrelated to the other ones. Port 8889 is used for the APM metrics. It could be that the prom exporter is terminating the connection before the metrics get fully scraped. How often did you notice this error?
Srikanth Chekuri
01/03/2023, 3:23 PM
How often did you notice this? Your ClickHouse is probably busy at the time and didn’t complete the request.
read tcp 172.27.0.4:34884->172.27.0.2:9000: i/o timeout
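One way to check whether ClickHouse is actually busy when the i/o timeouts fire is to look at its longest-running queries at that moment; a minimal sketch, assuming the container is named clickhouse as in the docker ps output later in the thread:
# Sketch: list the longest-running queries and their memory usage.
docker exec -it clickhouse clickhouse-client --query \
  "SELECT elapsed, read_rows, memory_usage, query FROM system.processes ORDER BY elapsed DESC LIMIT 5"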
Bill Cavalieri
01/03/2023, 3:24 PM
Bill Cavalieri
01/03/2023, 3:25 PM
Srikanth Chekuri
01/03/2023, 3:26 PM
Every 30-60 minutes this happens during the day.
Does a broken pipe error show up every 30 mins?
Srikanth Chekuri
01/03/2023, 3:28 PM
Bill Cavalieri
01/03/2023, 3:30 PM
2023-01-03T15:05:59.355Z error prometheusexporter@v0.66.0/log.go:34 error encoding and sending metric family: write tcp 172.27.0.4:8889->172.27.0.7:50926: write: broken pipe
2023-01-03T15:05:59.355Z error prometheusexporter@v0.66.0/log.go:34 error encoding and sending metric family: write tcp 172.27.0.4:8889->172.27.0.7:50926: write: broken pipe
2023-01-03T15:05:59.355Z error prometheusexporter@v0.66.0/log.go:34 error encoding and sending metric family: write tcp 172.27.0.4:8889->172.27.0.7:50926: write: broken pipe
2023-01-03T15:07:59.583Z error prometheusexporter@v0.66.0/log.go:34 error encoding and sending metric family: write tcp 172.27.0.4:8889->172.27.0.7:36196: write: broken pipe
Bill Cavalieri
01/03/2023, 3:31 PM
Srikanth Chekuri
01/03/2023, 3:34 PM
Bill Cavalieri
01/03/2023, 3:41 PM
Srikanth Chekuri
01/03/2023, 3:51 PM
Srikanth Chekuri
01/03/2023, 3:57 PM
clickhouse-setup_otel-collector-metrics_1? Do you see any scrape timeout errors?
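A quick way to answer this from the host, assuming the container keeps its default compose name (it does, per the docker ps output below); the grep pattern is just a starting point:
# Sketch: look for scrape/timeout errors in the metrics collector's recent logs.
docker logs --since 24h clickhouse-setup_otel-collector-metrics_1 2>&1 | grep -iE "scrape|timeout" | tail -n 50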
Bill Cavalieri
01/03/2023, 4:00 PM
2022-12-30T16:43:09.311Z info prometheusreceiver@v0.66.0/metrics_receiver.go:288 Starting scrape manager {"kind": "receiver", "name": "prometheus", "pipeline": "metrics"}
time="2022-12-30T16:54:13Z" level=error msg="read tcp 172.27.0.7:42490->172.27.0.2:9000: read: connection reset by peer" component=clickhouse
time="2022-12-30T16:54:19Z" level=error msg="dial tcp 172.27.0.2:9000: i/o timeout" component=clickhouse
time="2022-12-30T16:54:23Z" level=error msg="dial tcp 172.27.0.2:9000: connect: connection refused" component=clickhouse
time="2022-12-30T16:54:28Z" level=error msg="dial tcp 172.27.0.2:9000: connect: connection refused" component=clickhouse
2022-12-30T16:55:54.582Z info service/collector.go:219 Received signal from OS {"signal": "terminated"}
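The connect: connection refused lines suggest the ClickHouse server wasn't listening at that moment. A small sketch to check whether the clickhouse container restarted (or was OOM-killed) around 16:54 UTC:
# Sketch: last start time, OOM-kill flag, and Docker restart count for the container.
docker inspect clickhouse --format 'started={{.State.StartedAt}} oom={{.State.OOMKilled}} restarts={{.RestartCount}}'
# And ClickHouse's own uptime in seconds:
docker exec clickhouse clickhouse-client --query "SELECT uptime()"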
Bill Cavalieri
01/03/2023, 4:01 PM
node-exporter:
  image: prom/node-exporter
  environment:
    GOMAXPROCS: '1'
Srikanth Chekuri
01/03/2023, 4:05 PM
clickhouse-setup_otel-collector-metrics_1?
Bill Cavalieri
01/03/2023, 4:10 PM
Srikanth Chekuri
01/03/2023, 4:13 PM
Srikanth Chekuri
01/03/2023, 4:14 PM
clickhouse-setup_otel-collector-metrics_1 has scrape failures.
Bill Cavalieri
01/03/2023, 4:16 PM
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0964498e706c signoz/frontend:0.13.0 "nginx -g 'daemon of…" 5 days ago Up 3 days 80/tcp, 0.0.0.0:3301->3301/tcp, :::3301->3301/tcp frontend
29743a5780f5 signoz/signoz-otel-collector:0.66.1 "/signoz-collector -…" 5 days ago Up 41 minutes 0.0.0.0:4317-4318->4317-4318/tcp, :::4317-4318->4317-4318/tcp clickhouse-setup_otel-collector_1
2d0db03a384e signoz/signoz-otel-collector:0.66.1 "/signoz-collector -…" 5 days ago Up 3 days 4317-4318/tcp clickhouse-setup_otel-collector-metrics_1
4be9e1f5908b signoz/query-service:0.13.0 "./query-service -co…" 5 days ago Up 3 days (healthy) 8080/tcp query-service
d207004996ec clickhouse/clickhouse-server:22.8.8-alpine "/entrypoint.sh" 3 weeks ago Up 3 days (healthy) 0.0.0.0:8123->8123/tcp, :::8123->8123/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp, 0.0.0.0:9181->9181/tcp, :::9181->9181/tcp, 9009/tcp clickhouse
725e4dea35b1 bitnami/zookeeper:3.7.0 "/opt/bitnami/script…" 3 weeks ago Up 3 days 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 0.0.0.0:2888->2888/tcp, :::2888->2888/tcp, 0.0.0.0:3888->3888/tcp, :::3888->3888/tcp, 8080/tcp zookeeper-1
5d41c306f5e7 signoz/alertmanager:0.23.0-0.2 "/bin/alertmanager -…" 4 weeks ago Up 4 weeks 9093/tcp
docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
0964498e706c frontend 0.00% 2.332MiB / 9.72GiB 0.02% 2.43MB / 3.43MB 126MB / 8.9MB 5
29743a5780f5 clickhouse-setup_otel-collector_1 216.59% 2.167GiB / 9.72GiB 22.29% 3.16GB / 6.6GB 294MB / 131kB 11
2d0db03a384e clickhouse-setup_otel-collector-metrics_1 55.05% 1.219GiB / 9.72GiB 12.54% 389GB / 46.9GB 38.8GB / 27.7GB 12
4be9e1f5908b query-service 0.05% 198.9MiB / 9.72GiB 2.00% 490MB / 240MB 15.6GB / 3.79GB 12
d207004996ec clickhouse 57.66% 1.39GiB / 9.72GiB 14.30% 638GB / 762GB 1.09TB / 2.92TB 301
725e4dea35b1 zookeeper-1 0.25% 45.2MiB / 9.72GiB 0.45% 5.97MB / 5.59MB 7.76GB / 1.58GB 47
5d41c306f5e7 clickhouse-setup_alertmanager_1 0.08% 11.62MiB / 9.72GiB 0.12% 24.1MB / 23.8MB 9.14GB / 368MB 11
Srikanth Chekuri
01/03/2023, 4:17 PM
Bill Cavalieri
01/03/2023, 4:18 PM
Srikanth Chekuri
01/03/2023, 4:19 PM
Bill Cavalieri
01/03/2023, 4:20 PM
Bill Cavalieri
01/03/2023, 4:24 PM
Srikanth Chekuri
01/03/2023, 4:25 PM
Bill Cavalieri
01/03/2023, 4:37 PM
Srikanth Chekuri
01/03/2023, 4:55 PM
Can you set scrape_interval to 90s and set scrape_timeout to something like 50s and see how it goes?
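Roughly where those two settings live in the metrics collector's prometheus receiver; the file name (otel-collector-metrics-config.yaml) and job layout are assumptions from the default setup of this period:
# Sketch of the relevant part of otel-collector-metrics-config.yaml.
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 90s   # Prometheus default is 1m if unset
        scrape_timeout: 50s    # must not exceed scrape_interval
      scrape_configs:
        - job_name: otel-collector
          static_configs:
            - targets:
                - otel-collector:8889   # the APM metrics endpoint from the broken-pipe errors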
Bill Cavalieri
01/03/2023, 5:01 PM
Ankit Nayan
ExitCode of the container might be helpful. Code 137 is OOMKilled.
docker inspect clickhouse-setup_otel-collector_1 --format='{{.State.ExitCode}}'
Ankit Nayan
docker inspect clickhouse-setup_otel-collector-metrics_1 --format='{{.State.ExitCode}}'
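If either command returns 137, the OOMKilled flag is a quick way to confirm it really was the kernel OOM killer:
# Sketch: true means Docker recorded an out-of-memory kill for the container.
docker inspect clickhouse-setup_otel-collector_1 --format='{{.State.OOMKilled}}'
docker inspect clickhouse-setup_otel-collector-metrics_1 --format='{{.State.OOMKilled}}'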
Bill Cavalieri
01/04/2023, 6:30 PM
Srikanth Chekuri
01/04/2023, 6:33 PM
scrape_interval, it’s not there today and uses the default value
Bill Cavalieri
01/04/2023, 6:36 PM
Bill Cavalieri
01/04/2023, 9:23 PM
clickhouse-setup_otel-collector_1 logs from fresh startup to the problem. No errors are logged now that I've increased scrape_interval: 90s and scrape_timeout: 50s. Exit code from both containers is 0. The docker stats below were taken while having the issue; I haven't restarted clickhouse-setup_otel-collector_1 yet. I'm averaging 7.5 million span counts/hour.
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
0964498e706c frontend 0.00% 7.219MiB / 15.63GiB 0.05% 1.64MB / 1.8MB 11.4MB / 16.4kB 7
29743a5780f5 clickhouse-setup_otel-collector_1 383.61% 3.799GiB / 15.63GiB 24.31% 12.5GB / 24.2GB 23.7MB / 0B 13
2d0db03a384e clickhouse-setup_otel-collector-metrics_1 0.00% 2.016GiB / 15.63GiB 12.90% 124GB / 16.7GB 47MB / 33.4MB 13
4be9e1f5908b query-service 0.00% 199.7MiB / 15.63GiB 1.25% 200MB / 79.9MB 73.9MB / 5.5MB 14
d207004996ec clickhouse 110.28% 1.636GiB / 15.63GiB 10.47% 265GB / 239GB 370GB / 1.16TB 327
725e4dea35b1 zookeeper-1 0.14% 174.4MiB / 15.63GiB 1.09% 2.78MB / 5.39MB 144MB / 231MB 54
5d41c306f5e7 clickhouse-setup_alertmanager_1 0.17% 15.73MiB / 15.63GiB 0.10% 65.1kB / 1.19kB 19.2MB / 0B 13