# support
b
Where can I find the query behind /services? I'd like to make a cronjob that will restart the clickhouse-setup_otel-collector_1 container when the results are 0
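(A rough sketch of the kind of check such a cronjob could run; the query-service endpoint path, port, and request payload below are assumptions for illustration, not the confirmed query behind /services:)

```python
#!/usr/bin/env python3
# Hypothetical cron check: if the services list for the last 5 minutes comes
# back empty, restart the collector container. The endpoint path, port, and
# payload shape are assumptions, not the confirmed query-service API.
import json
import subprocess
import time
import urllib.request

end_ns = time.time_ns()
start_ns = end_ns - 5 * 60 * 10**9  # five minutes ago, in nanoseconds

req = urllib.request.Request(
    "http://localhost:8080/api/v1/services",  # assumed query-service endpoint
    data=json.dumps({"start": str(start_ns), "end": str(end_ns)}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    services = json.load(resp)

if not services:  # "results are 0" -> bounce the collector
    subprocess.run(
        ["docker", "restart", "clickhouse-setup_otel-collector_1"], check=True
    )
```

It could then be scheduled with an entry like `*/5 * * * * /usr/local/bin/check_services.py` (the script path is hypothetical).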
s
> I’d like to make a cronjob that will restart the clickhouse-setup_otel-collector_1 container when the results are 0
Sounds like you are trying to work around another issue. Can you share why you want to restart the collector and what's leading to the results being 0?
b
Correct, I'm on version 0.14.0. Every hour or so my Services/Traces stop logging. While this is happening, the memory usage in the clickhouse-setup_otel-collector_1 container increases until it restarts. The cycle takes about an hour, so I basically only get data every other hour. Stats:
```
CONTAINER ID   NAME                                        CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
f16d7b1538ca   frontend                                    0.00%     2.621MiB / 15.63GiB   0.02%     8.07MB / 15.1MB   72.9MB / 14.7MB   7
dc7aec357218   clickhouse-setup_otel-collector_1           265.47%   8.823GiB / 15.63GiB   56.47%    49.5GB / 8.83GB   293MB / 3.92MB    14
e57232ccc194   clickhouse-setup_otel-collector-metrics_1   122.27%   1.567GiB / 15.63GiB   10.03%    2.13TB / 88.4GB   48.9GB / 38.3GB   14
e190ef81da7d   query-service                               0.00%     108.4MiB / 15.63GiB   0.68%     869MB / 189MB     15.2GB / 2.51GB   13
e24885bb349f   clickhouse                                  167.03%   1.788GiB / 15.63GiB   11.44%    1.01TB / 6.29TB   2.11TB / 4.88TB   380
725e4dea35b1   zookeeper-1                                 0.23%     50.32MiB / 15.63GiB   0.31%     24.2MB / 30.7MB   32GB / 5.14GB     55
5d41c306f5e7   clickhouse-setup_alertmanager_1             0.10%     12.36MiB / 15.63GiB   0.08%     295kB / 1.63kB    30.2GB / 392MB    13
```
s
Can you share your collector config?
b
@Srikanth Chekuri here you go
s
Can you use this updated config and let us know how it goes?
b
Thanks, will let you know
Services stopped working even faster than normal
a
I found my collectors restarting often, and it's because I was not using memory_limiter. For PROD I would highly advise using it on every pipeline (example below). Docs here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
Since adding it, things have been fairly stable (0 OOM crashes) and garbage collection has run more robustly.
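For reference, a minimal sketch of what that could look like in the collector config, following the memory_limiter README linked above. The limit values are placeholders to size for your own host, and the receiver/exporter names are assumptions about a typical SigNoz pipeline, not the config that was shared:

```yaml
processors:
  # memory_limiter should be the first processor in every pipeline so it can
  # refuse data before the rest of the pipeline allocates memory for it.
  memory_limiter:
    check_interval: 1s      # how often memory usage is checked
    limit_mib: 4000         # hard memory limit for the collector
    spike_limit_mib: 800    # soft limit = limit_mib - spike_limit_mib

service:
  pipelines:
    traces:
      receivers: [otlp]                      # assumed receiver
      processors: [memory_limiter, batch]    # memory_limiter first
      exporters: [clickhousetraces]          # assumed SigNoz exporter name
```

Putting memory_limiter ahead of batch means back-pressure (and, if necessary, dropped data) kicks in before batches are built, which is what keeps the container from climbing toward the host limit.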
s
@Bill Cavalieri based on what you shared earlier I suspected the memory is slowly building up for longer time based on the max_batch_send_size being set to 11k which I thought could be less for you. While the memorylimiter helps with no OOMs by dropping the data you may still want to understand if the ingestion is high that one collector can’t handle it (in that case you may want to scale up) or is it something else. I would be happy to debug this on call when the issue occurs.
a
@Alexei Zenin and @Bill Cavalieri would it be possible to schedule a call with @Srikanth Chekuri to drill down on this? If we get to the root cause, we will fix it ASAP. I guess other users must be facing the same issue too.
b
Yes, I have high tracing volume during the day; at night the process stays running without issue. I'm pretty free today, so I can debug whenever @Srikanth Chekuri is available. It's currently working, but it should fail within the next hour.
s
I have a call shortly for ~30 mins and will be available after that. I will let you know; let's get on a call then.
b
Getting this error:
```
2023-01-19T17:21:42.360Z	error	prometheusexporter@v0.66.0/log.go:34	error encoding and sending metric family: write tcp 172.27.0.8:8889->172.27.0.5:34328: write: broken pipe	{"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.66.0/log.go:34
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1.2
	/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/http.go:187
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
	/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/http.go:205
net/http.HandlerFunc.ServeHTTP
	/usr/local/go/src/net/http/server.go:2084
net/http.(*ServeMux).ServeHTTP
	/usr/local/go/src/net/http/server.go:2462
go.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1
	/go/pkg/mod/go.opentelemetry.io/collector@v0.66.0/config/confighttp/compression.go:162
net/http.HandlerFunc.ServeHTTP
	/usr/local/go/src/net/http/server.go:2084
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP
	/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.36.4/handler.go:204
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
	/go/pkg/mod/go.opentelemetry.io/collector@v0.66.0/config/confighttp/clientinfohandler.go:39
net/http.serverHandler.ServeHTTP
	/usr/local/go/src/net/http/server.go:2916
net/http.(*conn).serve
	/usr/local/go/src/net/http/server.go:1966
```
s
Can we get on call now?
b
Yes I'm available
s
@Bill Cavalieri there are a couple of things I would like to check for debugging this further. Let me know if we can do the call.
b
Yes, I'm available
s
Give me a few mins; I will send the huddle invite.