# support
s
This message was deleted.
s
Where is this issue coming from?
z
in otel-collector logs
s
This is a generic gRPC error which needs more context.
z
It's a basic setup: we are instrumenting Java services and sending metrics to SigNoz.
We are doing load testing, and as soon as we started the load test, SigNoz started throwing this error.
s
This usually happens when the server's keepalive enforcement kicks in to prevent large numbers of idle clients from consuming too many resources.
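For context, the "too many pings" error comes from the gRPC server's keepalive enforcement policy: by default grpc-go tolerates a client ping at most every 5 minutes, and only while an RPC is in flight; clients that ping faster, or while idle, accumulate "ping strikes" and after a couple of strikes the server closes the connection with exactly this message. In the otel-collector these knobs live under the OTLP receiver's gRPC settings. A minimal sketch with default-equivalent values (field names from the collector's shared gRPC config; illustrative, not a copy of any shipped config):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        keepalive:
          enforcement_policy:
            # Minimum interval the server tolerates between client pings.
            # grpc-go defaults to 5 minutes; faster pings count as strikes.
            min_time: 5m
            # Whether pings are allowed while the client has no active RPCs.
            # Defaults to false, so idle-but-pinging clients also collect strikes.
            permit_without_stream: false
```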
z
2022-04-01T10:26:37.599Z  error  zapgrpc/zapgrpc.go:208  [transport] transport: Got too many pings from the client, closing the connection.  {"grpc_log": true}
go.uber.org/zap/zapgrpc.(*Logger).Errorln
    /home/codegeas/go/pkg/mod/go.uber.org/zap@v1.20.0/zapgrpc/zapgrpc.go:208
google.golang.org/grpc/internal/grpclog.ErrorDepth
    /home/codegeas/go/pkg/mod/google.golang.org/grpc@v1.44.0/internal/grpclog/grpclog.go:55
google.golang.org/grpc/grpclog.(*componentData).ErrorDepth
    /home/codegeas/go/pkg/mod/google.golang.org/grpc@v1.44.0/grpclog/component.go:46
google.golang.org/grpc/grpclog.(*componentData).Errorf
    /home/codegeas/go/pkg/mod/google.golang.org/grpc@v1.44.0/grpclog/component.go:79
google.golang.org/grpc/internal/transport.(*http2Server).handlePing
    /home/codegeas/go/pkg/mod/google.golang.org/grpc@v1.44.0/internal/transport/http2_server.go:885
google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams
    /home/codegeas/go/pkg/mod/google.golang.org/grpc@v1.44.0/internal/transport/http2_server.go:650
g
Is there a way to prevent this?
s
You should probably configure the keepalive based on your (client) requirements.
z
@User where exactly can we configure the keepalive in SigNoz?
s
Fixing the root cause would mean looking at why you have a lot of idle client connections, which are the reason for this log.
z
The setup is straightforward: a few services running on some machines, let's say server(1-5), all connecting to SigNoz and sending data to it. SigNoz is running on a separate server and we are using the default configuration. I am not sure why we would face this issue with a standard, default setup.
s
My best guess is that your load-testing setup is leaving behind a lot of unused idle connections. Did you face the issue when accessing the application as a regular user?
a
@User we load-tested SigNoz on a 4-CPU machine and were able to ingest around 1-2K rps (50K spans/s) without a problem.
How much load are you generating?
z
It's less than that, but we are repeatedly getting this error in the collector and instrumentation stops working.
How do we set GRPC_ARG_HTTP2_MAX_PING_STRIKES in SigNoz? We are looking to set it to 0.
s
SigNoz on its own doesn't set any such gRPC settings.
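For reference, the stock receiver section in the SigNoz docker setup just enables the OTLP gRPC and HTTP protocols with no keepalive block at all, so the grpc-go defaults described above apply unchanged. Roughly (paraphrased, not an exact copy of the shipped otel-collector-config.yaml):

```yaml
receivers:
  otlp:
    protocols:
      grpc:   # no keepalive section, so grpc-go's default ping enforcement applies
      http:
```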
z
SigNoz is running on a bigger machine with enough memory, CPU and disk space, but right in the middle of the test we stop receiving metrics and get the gRPC "too many pings" error. It's a regular distributed setup and I'm not sure what could be causing this; any inputs will be helpful.
a
@User this seems like a weird issue. Nobody seems to have reported this in OpenTelemetry till now, and it looks like a gRPC `keep_alive` issue that should be resolvable using `GRPC_ARG_HTTP2_MAX_PING_STRIKES`, but I couldn't find a way to enable such a config. Which java-agent version are you using? It might be a good idea to downgrade it to one or two versions older than the latest to avoid recent issues.
@User I don't think this is an issue with their application. The OTel Java agent connects to the otel-collector at SigNoz, so this seems to be an issue between the otel-java-agent gRPC exporter and the otel-collector gRPC receiver.
@User I could see a set of params to play around with in the otlp receiver at otel-collector: https://github.com/open-telemetry/opentelemetry-collector/blob/v0.43.0/receiver/otlpreceiver/testdata/config.yaml#L25-L38
You can change the `keepalive` settings in the otel-collector config at SigNoz at https://github.com/SigNoz/signoz/blob/develop/deploy/docker/clickhouse-setup/otel-collector-config.yaml#L6-L9 and restart the otel-collector service.
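A sketch of what the edited gRPC receiver section in that otel-collector-config.yaml could look like; the key names come from the otlp receiver testdata linked above (collector v0.43.0), while the durations are illustrative and should be tuned to the load test rather than copied verbatim:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        keepalive:
          enforcement_policy:
            # Tolerate frequent client pings instead of counting them
            # as ping strikes (grpc-go's default min_time is 5m).
            min_time: 10s
            # Don't penalise clients that ping while they have no
            # in-flight RPCs, e.g. idle load-test connections.
            permit_without_stream: true
          server_parameters:
            # Optionally close connections that sit idle so leftover
            # load-test clients don't pile up on the collector.
            max_connection_idle: 60s
      http:
```

After editing, restart the otel-collector service as suggested above so the new receiver settings take effect.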
z
Thank you @User, I will try this out and see if it works. It seems like something that may solve our problem; will test and update you.
👍 1