self-hosted signoz not able to send log to HTTP endpoint of collector
# support
a
Self-hosted SigNoz is not able to send logs to the HTTP endpoint of the collector. Can someone help me correct the config at the agent and collector end to use HTTP instead of gRPC? Versions:
opentelemetry-collector-contrib:0.109.0
signoz-otel-collector:0.111.22
I am trying to shift the collector from gRPC to HTTP, since gRPC causes a load-balancing issue when multiple collector pods are running (due to the persistent connections gRPC keeps open). But I am not able to send data to the collector on HTTP port 4318 and I get the error below in the agent logs:
{
  "level": "warn",
  "ts": 1741932754.2960992,
  "caller": "grpc@v1.66.0/clientconn.go:1379",
  "msg": "[core] [Channel #1 SubChannel #8]grpc: addrConn.createTransport failed to connect to {Addr: \"172.20.236.218:4318\", ServerName: \"signoz-infra-otel-collector:4318\", }. Err: connection error: desc = \"error reading server preface: http2: frame too large\"",
  "grpc_log": true
}
It looks like gRPC is still in use, even though I changed the agent endpoint to HTTP:
exporters:
  otlp:
    endpoint: http://signoz-infra-otel-collector:4318
    headers:
      signoz-access-token: ${env:SIGNOZ_API_KEY}
      "Content-Type": "application/x-protobuf"
    compression: "gzip"  # optional: use compression if needed
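In the OpenTelemetry Collector, the exporter named otlp always speaks gRPC regardless of the scheme in its endpoint, which is consistent with the "http2: frame too large" error above (a gRPC client hitting the plain-HTTP port). Exporting over HTTP/protobuf on port 4318 is done with the otlphttp exporter instead. A minimal sketch, reusing the endpoint and token from the config above; the pipeline wiring below is an assumption:

exporters:
  otlphttp:
    # HTTP/protobuf endpoint of the SigNoz collector (port 4318)
    endpoint: http://signoz-infra-otel-collector:4318
    headers:
      signoz-access-token: ${env:SIGNOZ_API_KEY}
    compression: gzip

service:
  pipelines:
    logs:
      receivers: [otlp]        # assumes an otlp receiver is already defined
      processors: [batch]      # assumes a batch processor is already defined
      exporters: [otlphttp]    # point the pipeline at otlphttp instead of otlp

The explicit Content-Type header can be dropped here; the otlphttp exporter sets it itself.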
v
You need to increase the max_recv_msg_size_mib setting on the receiver: https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configgrpc/README.md
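For reference, a minimal sketch of where that setting lives on the collector side, assuming the standard OTLP receiver described in the linked configgrpc docs; the 32 MiB value mirrors what is tried in the next message:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        # largest gRPC message (in MiB) the receiver will accept
        max_recv_msg_size_mib: 32
      http:
        endpoint: 0.0.0.0:4318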
a
@Vibhu Pandey I have increased max_recv_msg_size_mib from 16 to 32; will update on the result.
@Vibhu Pandey This did not help; traffic was still widely uneven. I switched to a single instance, but the pod has been crashing with OOM errors when the log volume goes beyond 7 million per minute. The collector has 14 cores and 28 GB of memory. Can you suggest the optimal resources needed to support 15 million per minute of ingestion? Will Kafka help as a buffer?
[attachment: monitoring graph]
@Vibhu Pandey I am also observing pod restarts due to OOM errors whenever log ingestion goes above 200k/min. Is there any solution to prevent this? Can we introduce a buffering layer like Kafka?
v
Yes you can use a buffering layer!
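A minimal sketch of what a Kafka buffer between the agents and the SigNoz collector could look like, using the kafka exporter and receiver from opentelemetry-collector-contrib; the broker address, topic, and protocol version below are placeholders:

# on the agent (or a front-line collector): write logs to Kafka
exporters:
  kafka:
    brokers: ["kafka-broker:9092"]   # placeholder broker address
    topic: otlp_logs
    encoding: otlp_proto
    protocol_version: 2.0.0

# on the backend collector: consume from the same topic, then export to SigNoz
receivers:
  kafka:
    brokers: ["kafka-broker:9092"]
    topic: otlp_logs
    encoding: otlp_proto
    protocol_version: 2.0.0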
a
@Vibhu Pandey can you provide a documentation link if you have one?
@Vibhu Pandey I was able to control the traffic spikes by tuning the OTel agent and increasing the settings below. So far the pod restarts have stopped and I was able to decrease the pod resources as well.
otelAgent:
  config:
    processors:
      batch:
        send_batch_size: 100000
        send_batch_max_size: 110000
        timeout: 5s