# support
a
Please, has anyone exposed their otel-collector through nginx ingress before? I need help; here are my current settings:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: signoz-otel-collector-grpc-ingress
  namespace: ops
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/upstream-keepalive-timeout: "600"
    nginx.ingress.kubernetes.io/upstream-keepalive-requests: "100"
spec:
  rules:
  - host: otelcollector.domain.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: signoz-otel-collector
            port:
              number: 4317
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: signoz-otel-collector-http-ingress
  namespace: ops
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  rules:
  - host: otelcollector-http.domain.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: signoz-otel-collector
            port:
              number: 4318
```
n
@Prashant Shahi
p
@Abdulmalik Salawu Is one of them working for you?
```yaml
nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
```
Can you replace this one with `GRPC`?
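For reference, here is a minimal sketch of what the corrected gRPC ingress could look like with the `GRPC` backend protocol; the `tls` section and the `otelcollector-tls` secret name are assumptions and depend on how certificates are issued for that host:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: signoz-otel-collector-grpc-ingress
  namespace: ops
  annotations:
    kubernetes.io/ingress.class: nginx
    # GRPC: nginx terminates TLS and speaks plaintext HTTP/2 to the backend,
    # since the collector's 4317 receiver typically does not serve TLS itself
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
  tls:
  - hosts:
    - otelcollector.domain.com
    secretName: otelcollector-tls  # assumed TLS secret; replace with your own
  rules:
  - host: otelcollector.domain.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: signoz-otel-collector
            port:
              number: 4317
```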
a
Thank you, I have exposed both this way, but when I add the OTel endpoint in the otel-collector config, I am getting connection errors in the logs.
See how I use it in the other cluster:
```yaml
global:
  clusterName: "preprod"
  deploymentEnvironment: "preprod"
  cloud: aws
enabled: true
otelCollectorEndpoint: otelcollector.domain.com:4317
otelInsecure: false
namespace: "ops"
presets:
  loggingExporter:
    enabled: false
    verbosity: basic
    samplingInitial: 2
    samplingThereafter: 500
  otlpExporter:
    enabled: true
  logsCollection:
    enabled: true
    startAt: beginning
    includeFilePath: true
    includeFileName: false
    include:
      - /var/log/pods/*/*/*.log
    blacklist:
      enabled: true
      signozLogs: false
      namespaces:
        - kube-system
    whitelist:
      enabled: false
      signozLogs: true
      namespaces: []
      pods: []
      containers: []
      additionalInclude: []
    operators:
      - id: container-parser
        type: container

otelAgent:
  enabled: true
  name: "otel-agent"
  annotations:
    karpenter.sh/do-not-disrupt: "true"
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            include_metadata: true
            max_recv_msg_size_mib: 16
          http:
            cors:
              allowed_origins:
                - '*'
            endpoint: 0.0.0.0:4318
            include_metadata: true
    processors:
      batch:
        send_batch_max_size: 10000
        send_batch_size: 10000
        timeout: 200ms

otelDeployment:
  enabled: true
  name: "otel-deployment"
  annotations:
    karpenter.sh/do-not-disrupt: "true"
  config:
    receivers: {}
    processors:
      batch:
        send_batch_size: 10000
        timeout: 2s
```
```
{"level":"warn","ts":1728493111.8025117,"caller":"zapgrpc/zapgrpc.go:195","msg":"[core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: \"<http://otelcollector.domain.com:4317\|otelcollector.domain.com:4317\>", ServerName: \"<http://otelcollector.domain.com:4317\|otelcollector.domain.com:4317\>", }. Err: connection error: desc = \"transport: Error while dialing: dial tcp 54.201.54.38:4317: i/o timeout\"","grpc_log":true}
{"level":"info","ts":1728493158.8286445,"caller":"exporterhelper/retry_sender.go:177","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"metrics","name":"otlp","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 44.228.38.211:4317: i/o timeout\"","interval":"8.782585704s"}
{"level":"error","ts":1728493890.1647754,"caller":"exporterhelper/queue_sender.go:93","msg":"Exporting failed. No more retries left. Dropping data.","kind":"exporter","data_type":"metrics","name":"otlp","error":"max elapsed time expired rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 54.201.54.38:4317: i/o timeout\"","dropped_items":613,"stacktrace":"<http://go.opentelemetry.io/collector/exporter/exporterhelper.(*queueSender).onTemporaryFailure|go.opentelemetry.io/collector/exporter/exporterhelper.(*queueSender).onTemporaryFailure>\n\tgo.opentelemetry.io/collector/exporter@v0.88.0/exporterhelper/queue_sender.go:93\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\tgo.opentelemetry.io/collector/exporter@v0.88.0/exporterhelper/retry_sender.go:161\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send\n\tgo.opentelemetry.io/collector/exporter@v0.88.0/exporterhelper/metrics.go:176\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*queueSender).start.func1\n\tgo.opentelemetry.io/collector/exporter@v0.88.0/exporterhelper/queue_sender.go:126\ngo.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).Start.func1\n\tgo.opentelemetry.io/collector/exporter@v0.88.0/exporterhelper/internal/bounded_memory_queue.go:52"}
Something else I noticed is this:
```
{"level":"info","ts":1728493869.0500045,"caller":"service@v0.109.0/service.go:239","msg":"Everything is ready. Begin running and processing data."}{"level":"info","ts":1728493869.0500863,"caller":"localhostgate/featuregate.go:63","msg":"The default endpoints for all servers in components have changed to use localhost instead of 0.0.0.0. Disable the feature gate to temporarily revert to the previous default.","feature gate ID":"component.UseLocalHostAsDefaultHost"}{"level":"info","ts":1728493887.7916918,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"logs","name":"otlp","error":"rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded","interval":"9.185975162s"}
{"level":"warn","ts":1728493889.4500751,"caller":"grpc@v1.66.0/clientconn.go:1379","msg":"[core] [Channel #2 SubChannel #6]grpc: addrConn.createTransport failed to connect to {Addr: \"54.201.54.38:4317\", ServerName: \"<http://otelcollector.domain.com:4317\|otelcollector.domain.com:4317\>", }. Err: connection error: desc = \"transport: Error while dialing: dial tcp 54.201.54.38:4317: i/o timeout\"","grpc_log":true}
{"level":"warn","ts":1728493889.4512422,"caller":"grpc@v1.66.0/clientconn.go:1379","msg":"[core] [Channel #2 SubChannel #6]grpc: addrConn.createTransport failed to connect to {Addr: \"54.245.245.198:4317\", ServerName: \"<http://otelcollector.domain.com:4317\|otelcollector.domain.com:4317\>", }. Err: connection error: desc = \"transport: Error while dialing: dial tcp 54.245.245.198:4317: i/o timeout\"","grpc_log":true}
{"level":"warn","ts":1728493889.4513032,"caller":"grpc@v1.66.0/clientconn.go:1379","msg":"[core] [Channel #2 SubChannel #6]grpc: addrConn.createTransport failed to connect to {Addr: \"44.228.38.211:4317\", ServerName: \"<http://otelcollector.domain.com:4317\|otelcollector.domain.com:4317\>", }. Err: connection error: desc = \"transport: Error while dialing: dial tcp 44.228.38.211:4317: i/o timeout\"","grpc_log":true}
{"level":"info","ts":1728493889.4522643,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"logs","name":"otlp","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 54.201.54.38:4317: i/o timeout\"","interval":"14.640061806s"}
I am able to connect to the IP in the same cluster, outside of OTel:
```
nc -zv 54.201.54.38 4317
Connection to 54.201.54.38 port 4317 [tcp/*] succeeded!
```
p
The IP? Aren't you using a custom domain with the nginx ingress controller?
a
Apologies, I was still testing at that point; that successful connection was from outside the cluster.
Now I am in a situation where my pods can't connect to a load balancer on ports 4317 and 4318, but they can connect on ports 443 and 80. Locally, on my laptop, I can connect to that load balancer on 443, 80, 4317, and 4318. My EKS nodes were deployed to a private subnet; I have added egress rules to the node security group to allow connections to ports 4317 and 4318, but I still cannot reach the load balancer on those ports.
p
That is correct behaviour. It's okay to use port 443, provided that the data is being sent to the desired port and services.
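Concretely, that would mean pointing the k8s-infra values at the ingress hostname on port 443 rather than 4317. A minimal sketch, assuming TLS terminates at the nginx ingress and reusing the hostname from the earlier manifest:
```yaml
# Relevant k8s-infra values: OTLP goes to the ingress' HTTPS listener,
# and nginx forwards gRPC on to the collector's 4317 service port.
otelCollectorEndpoint: otelcollector.domain.com:443
otelInsecure: false  # TLS is terminated by the nginx ingress (assumed)
```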
a
Thank you, no more connection errors, but now I am getting permission errors. It seems someone also opened an issue about this: https://github.com/SigNoz/charts/issues/420
```json
{
  "level": "error",
  "ts": 1728580648.2491407,
  "caller": "scraperhelper/scrapercontroller.go:197",
  "msg": "Error scraping metrics",
  "kind": "receiver",
  "name": "hostmetrics",
  "data_type": "metrics",
  "error": "failed to read usage at /hostfs/var/lib/containerd/tmpmounts/containerd-mount296883152: permission denied",
  "scraper": "hostmetrics",
  "stacktrace": "<http://go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport|go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport>\n\tgo.opentelemetry.io/collector/receiver@v0.109.0/scraperhelper/scrapercontroller.go:197\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.109.0/scraperhelper/scrapercontroller.go:173"
}
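One way to quiet that scrape error (not necessarily the resolution chosen in the linked issue) is to exclude the short-lived containerd tmpmounts from the hostmetrics filesystem scraper. A sketch of the receiver-level config; where exactly this lands in the chart values depends on how the agent's hostmetrics preset is wired:
```yaml
receivers:
  hostmetrics:
    root_path: /hostfs
    scrapers:
      filesystem:
        exclude_mount_points:
          match_type: regexp
          mount_points:
            # skip ephemeral containerd tmpmounts the agent cannot read
            - /var/lib/containerd/tmpmounts/.*
```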