# support
a
Hi SigNoz community! I am running on a k3s Kubernetes cluster, with SigNoz deployed via the SigNoz Helm chart. I can get logs flowing in OK with the filelog receiver. However, I see no metrics coming in. I am pretty sure that I have configured the hostmetrics receiver correctly as a DaemonSet. Any insights on what I missed would be greatly appreciated. Details in this thread.
DaemonSet definition:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-metrics
spec:
  selector:
    matchLabels:
      name: otel-collector-metrics
  template:
    metadata:
      labels:
        name: otel-collector-metrics
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.111.0
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-fs
              mountPath: /hostfs
              readOnly: true
            - name: otel-config-vol
              mountPath: /etc/otel-collector-config
          command:
            - "/otelcol-contrib"
            - "--config=/etc/otel-collector-config/otel-collector-config-metrics.yaml"
      volumes:
        - name: host-fs
          hostPath:
            path: /
            type: Directory
        - name: otel-config-vol
          configMap:
            name: otel-collector-config-metrics
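One thing worth noting: the ConfigMap below references KUBERNETES_NODE_NAME, but the manifest above never injects the node name into the pod. If the intent is to pick the node name up from the environment, the container would also need a Downward API entry along these lines (a sketch, assuming that intent):
          env:
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName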
ConfigMap definition:
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config-metrics
data:
  otel-collector-config-metrics.yaml: |
    receivers:
      hostmetrics:
        root_path: "/hostfs"
        collection_interval: 60s
        scrapers:
          cpu:
          memory:
          disk:
          filesystem:
          load:
          network:
          processes:
    
    processors:
      resource:
        attributes:
          - key: cluster_name
            value: "hera"
            action: insert
          - key: node_name
            from_attribute: KUBERNETES_NODE_NAME
            action: insert    
      batch:
        timeout: 10s
    
    exporters:
      otlp:
        endpoint: "hera-signoz-otel-collector:4317"
        tls:
          insecure: true
    
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        metrics:
          receivers: [hostmetrics]
          processors: [batch]
          exporters: [otlp]
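Two details stand out in this config. First, the resource processor is declared but never referenced: the metrics pipeline runs only [batch], so cluster_name and node_name are never attached to the exported metrics. For the processor to take effect it would have to be listed in the pipeline, e.g.:
    service:
      pipelines:
        metrics:
          receivers: [hostmetrics]
          processors: [resource, batch]
          exporters: [otlp]
Second, from_attribute in the resource processor copies the value of another resource attribute; it does not read environment variables. To consume a Downward API variable like the one sketched above, the config would need the collector's environment expansion instead, e.g. value: ${env:KUBERNETES_NODE_NAME}.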
Dashboard with "no data" message:
s
Do you see any error logs in the agent collector?
a
Hi @Srikanth Chekuri, thank you for your input. Yes, I do see some errors in the agent collector logs:
info    internal/retry_sender.go:118    Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.092785582s"}
But after a while the connection seems fine, as I see this log entry:
Connectivity change to READY
For reference, these are all the logs after a pod restart:
2024-10-27T09:57:17.717Z        info    service@v0.111.0/service.go:136 Setting up own telemetry...
2024-10-27T09:57:17.717Z        info    telemetry/metrics.go:70 Serving metrics {"address": "localhost:8888", "metrics level": "Normal"}
2024-10-27T09:57:17.717Z        debug   builders/builders.go:24 Stable component.       {"kind": "exporter", "data_type": "metrics", "name": "otlp"}
2024-10-27T09:57:17.718Z        debug   builders/builders.go:24 Beta component. May change in the future.       {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2024-10-27T09:57:17.718Z        debug   builders/builders.go:24 Beta component. May change in the future.       {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics"}
2024-10-27T09:57:17.720Z        info    service@v0.111.0/service.go:208 Starting otelcol-contrib...     {"Version": "0.111.0", "NumCPU": 8}
2024-10-27T09:57:17.720Z        info    extensions/extensions.go:39     Starting extensions...
2024-10-27T09:57:17.720Z        info    grpc@v1.67.1/clientconn.go:162  [core] original dial target is: "hera-signoz-otel-collector:4317" {"grpc_log": true}
2024-10-27T09:57:17.720Z        info    grpc@v1.67.1/clientconn.go:440  [core] [Channel #1]Channel created      {"grpc_log": true}
2024-10-27T09:57:17.720Z        info    grpc@v1.67.1/clientconn.go:193  [core] [Channel #1]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"dns", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/hera-signoz-otel-collector:4317", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  {"grpc_log": true}
2024-10-27T09:57:17.720Z        info    grpc@v1.67.1/clientconn.go:194  [core] [Channel #1]Channel authority set to "hera-signoz-otel-collector:4317"      {"grpc_log": true}
2024-10-27T09:57:17.720Z        info    service@v0.111.0/service.go:234 Everything is ready. Begin running and processing data.
2024-10-27T09:57:27.721Z        info    grpc@v1.67.1/clientconn.go:345  [core] [Channel #1]Channel exiting idle mode    {"grpc_log": true}
2024-10-27T09:57:32.722Z        info    internal/retry_sender.go:118    Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.092785582s"}
2024-10-27T09:57:35.732Z        info    dns/dns_resolver.go:246 [dns] dns: SRV record lookup error: lookup _grpclb._tcp.hera-signoz-otel-collector on 10.43.0.10:53: server misbehaving    {"grpc_log": true}
2024-10-27T09:57:43.742Z        info    grpc@v1.67.1/resolver_wrapper.go:200    [core] [Channel #1]Resolver state updated: {
  "Addresses": [
    {
      "Addr": "10.43.17.7:4317",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Metadata": null
    }
  ],
  "Endpoints": [
    {
      "Addresses": [
        {
          "Addr": "10.43.17.7:4317",
          "ServerName": "",
          "Attributes": null,
          "BalancerAttributes": null,
          "Metadata": null
        }
      ],
      "Attributes": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)     {"grpc_log": true}
2024-10-27T09:57:43.742Z        info    grpc@v1.67.1/balancer_wrapper.go:107    [core] [Channel #1]Channel switches to new LB policy "pick_first"  {"grpc_log": true}
2024-10-27T09:57:43.743Z        info    gracefulswitch/gracefulswitch.go:193    [pick-first-lb] [pick-first-lb 0xc002ebea50] Received new config {
  "shuffleAddressList": false
}, resolver state {
  "Addresses": [
    {
      "Addr": "10.43.17.7:4317",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Metadata": null
    }
  ],
  "Endpoints": [
    {
      "Addresses": [
        {
          "Addr": "10.43.17.7:4317",
          "ServerName": "",
          "Attributes": null,
          "BalancerAttributes": null,
          "Metadata": null
        }
      ],
      "Attributes": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
}       {"grpc_log": true}
2024-10-27T09:57:43.743Z        info    grpc@v1.67.1/balancer_wrapper.go:180    [core] [Channel #1 SubChannel #2]Subchannel created     {"grpc_log": true}
2024-10-27T09:57:43.743Z        info    grpc@v1.67.1/clientconn.go:544  [core] [Channel #1]Channel Connectivity change to CONNECTING    {"grpc_log": true}
2024-10-27T09:57:43.743Z        info    grpc@v1.67.1/clientconn.go:1199 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING      {"grpc_log": true}
2024-10-27T09:57:43.743Z        info    grpc@v1.67.1/clientconn.go:1317 [core] [Channel #1 SubChannel #2]Subchannel picks a new address "10.43.17.7:4317" to connect       {"grpc_log": true}
2024-10-27T09:57:43.744Z        info    pickfirst/pickfirst.go:176      [pick-first-lb] [pick-first-lb 0xc002ebea50] Received SubConn state update: 0xc002ebeae0, {ConnectivityState:CONNECTING ConnectionError:<nil> connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}  {"grpc_log": true}
2024-10-27T09:57:43.754Z        info    grpc@v1.67.1/clientconn.go:1199 [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to READY   {"grpc_log": true}
2024-10-27T09:57:43.754Z        info    pickfirst/pickfirst.go:176      [pick-first-lb] [pick-first-lb 0xc002ebea50] Received SubConn state update: 0xc002ebeae0, {ConnectivityState:READY ConnectionError:<nil> connectedAddress:{Addr:10.43.17.7:4317 ServerName:hera-signoz-otel-collector:4317 Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}} {"grpc_log": true}
2024-10-27T09:57:43.755Z        info    grpc@v1.67.1/clientconn.go:544  [core] [Channel #1]Channel Connectivity change to READY {"grpc_log": true}
s
How often do you see context deadline exceeded errors?
a
Only once at the beginning.
I noticed that if I arbitrarily increase the batch timeout in the metrics config to 120s:
processors:
  batch:
    timeout: 120s
I have no errors in the logs, but no metrics either! Any insights you might have would be greatly appreciated.
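Worth noting: the batch processor's timeout only controls how long the batcher waits before flushing a batch downstream; the DeadlineExceeded error comes from the exporter's own per-request timeout (the OTLP exporter defaults to 5s). If the very first export races the DNS resolution of the service name, raising the exporter timeout is the more targeted knob. A sketch against the config above:
exporters:
  otlp:
    endpoint: "hera-signoz-otel-collector:4317"
    timeout: 20s
    tls:
      insecure: true
Also, with timeout: 120s the batcher can hold data for up to two minutes before sending anything, which can look like missing metrics on a freshly opened dashboard.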
s
I had to go back to look at your ConfigMap. The dashboard you are using doesn't work because the generated hostmetrics do not have the k8s_* attributes. How did you end up with that config for the DaemonSet?
a
I copied the config from an example and edited the attributes to make sure I set values for cluster_name and node_name. Could you please point me to a relevant example?
the generated hostmetrics do not have the k8s_* attributes.
Could you please clarify that sentence?
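For context: the built-in SigNoz hostmetrics dashboards group and filter by Kubernetes resource attributes such as k8s_node_name and k8s_cluster_name, so metrics tagged only with custom keys like node_name and cluster_name never match the dashboard queries. A minimal sketch of what the resource processor could set instead, assuming SigNoz's usual dot-to-underscore attribute normalization and the Downward API variable sketched earlier:
    processors:
      resource:
        attributes:
          - key: k8s.cluster.name
            value: "hera"
            action: insert
          - key: k8s.node.name
            value: ${env:KUBERNETES_NODE_NAME}
            action: insert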
s
Why not just use our k8s-infra chart?
a
Thanks for the recommendation. I did see that it might be a better option, but only after I had started with the full SigNoz Helm chart. I will give it a try and report back. I found the documented configuration a bit confusing: https://signoz.io/docs/tutorial/kubernetes-infra-metrics/
global:
  cloud: others
  clusterName: <CLUSTER_NAME>
  deploymentEnvironment: <DEPLOYMENT_ENVIRONMENT>
otelCollectorEndpoint: ingest.{region}.signoz.cloud:443
otelInsecure: false
signozApiKey: <SIGNOZ_INGESTION_KEY>
presets:
  otlpExporter:
    enabled: true
  loggingExporter:
    enabled: false
Please confirm that for a local on-prem cluster like mine, the otelCollectorEndpoint is the local service, and that I do not need a signozApiKey.
s
Yes, you do not need an API key.
The docs have two sections to avoid such confusion. The self-host section doesn't include any API key.
a
No big deal, but it currently does:
s
You are looking at the SigNoz Cloud instructions. Here are the self-host instructions.
a
Thank you very much. I was indeed looking at the wrong tab. I will play with K8s-Infra and follow the self-host instructions. Your help was greatly appreciated.
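For reference, for a self-hosted SigNoz reached in-cluster, the override values reduce to something like this sketch (the namespace is an assumption: hera-signoz-otel-collector matches the service used earlier in the thread, and platform is the namespace the SigNoz docs use):
global:
  cloud: others
  clusterName: hera
  deploymentEnvironment: <DEPLOYMENT_ENVIRONMENT>
otelCollectorEndpoint: hera-signoz-otel-collector.platform.svc.cluster.local:4317
otelInsecure: true
presets:
  otlpExporter:
    enabled: true
  loggingExporter:
    enabled: false
No signozApiKey is needed, as confirmed above.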