# support
m
Hey team, I need some help. I want to collect my AKS cluster metrics. With the config below I can see the names of the pods and the node, but I can't see the memory/CPU usage metrics. What could be wrong? Thanks
kubeletstats:
  collection_interval: 30s
  auth_type: "serviceAccount"
  endpoint: "https://${KUBELET_NODE}:10250"
  insecure_skip_verify: true
  extra_metadata_labels:
    - container.id
    - k8s.volume.type
  metric_groups:
    - node
    - container
    - pod
    - volume
v
Hey Mick, You're missing the k8sclusterreceiver. Are you using our k8s-infra chart?
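For reference, a minimal sketch of how the k8s_cluster receiver sits alongside kubeletstats in a metrics pipeline (assuming serviceAccount auth and an otlp exporter; the names here are illustrative, not the exact k8s-infra chart output):
receivers:
  kubeletstats:
    collection_interval: 30s
    auth_type: serviceAccount
    endpoint: "https://${KUBELET_NODE}:10250"
  k8s_cluster:
    auth_type: serviceAccount
    collection_interval: 30s
service:
  pipelines:
    metrics:
      # both receivers must appear in the pipeline for their metrics to flow
      receivers: [kubeletstats, k8s_cluster]
      processors: [batch]
      exporters: [otlp]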
m
Thanks for your reply @Vibhu Pandey
Here are my Helm values:
signoz:
  global:
    enabled: true
  k8s-infra:
    enabled: true
  otelCollector:
    resources:
      limits:
        memory: "4Gi"
        cpu: "4"
      requests:
        memory: "1Gi"
        cpu: "500m"
    serviceAccount:
      create: true
      name: "signoz-sandbox-otel-collector"
    clusterRole:
      create: true
      name: "signoz-sandbox-otel-collector"
      namespace: "signoz"
      rules:
      - apiGroups: [""]
        resources: ["pods", "services", "replicationcontrollers", "namespaces", "nodes", "resourcequotas", "nodes/stats", "nodes/stats/summary" ,"nodes/metrics", "nodes/proxy", "nodes/spec"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["apps"]
        resources: ["replicasets", "daemonsets", "statefulsets", "deployments"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["<http://metrics.k8s.io|metrics.k8s.io>"]
        resources: ["nodes", "pods"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["batch"]
        resources: ["jobs", "cronjobs"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["autoscaling"]
        resources: ["horizontalpodautoscalers"]
        verbs: ["get", "list", "watch"]

    clusterRoleBinding:
      create: true
      roleRef:
        apiGroup: <http://rbac.authorization.k8s.io|rbac.authorization.k8s.io>
        kind: ClusterRole
        name: "signoz-sandbox-otel-collector"
      subjects:
      - kind: ServiceAccount
        name: "signoz-sandbox-otel-collector"
        namespace: "signoz"
    service:
      type: ClusterIP
      ports:
        - name: grpc
          port: 4317
          targetPort: 4317
        - name: http
          port: 4318
          targetPort: 4318
    
    config:
      receivers:
        prometheus:
          config:
            scrape_configs:
              - job_name: 'amper-pods-collector'
                scrape_interval: 60s
                kubernetes_sd_configs:
                  - role: pod
                relabel_configs:
                  - source_labels: [__meta_kubernetes_pod_annotation_signoz_io_scrape]
                    action: keep
                    regex: true
              - job_name: 'amper-nodes-collector'
                scrape_interval: 60s
                kubernetes_sd_configs:
                  - role: node
                relabel_configs:
                  - source_labels: [__meta_kubernetes_node_label_kubernetes_io_hostname]
                    target_label: instance
                  - action: labelmap
                    regex: __meta_kubernetes_node_label_(.+)
                  - source_labels: [__meta_kubernetes_node_name]
                    target_label: kubernetes_node_name
        otlp:
          protocols:
            grpc:
              endpoint: "0.0.0.0:4317"
            http:
              endpoint: "0.0.0.0:4318"
          
        k8s_cluster:
          auth_type: "serviceAccount"
          collection_interval: 30s
        
        kubeletstats:
          collection_interval: 30s
          auth_type: "serviceAccount"
          endpoint: "https://${KUBELET_NODE}:10250"
          insecure_skip_verify: true
          extra_metadata_labels:
            - container.id
            - k8s.volume.type
          metric_groups:
            - node 
            - container
            - pod
            - volume
    
      processors:
        batch: 
          timeout: 5s
          send_batch_size: 10000
        memory_limiter:
          limit_mib: 4096
          spike_limit_mib: 3072
          check_interval: 1s
      
      exporters:
        logging:
          loglevel: debug
        otlp:
          endpoint: "signoz-sandbox-otel-collector:4317"  
          tls:
            insecure: true
          compression: gzip
        clickhousemetricswrite:
          endpoint: "tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}"
          resource_to_telemetry_conversion:
            enabled: true
          timeout: 15s
          retry_on_failure:
            enabled: true
            initial_interval: 5s
            max_interval: 30s
            max_elapsed_time: 300s
        metadataexporter:
          dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/signoz_metadata
          timeout: 10s
          tenant_id: ${env:TENANT_ID}
          cache:
            provider: in_memory
        
      service:
        pipelines:
          metrics:
            receivers: [prometheus, otlp, k8s_cluster, kubeletstats]
            processors: [memory_limiter, batch]
            exporters: [otlp, clickhousemetricswrite, metadataexporter]
        telemetry:
          logs:
            level: debug
          metrics:
            level: detailed
            address: ":8888"
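For the `${KUBELET_NODE}` placeholder above to resolve, the collector pod needs that environment variable set. A minimal sketch, reusing the variable name from the config above, via the Kubernetes Downward API:
# Sketch: inject the node name into the collector container so ${KUBELET_NODE}
# resolves at runtime; KUBELET_NODE is the name used in the config above.
env:
  - name: KUBELET_NODE
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName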
v
Which chart version?
m
apiVersion: v2
name: signoz
version: 1.0.0
dependencies:
  - name: signoz
    version: 0.72.0
    repository: https://charts.signoz.io
v
0.72 version, perfect! Which version of k8s-infra are you running?
m
1.30
m
could that be the issue? kubectl version 1.30
or is it the way I am querying?
confused a bit
v
Right, sorry for confusing you, can you check which version of the signoz k8s-infra chart (https://github.com/SigNoz/charts/blob/main/charts/k8s-infra/Chart.yaml)? (Not your k8s version)
m
ok. I am trying to understand it. I have to have two Chart.yaml files
one for k8s-infra and another one for signoz
v
Yes that's correct
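For illustration, a sketch of the umbrella Chart.yaml shown earlier extended with k8s-infra as a second dependency (the k8s-infra chart version is a placeholder; installing k8s-infra as its own release from https://charts.signoz.io works the same way):
apiVersion: v2
name: signoz
version: 1.0.0
dependencies:
  - name: signoz
    version: 0.72.0
    repository: https://charts.signoz.io
  # k8s-infra pulled in as a second dependency; pin the version you actually use
  - name: k8s-infra
    version: <K8S_INFRA_CHART_VERSION>
    repository: https://charts.signoz.io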
m
ah ok
will I deploy the second chart (k8s-infra) in the cluster?
v
Once you install k8s-infra, everything should work out of the box for you!!
The defaults are very well set up!
m
will k8s-infra send all the metrics to SigNoz?
will I have to configure it to send metrics via values.yaml?
v
Yup, by default all metrics and logs from your cluster(s) will be sent to SigNoz. You can override certain things to stop sending metrics, but it should work very well with the defaults.
global:
  cloud: others
  clusterName: <CLUSTER_NAME>
  deploymentEnvironment: <DEPLOYMENT_ENVIRONMENT>
otelCollectorEndpoint: <IP-or-Endpoint-of-SigNoz-OtelCollector>:4317
otelInsecure: true
presets:
  otlpExporter:
    enabled: true
  loggingExporter:
    enabled: false
m
can't thank you enough. let me try it out.
v
Let me know how it works out 🙂
@Nagesh Bansal for context!
m
the otel-agents in my cluster are not sending the metrics to my SigNoz
haven't gotten the root cause as the logs are not quite explicit
just an update
@Vibhu Pandey could you help me understand? I am passing `signoz-sandbox-otel-collector.signoz.svc.cluster.local:4317` as the otelCollectorEndpoint but it is not picking it up
what could be the issue
I have k8s-infra and signoz deployed in the same namespace
I have k8s-infra deployed successfully in an AKS cluster, and there is an error on each node agent
logs:
{"level":"info","ts":1742481736.503711,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"logs","name":"otlp","error":"rpc error: code = Unavailable desc = name resolver error: produced zero addresses","interval":"16.136362546s"}
{"level":"error","ts":1742481739.8696356,"caller":"exporterhelper/queue_sender.go:92","msg":"Exporting failed. Dropping data.","kind":"exporter","data_type":"metrics","name":"otlp","error":"no more retries left: rpc error: code = Unavailable desc = name resolver error: produced zero addresses","dropped_items":937,"stacktrace":"go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\tgo.opentelemetry.io/collector/exporter@v0.109.0/exporterhelper/queue_sender.go:92\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\tgo.opentelemetry.io/collector/exporter@v0.109.0/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\tgo.opentelemetry.io/collector/exporter@v0.109.0/internal/queue/consumers.go:43"}
cc: @Nagesh Bansal
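For context, the gRPC "name resolver error: produced zero addresses" generally means the exporter ended up with an empty or unresolvable target, so it is worth confirming the endpoint value actually reaches the agent's rendered config. A sketch of the relevant k8s-infra values, using the endpoint from the message above:
# Sketch: k8s-infra values with the in-cluster endpoint mentioned above.
# No scheme prefix (no http://), just host:port; otelInsecure matches the
# plaintext gRPC port 4317 of the SigNoz otel collector service.
otelCollectorEndpoint: signoz-sandbox-otel-collector.signoz.svc.cluster.local:4317
otelInsecure: true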