j
I'm trying to scrape application pod (Prometheus) metrics using the k8s-infra chart. I've not been able to find much documentation, but I note that the following configuration exists: https://github.com/SigNoz/charts/blob/main/charts/k8s-infra/values.yaml#L349-L352, which is briefly discussed at https://community-chat.signoz.io/t/8493962/hi-team-i-ve-installed-signoz-via-helm-chart-initially-there. Setting those labels on my pod (which exposes a Prometheus metrics page) doesn't seem to get picked up, though. Has anyone successfully scraped Prometheus metrics with the setup in the k8s-infra chart?
s
@Jens Kristian Geyti Can you share the pod annotations you used?
j
My Helm chart values are:
Copy code
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: signoz-collector
  namespace: kube-system
spec:
  values:
    global:
      clusterName: "staging"
    otelAgent:
      enabled: true
      podAnnotations:
        signoz.io/scrape: "true"
        signoz.io/port: "9000"
        signoz.io/path: "/metrics"
The application in my pod exposes metrics on :9000/metrics (confirmed with kubectl port-forward), but (perhaps crucially?) I'm not exposing that through a service or a containerPort. My pod is annotated with
Copy code
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9000"
    signoz.io/scrape: "true"
s
but (perhaps crucially?) I'm not exposing that through a service or a containerPort.
Yes, please resolve that and let us know if you still don't see the expected metrics.
j
Pod spec:
Copy code
apiVersion: v1
kind: Pod
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9000"
    signoz.io/scrape: "true"
spec:
  containers:
  - ports:
    - containerPort: 9000
      name: metrics
      protocol: TCP
helm chart values as above. restarted all signoz pods for good measure. no luck - none of my metrics show up in the Metrics autocompletion under Alerts.
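For reference, the annotations and the containerPort shown above fold into one minimal manifest like this (the pod name, container name, and image are placeholders, not taken from the thread):
Copy code
apiVersion: v1
kind: Pod
metadata:
  name: my-app                 # placeholder name, not from the thread
  annotations:
    signoz.io/scrape: "true"
    signoz.io/port: "9000"
    signoz.io/path: /metrics
spec:
  containers:
  - name: app                  # placeholder
    image: registry.example.com/my-app:latest   # placeholder image
    ports:
    - containerPort: 9000
      name: metrics
      protocol: TCP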
s
Can you restart the collector and check once?
j
already restarted all three signoz-collector-k8s-infra-otel-agent-* pods and the signoz-collector-k8s-infra-otel-deployment-* pod after making the changes.
s
Do you see any error logs in otel-deployment?
j
Copy code
2024-06-04T13:53:31.346Z	info	service@v0.88.0/telemetry.go:84	Setting up own telemetry...
2024-06-04T13:53:31.346Z	info	service@v0.88.0/telemetry.go:201	Serving Prometheus metrics	{"address": "0.0.0.0:8888", "level": "Basic"}
2024-06-04T13:53:31.348Z	info	service@v0.88.0/service.go:143	Starting otelcol-contrib...	{"Version": "0.88.0", "NumCPU": 4}
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:33	Starting extensions...
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:36	Extension is starting...	{"kind": "extension", "name": "health_check"}
2024-06-04T13:53:31.348Z	info	healthcheckextension@v0.88.0/healthcheckextension.go:35	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-06-04T13:53:31.348Z	warn	internal@v0.88.0/warning.go:40	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks	{"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:43	Extension started.	{"kind": "extension", "name": "health_check"}
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:36	Extension is starting...	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.348Z	info	zpagesextension@v0.88.0/zpagesextension.go:53	Registered zPages span processor on tracer provider	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.348Z	info	zpagesextension@v0.88.0/zpagesextension.go:63	Registered Host's zPages	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.349Z	info	zpagesextension@v0.88.0/zpagesextension.go:75	Starting zPages extension	{"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2024-06-04T13:53:31.349Z	info	extensions/extensions.go:43	Extension started.	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.349Z	info	extensions/extensions.go:36	Extension is starting...	{"kind": "extension", "name": "pprof"}
2024-06-04T13:53:31.349Z	info	pprofextension@v0.88.0/pprofextension.go:60	Starting net/http/pprof server	{"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2024-06-04T13:53:31.349Z	info	extensions/extensions.go:43	Extension started.	{"kind": "extension", "name": "pprof"}
2024-06-04T13:53:31.350Z	info	internal/resourcedetection.go:125	began detecting resource information	{"kind": "processor", "name": "resourcedetection/internal", "pipeline": "metrics/internal"}
2024-06-04T13:53:31.350Z	info	internal/resourcedetection.go:139	detected resource information	{"kind": "processor", "name": "resourcedetection/internal", "pipeline": "metrics/internal", "resource": {"k8s.cluster.name":"staging","signoz.component":"otel-deployment"}}
2024-06-04T13:53:31.448Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "ready"}
2024-06-04T13:53:31.448Z	info	service@v0.88.0/service.go:169	Everything is ready. Begin running and processing data.
2024-06-04T13:53:31.448Z	info	k8sclusterreceiver@v0.88.0/receiver.go:53	Starting shared informers and wait for initial cache sync.	{"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-06-04T13:53:31.666Z	info	k8sclusterreceiver@v0.88.0/receiver.go:74	Completed syncing shared informer caches.	{"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
looks happy enough. I also have the k8s-infra cluster metrics coming through fine.
s
there must be a container named otel-collector-metrics. Are these its logs? If not, can you share them as well?
j
well then we're probably onto something! there's no otel-collector-metrics, only signoz-collector-k8s-infra. Let me just double check exactly which helm chart (and version) we're running. edit: 0.11.5 (appVersion 0.88.0)
s
I see what is happening.
🙌 1
So the Prometheus metrics collector is not part of the k8s-infra chart but part of the main signoz chart. Let me share override values you can try.
j
I see 🙂 The signoz helm chart is the one for self-hosting with ClickHouse etc., right? We'd quite like to use SigNoz Cloud, so a way to sneak in the (Prometheus) metrics collector without running the full stack would be awesome! Especially if we can do it as part of the existing k8s-infra chart!
s
Please use these override values and your application metrics should be collected.
Copy code
otelDeployment:
  config:
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: pod_metrics
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_signoz_io_scrape
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_signoz_io_path
              target_label: __metrics_path__
            - action: replace
              separator: ':'
              source_labels:
              - __meta_kubernetes_pod_ip
              - __meta_kubernetes_pod_annotation_signoz_io_port
              target_label: __address__
            - replacement: pod_metrics
              target_label: job_name
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: k8s_namespace_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: k8s_pod_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_uid
              target_label: k8s_pod_uid
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: k8s_node_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_ready
              target_label: k8s_pod_ready
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_phase
              target_label: k8s_pod_phase
            scrape_interval: 60s
    service:
      pipelines:
        metrics/:
          receivers: [prometheus]
          processors: [batch]
          exporters: []
Also, since you are a cloud customer, I encourage you to ask questions on Intercom, because questions in community support sometimes go unnoticed.
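As background on how that scrape config maps onto the annotations used earlier: the keep rule drops every pod that is not annotated signoz.io/scrape: "true", the __metrics_path__ rule copies signoz.io/path, and the __address__ rule joins the pod IP with signoz.io/port. A worked example with a hypothetical pod IP follows (the discovered_labels/after_relabeling keys are only illustrative, not part of any config):
Copy code
# What the relabel rules above do for one annotated pod (hypothetical IP).
discovered_labels:
  __meta_kubernetes_pod_ip: 10.0.0.5                           # hypothetical
  __meta_kubernetes_pod_annotation_signoz_io_scrape: "true"    # keep rule passes
  __meta_kubernetes_pod_annotation_signoz_io_port: "9000"
  __meta_kubernetes_pod_annotation_signoz_io_path: /metrics
after_relabeling:
  __address__: 10.0.0.5:9000     # pod IP and signoz.io/port joined with ':'
  __metrics_path__: /metrics     # from signoz.io/path
# Net effect: the collector scrapes http://10.0.0.5:9000/metrics every 60s.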
j
noted! (just rolling out the change)
I assume it was meant to be
Copy code
    service:
      pipelines:
        metrics/internal:
          ...
, right?
s
Ah sorry, you are correct.
👍 1
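Putting the pieces together: with the corrected pipeline key, the override sits under spec.values of the FluxCD HelmRelease from the start of the thread, roughly like this (a sketch that keeps the other values from that HelmRelease as they were and trims the relabel_configs already listed above):
Copy code
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: signoz-collector
  namespace: kube-system
spec:
  values:
    global:
      clusterName: "staging"
    otelDeployment:
      config:
        receivers:
          prometheus:
            config:
              scrape_configs:
              - job_name: pod_metrics
                kubernetes_sd_configs:
                - role: pod
                # relabel_configs omitted here; use the block shared above verbatim
                scrape_interval: 60s
        service:
          pipelines:
            metrics/internal:             # corrected pipeline name
              receivers: [prometheus]
              processors: [batch]
              exporters: []               # as in the shared override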
j
star! that works 🎉 getting a custom configuration produced is really going above and beyond. amazing support! thanks so much! would you consider making this a feature of the k8s-infra chart? seems to me like an obvious feature to be able to collect metrics from the cluster, but you obviously know your customer base better!
s
Sure, it makes sense, and we have had some customers who have used this. If you create an issue at https://github.com/SigNoz/charts, that would be great (it carries more weight).
j
Awesome, will do. Do you want me to mention you by your GitHub handle (and if so, what is it) when I share a summary of the above?
s
please use @srikanthccv
👍 1
j
Will do. Thanks again. Really impressed with the level of support 🙂
https://github.com/SigNoz/charts/issues/445. All sorted. Have a great day!
s
thanks