j
I'm trying to scrape application pod (Prometheus) metrics using the k8s-infra chart. I've not been able to find much documentation, but I note that the following configuration exists: https://github.com/SigNoz/charts/blob/main/charts/k8s-infra/values.yaml#L349-L352, which is briefly discussed at https://community-chat.signoz.io/t/8493962/hi-team-i-ve-installed-signoz-via-helm-chart-initially-there. Setting those labels on my pod (which exposes a Prometheus metrics page) doesn't seem to get picked up, though. Has anyone successfully scraped Prometheus metrics with the setup in the k8s-infra chart?
s
@Jens Kristian Geyti Can you share the pod annotations you used?
j
My Helm chart values are:
Copy code
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: signoz-collector
  namespace: kube-system
spec:
  values:
    global:
      clusterName: "staging"
    otelAgent:
      enabled: true
      podAnnotations:
        signoz.io/scrape: "true"
        signoz.io/port: "9000"
        signoz.io/path: "/metrics"
The application in my pod exposes metrics on :9000/metrics (confirmed with kubectl port-forward), but (perhaps crucially?) I'm not exposing that through a service or a containerPort. My pod is annotated with
Copy code
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9000"
    signoz.io/scrape: "true"
s
but (perhaps crucially?) I'm not exposing that through a service or a containerPort.
Yes, please resolve that and let us know if you still don't see the expected metrics.
j
Pod spec:
Copy code
apiVersion: v1
kind: Pod
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9000"
    signoz.io/scrape: "true"
spec:
  containers:
  - ports:
    - containerPort: 9000
      name: metrics
      protocol: TCP
helm chart values as above. restarted all signoz pods for good measure. no luck - none of my metrics show up in the Metrics autocompletion under Alerts.
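For reference, the annotations and the containerPort shown above fold into one minimal manifest like this (the pod name, container name, and image are placeholders, not taken from the thread):
Copy code
apiVersion: v1
kind: Pod
metadata:
  name: my-app                 # placeholder name, not from the thread
  annotations:
    signoz.io/scrape: "true"
    signoz.io/port: "9000"
    signoz.io/path: /metrics
spec:
  containers:
  - name: app                  # placeholder
    image: registry.example.com/my-app:latest   # placeholder image
    ports:
    - containerPort: 9000
      name: metrics
      protocol: TCP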
s
Can you restart the collector and check once?
j
already restarted all three signoz-collector-k8s-infra-otel-agent-* pods and the signoz-collector-k8s-infra-otel-deployment-* pod after making the changes.
s
Do you see any error logs in otel-deployment?
j
Copy code
2024-06-04T13:53:31.346Z	info	service@v0.88.0/telemetry.go:84	Setting up own telemetry...
2024-06-04T13:53:31.346Z	info	service@v0.88.0/telemetry.go:201	Serving Prometheus metrics	{"address": "0.0.0.0:8888", "level": "Basic"}
2024-06-04T13:53:31.348Z	info	service@v0.88.0/service.go:143	Starting otelcol-contrib...	{"Version": "0.88.0", "NumCPU": 4}
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:33	Starting extensions...
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:36	Extension is starting...	{"kind": "extension", "name": "health_check"}
2024-06-04T13:53:31.348Z	info	healthcheckextension@v0.88.0/healthcheckextension.go:35	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-06-04T13:53:31.348Z	warn	internal@v0.88.0/warning.go:40	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks	{"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:43	Extension started.	{"kind": "extension", "name": "health_check"}
2024-06-04T13:53:31.348Z	info	extensions/extensions.go:36	Extension is starting...	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.348Z	info	zpagesextension@v0.88.0/zpagesextension.go:53	Registered zPages span processor on tracer provider	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.348Z	info	zpagesextension@v0.88.0/zpagesextension.go:63	Registered Host's zPages	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.349Z	info	zpagesextension@v0.88.0/zpagesextension.go:75	Starting zPages extension	{"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2024-06-04T13:53:31.349Z	info	extensions/extensions.go:43	Extension started.	{"kind": "extension", "name": "zpages"}
2024-06-04T13:53:31.349Z	info	extensions/extensions.go:36	Extension is starting...	{"kind": "extension", "name": "pprof"}
2024-06-04T13:53:31.349Z	info	pprofextension@v0.88.0/pprofextension.go:60	Starting net/http/pprof server	{"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2024-06-04T13:53:31.349Z	info	extensions/extensions.go:43	Extension started.	{"kind": "extension", "name": "pprof"}
2024-06-04T13:53:31.350Z	info	internal/resourcedetection.go:125	began detecting resource information	{"kind": "processor", "name": "resourcedetection/internal", "pipeline": "metrics/internal"}
2024-06-04T13:53:31.350Z	info	internal/resourcedetection.go:139	detected resource information	{"kind": "processor", "name": "resourcedetection/internal", "pipeline": "metrics/internal", "resource": {"k8s.cluster.name":"staging","signoz.component":"otel-deployment"}}
2024-06-04T13:53:31.448Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "ready"}
2024-06-04T13:53:31.448Z	info	service@v0.88.0/service.go:169	Everything is ready. Begin running and processing data.
2024-06-04T13:53:31.448Z	info	k8sclusterreceiver@v0.88.0/receiver.go:53	Starting shared informers and wait for initial cache sync.	{"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-06-04T13:53:31.666Z	info	k8sclusterreceiver@v0.88.0/receiver.go:74	Completed syncing shared informer caches.	{"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
looks happy enough. I also have the k8s-infra cluster metrics coming through fine.
s
there must be a container named otel-collector-metrics. Are these its logs? If not, can you share them as well?
j
well then we're probably onto something! there's no otel-collector-metrics, only signoz-collector-k8s-infra. Let me just double check exactly which helm chart (and version) we're running. edit: 0.11.5 (appVersion 0.88.0)
s
I see what is happening.
🙌 1
So the Prometheus metrics collector is not part of the k8s-infra chart but part of the main signoz chart. Let me share override values you can try.
j
I see 🙂 The signoz helm chart is the one for self-hosting with ClickHouse etc., right? We'd quite like to use SigNoz Cloud, so a way to sneak in the (Prometheus) metrics collector without running the full stack would be awesome! Especially if we can do it as part of the existing k8s-infra chart!
s
Please use these override values and your application metrics should be collected.
Copy code
otelDeployment:
  config:
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: pod_metrics
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_signoz_io_scrape
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_signoz_io_path
              target_label: __metrics_path__
            - action: replace
              separator: ':'
              source_labels:
              - __meta_kubernetes_pod_ip
              - __meta_kubernetes_pod_annotation_signoz_io_port
              target_label: __address__
            - replacement: pod_metrics
              target_label: job_name
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: k8s_namespace_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: k8s_pod_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_uid
              target_label: k8s_pod_uid
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: k8s_node_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_ready
              target_label: k8s_pod_ready
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_phase
              target_label: k8s_pod_phase
            scrape_interval: 60s
    service:
      pipelines:
        metrics/:
          receivers: [prometheus]
          processors: [batch]
          exporters: []
Also, since you are a cloud customer, I encourage you to ask questions on Intercom, because questions in community support sometimes go unnoticed.
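As background on how that scrape config maps onto the annotations used earlier: the keep rule drops every pod that is not annotated signoz.io/scrape: "true", the __metrics_path__ rule copies signoz.io/path, and the __address__ rule joins the pod IP with signoz.io/port. A worked example with a hypothetical pod IP follows (the discovered_labels/after_relabeling keys are only illustrative, not part of any config):
Copy code
# What the relabel rules above do for one annotated pod (hypothetical IP).
discovered_labels:
  __meta_kubernetes_pod_ip: 10.0.0.5                           # hypothetical
  __meta_kubernetes_pod_annotation_signoz_io_scrape: "true"    # keep rule passes
  __meta_kubernetes_pod_annotation_signoz_io_port: "9000"
  __meta_kubernetes_pod_annotation_signoz_io_path: /metrics
after_relabeling:
  __address__: 10.0.0.5:9000     # pod IP and signoz.io/port joined with ':'
  __metrics_path__: /metrics     # from signoz.io/path
# Net effect: the collector scrapes http://10.0.0.5:9000/metrics every 60s.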
j
noted! (just rolling out the change)
I assume it was meant to be
Copy code
    service:
      pipelines:
        metrics/internal:
          ...
, right?
s
Ah sorry, you are correct.
👍 1
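Putting the pieces together: with the corrected pipeline key, the override sits under spec.values of the FluxCD HelmRelease from the start of the thread, roughly like this (a sketch that keeps the other values from that HelmRelease as they were and trims the relabel_configs already listed above):
Copy code
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: signoz-collector
  namespace: kube-system
spec:
  values:
    global:
      clusterName: "staging"
    otelDeployment:
      config:
        receivers:
          prometheus:
            config:
              scrape_configs:
              - job_name: pod_metrics
                kubernetes_sd_configs:
                - role: pod
                # relabel_configs omitted here; use the block shared above verbatim
                scrape_interval: 60s
        service:
          pipelines:
            metrics/internal:             # corrected pipeline name
              receivers: [prometheus]
              processors: [batch]
              exporters: []               # as in the shared override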
j
star! that works 🎉 getting a custom configuration produced is really going above and beyond. amazing support! thanks so much! would you consider making this a feature of the k8s-infra chart? seems to me like an obvious feature to be able to collect metrics from the cluster, but you obviously know your customer base better!
s
Sure, it makes sense, and we have had some customers who have used this. If you create an issue at https://github.com/SigNoz/charts, that would be great (it carries more weight).
j
Awesome, will do. Do you want me to mention you by your GitHub handle (and if so, what is it) when I share a summary of the above?
s
please use @srikanthccv
👍 1
j
Will do. Thanks again. Really impressed with the level of support 🙂
https://github.com/SigNoz/charts/issues/445. All sorted. Have a great day!
s
thanks