I'm having problems showing Kubernetes stats on th...
# support
c
I'm having problems showing Kubernetes stats on the dashboard. I followed the instructions on https://signoz.io/blog/kubernetes-monitoring/. Everything installed with no problem and is running. The "Monitor Kubelet Metrics" part also works fine. The problem is in displaying the node metrics (https://signoz.io/blog/kubernetes-monitoring/#monitor-node-metrics-of-your-kubernetes-cluster). I can generate the dashboard json file no problem and import them into the dashboard, however, it reports the collector is not up and there is no data (see screenshot). We have 2 nodes in our cluster and the monitors report as running:

    kubectl -n signoz-infra-metrics get pods
    NAME                             READY   STATUS    RESTARTS   AGE
    otel-collector-agent-ghvgq       1/1     Running   0          178m
    otelcontribcol-6d45c844c-l7fpk   1/1     Running   0          177m
    otel-collector-agent-hbzwb       1/1     Running   0          178m

How do I fix this?
p
It looks like SigNoz otel collector is down. Could you please share SigNoz OtelCollector logs?
c
I assume you're referring to my-release-signoz-otel-collector in the platform namespace? The pod is running:

    kubectl -n platform get pods
    NAME                                                        READY   STATUS    RESTARTS   AGE
    clickhouse-operator-8cff468-f2hm8                           2/2     Running   0          15d
    my-release-zookeeper-0                                      1/1     Running   0          15d
    chi-signoz-cluster-0-0-0                                    1/1     Running   0          15d
    my-release-signoz-query-service-0                           1/1     Running   0          15d
    my-release-signoz-alertmanager-0                            1/1     Running   0          15d
    my-release-signoz-frontend-6b7dbccbc7-fgbnv                 1/1     Running   0          15d
    my-release-signoz-otel-collector-metrics-68bcfd5556-7tjks   1/1     Running   0          15d
    my-release-signoz-otel-collector-66c8c7dc9d-xqxbd           1/1     Running   0          15d

There are quite a few of these errors in my-release-signoz-otel-collector-66c8c7dc9d-xqxbd:

    time="2022-06-10T20:32:03Z" level=error msg="dial tcp: i/o timeout" component=clickhouse
    time="2022-06-10T20:32:08Z" level=error msg="dial tcp: i/o timeout" component=clickhouse
    2022-06-10T20:32:10.871Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "clickhousemetricswrite", "error": "dial tcp: i/o timeout", "errorVerbose": "dial tcp: i/o timeout\ngithub.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhousemetricsexporter.inTransaction\n\t/src/exporter/clickhousemetricsexporter/clickhouse.go:228\ngithub.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhousemetricsexporter.(*clickHouse).Write\n\t/src/exporter/clickhousemetricsexporter/clickhouse.go:308\ngithub.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhousemetricsexporter.(*PrwExporter).export.func1\n\t/src/exporter/clickhousemetricsexporter/exporter.go:258\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581", "interval": "237.851571ms"}
    time="2022-06-10T20:32:13Z" level=error msg="dial tcp: i/o timeout" component=clickhouse
p
@Chris Ahern The ClickHouse pod is up and running and the other pods are ready, which means ClickHouse was accessible and healthy at some stage. The ClickHouse pod logs would help debug this further. Please let me know whether the issue persists or has already been resolved.
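For anyone hitting the same timeout, here is a rough sketch of how those logs could be gathered and the timeouts counted. It assumes the `platform` namespace and the pod names from the listing earlier in this thread; `pod_logs` and `count_timeouts` are made-up helper names, not part of kubectl or SigNoz.

```shell
# Hypothetical helpers. The default namespace and pod name come from the
# listing earlier in this thread and will differ in other clusters.

# Print the last N log lines of a pod.
pod_logs() {
  kubectl -n "${1:-platform}" logs "${2:-chi-signoz-cluster-0-0-0}" --tail="${3:-200}"
}

# Count ClickHouse connection timeouts in log text read from stdin.
# grep exits non-zero when nothing matches, so "|| true" keeps the exit status clean.
count_timeouts() {
  grep -c 'dial tcp: i/o timeout' || true
}

# Usage (not run here):
#   pod_logs platform chi-signoz-cluster-0-0-0 500
#   pod_logs platform my-release-signoz-otel-collector-66c8c7dc9d-xqxbd 500 | count_timeouts
```

A count that keeps growing between runs would suggest the collector still cannot reach ClickHouse, while a stable count would suggest the timeouts were transient.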
c
I've since upgraded SigNoz to 0.9.2. Now neither of these is working (I regenerated the JSON for each node). Any pointers? What do the UIDs in the JSON refer to?
p
@Chris Ahern It's not clear what you mean. Are you able to see data in the above dashboard correctly?
Also cc @Prashant Shahi
p
@Chris Ahern Did you follow the upgrade guide from our website? https://signoz.io/docs/operate/kubernetes/#upgrade
c
I ran the helm upgrade command, primarily based on the version specification in https://signoz.io/docs/install/kubernetes/others/.
I did read through the upgrade guide but didn't follow "Run the following command to install the chart version `0.0.8` running SigNoz version `0.6.2`". There was nothing matching this in the table above.
p
That part of the docs is actually a little old and needs to be updated.
However, it is recommended to use a newer chart version.
c
ok, thx. I'm seeing this:

    helm search repo signoz --versions
    NAME                  CHART VERSION   APP VERSION   DESCRIPTION
    signoz/signoz         0.0.13          0.8.0         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.12          0.7.5         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.11          0.7.4         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.10          0.7.1         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.9           0.7.1         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.8           0.6.2         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.7           0.6.1         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.6           0.6.1         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.5           0.6.0         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.4           0.5.4         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.3           0.5.4         SigNoz Observability Platform Helm Chart
    signoz/signoz         0.0.2           0.5.4         SigNoz Observability Platform Helm Chart
    signoz/alertmanager   0.5.2           0.5.0         The Alertmanager handles alerts for SigNoz.
    signoz/alertmanager   0.5.1           0.5.0         The Alertmanager handles alerts for SigNoz.
    signoz/alertmanager   0.5.0           0.5.0         The Alertmanager handles alerts for SigNoz.
    signoz/clickhouse     16.0.5          21.12.3.32    A Helm chart for ClickHouse
    signoz/clickhouse     16.0.4          21.12.3.32    A Helm chart for ClickHouse
    signoz/clickhouse     16.0.3          21.12.3.32    A Helm chart for ClickHouse
    signoz/clickhouse     16.0.2          21.12.3.32    A Helm chart for ClickHouse
    signoz/clickhouse     16.0.1          21.12.3.32    A Helm chart for ClickHouse
    signoz/clickhouse     16.0.0          21.12.3.32    A Helm chart for ClickHouse
    signoz/clickhouse     9.1.0           21.7          A Helm chart for ClickHouse
Is 16.0.5 the recommended version to upgrade to?
p
That's the ClickHouse chart version.
Can you run the following?
helm repo update

helm search repo signoz --versions
c
great - I ran that and see other versions:

    helm search repo signoz --versions
    NAME            CHART VERSION   APP VERSION   DESCRIPTION
    signoz/signoz   0.2.5           0.10.2        SigNoz Observability Platform Helm Chart
    signoz/signoz   0.2.4           0.10.1        SigNoz Observability Platform Helm Chart
    signoz/signoz   0.2.3           0.10.1        SigNoz Observability Platform Helm Chart
    signoz/signoz   0.2.2           0.10.0        SigNoz Observability Platform Helm Chart
    signoz/signoz   0.2.1           0.10.0        SigNoz Observability Platform Helm Chart
    signoz/signoz   0.2.0           0.10.0        SigNoz Observability Platform Helm Chart
    signoz/signoz   0.1.4           0.9.2         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.1.3           0.9.2         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.1.2           0.9.2         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.1.1           0.9.1         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.1.0           0.9.0         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.0.20          0.8.2         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.0.19          0.8.2         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.0.18          0.8.2         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.0.17          0.8.1         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.0.16          0.8.0         SigNoz Observability Platform Helm Chart
    signoz/signoz   0.0.15          0.8.0         SigNoz Observability Platform Helm Chart
What is the latest version we should use for production?
p
`signoz/signoz` `0.2.5`. By default it points to the latest SigNoz release, `0.10.2`.
Also, in case you have a custom `overwrite-values.yaml`, refer to the default `values.yaml` from the latest release: https://github.com/SigNoz/charts/blob/signoz-0.2.5/charts/signoz/values.yaml
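If it helps, the newest chart can also be picked out of that listing mechanically. This is a minimal sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils); `latest_chart` is a made-up helper name, not part of Helm.

```shell
# Hypothetical helper: reads `helm search repo signoz --versions` output on
# stdin and prints "<chart version> <app version>" for the newest chart of
# the given name. Assumes sort supports -V (version sort).
latest_chart() {
  awk -v name="${1:-signoz/signoz}" '$1 == name { print $2, $3 }' \
    | sort -rV \
    | head -n 1
}

# Usage (not run here):
#   helm repo update
#   helm search repo signoz --versions | latest_chart signoz/signoz
```

Version sort matters here because a plain text sort would rank `0.0.9` above `0.0.20`.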
c
I ran the upgrade to 0.2.5 and got the following:

    helm -n platform upgrade my-release signoz/signoz --version 0.2.5
    Release "my-release" has been upgraded. Happy Helming!
    NAME: my-release
    LAST DEPLOYED: Mon Aug 22 10:53:04 2022
    NAMESPACE: platform
    STATUS: deployed
    REVISION: 7
    NOTES:
    1. You have just deployed SigNoz cluster:
    - frontend version: '0.9.2'
    - query-service version: '0.9.2'
    - alertmanager version: '0.23.0-0.2'
    - otel-collector version: '0.45.1-1.3'
    - otel-collector-metrics version: '0.45.1-1.3'
Should it not have upgraded the frontend and query-service to 0.10.2?
p
Which version of SigNoz were you using previously? 🤔
In case you wish to retain old data, there is a series of migration steps: https://signoz.io/docs/operate/migration
The upgrade has to be done one version at a time.
c
I was on 0.8.0. Thx, I'll take a look at that. The Kubernetes dashboards are showing data now.