Goutham Sridhar

04/21/2023, 11:20 AM
Hi Team, I tried installing the latest SigNoz version and converted the otel-agent into a Deployment, but a few pods are stuck in the Init state. Any suggestions?
NAME                                                        READY   STATUS     RESTARTS   AGE
my-release-clickhouse-operator-9fc495f79-6dls2              2/2     Running    0          10m
my-release-k8s-infra-otel-agent-69448c6665-j8pr2            1/1     Running    0          12m
my-release-k8s-infra-otel-deployment-d59f89c4b-wjkqg        1/1     Running    0          10m
my-release-signoz-alertmanager-0                            0/1     Init:0/1   0          4m17s
my-release-signoz-frontend-9c69c799c-gbpht                  0/1     Init:0/1   0          4m56s
my-release-signoz-otel-collector-67d7b9f948-rk9hk           0/1     Init:0/1   0          4m56s
my-release-signoz-otel-collector-metrics-76d9c6b876-9ztzd   0/1     Init:0/1   0          4m56s
my-release-signoz-query-service-0                           0/1     Init:0/1   0          4m20s
my-release-zookeeper-0                                      1/1     Running    0          9m50s

Syed Muhammad Hassan

04/21/2023, 1:45 PM
@Goutham Sridhar Please check your ClickHouse server. These pods are stuck in their init containers, which try to connect to the ClickHouse server. If the connection succeeds, the pod proceeds; otherwise it stays stuck in the Init state.
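
To illustrate the behaviour described above, here is a minimal Python sketch of a "wait until the database is reachable" check. It is not the actual init container from the SigNoz chart, which is not shown in this thread; it only assumes that the check amounts to retrying a TCP connection. The function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every attempt fails. This mimics an init step that
    blocks the main container until a dependency is reachable.
    """
    for attempt in range(attempts):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused, timeout, or DNS failure: wait, then retry.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For example, `wait_for_port("my-release-clickhouse", 9000)` would keep failing for as long as nothing answers behind that address. The service name is an assumption here, and 9000 is only ClickHouse's default native TCP port.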

Anil Kumar Bandrapalli

04/21/2023, 2:04 PM
@Syed Muhammad Hassan, my ClickHouse server is running: my-release-clickhouse-operator-9fc495f79-6dls

Goutham Sridhar

04/21/2023, 2:04 PM
@Syed Muhammad Hassan The ClickHouse operator is running fine, and I have checked its logs:
I0421 13:58:09.204356       1 rest_server.go:38] Starting metrics exporter at ':8888/metrics'
I0421 13:58:09.219345       1 exporter.go:318] Add explicitly found CHI tempsignoz/my-release-clickhouse with 1 hosts
I0421 13:58:09.219370       1 exporter.go:158] Added ClickHouseInstallation (tempsignoz/my-release-clickhouse): including hostnames into Exporter
I0421 13:58:06.978004       1 controller.go:497] Run():ClickHouseInstallation controller: starting workers number: 11
I0421 13:58:06.978031       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 1 out of 11
I0421 13:58:06.978052       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 2 out of 11
I0421 13:58:06.978067       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 3 out of 11
I0421 13:58:06.978088       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 4 out of 11
I0421 13:58:06.978110       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 5 out of 11
I0421 13:58:06.978161       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 6 out of 11
I0421 13:58:06.978182       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 7 out of 11
I0421 13:58:06.978198       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 8 out of 11
I0421 13:58:06.978322       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 9 out of 11
I0421 13:58:06.978343       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 10 out of 11
I0421 13:58:06.978360       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 11 out of 11
I0421 13:58:06.978388       1 controller.go:509] Run():ClickHouseInstallation controller: workers started

Syed Muhammad Hassan

04/21/2023, 2:05 PM
Check the init container logs; they will show what the init container is trying to do.
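
A hedged sketch of how those logs could be requested: `kubectl logs` accepts `-c` to select a specific container, including an init container, and the init container names can be read from the pod spec with a `jsonpath` query. The Python helpers below only build the command lines. The function names are hypothetical, and the init container name must be looked up first, since it is not given in this thread.

```python
import subprocess
from typing import List, Optional


def init_container_names_command(pod: str, namespace: Optional[str] = None) -> List[str]:
    """Build a kubectl command that prints the names of a pod's init containers."""
    cmd = ["kubectl", "get", "pod", pod, "-o", "jsonpath={.spec.initContainers[*].name}"]
    if namespace:
        cmd += ["-n", namespace]
    return cmd


def init_logs_command(pod: str, init_container: str, namespace: Optional[str] = None) -> List[str]:
    """Build a kubectl command that fetches the logs of one init container."""
    cmd = ["kubectl", "logs", pod, "-c", init_container]
    if namespace:
        cmd += ["-n", namespace]
    return cmd


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout (requires kubectl and cluster access)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
```

A possible use is `run(init_container_names_command("my-release-signoz-query-service-0"))` to list the init containers, followed by `run(init_logs_command(...))` with one of the returned names. Pass `namespace` if the release is not in the current namespace.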

Anil Kumar Bandrapalli

04/21/2023, 2:09 PM
All init pods are showing the same error:
Error from server (BadRequest): container "my-release-signoz-query-service" in pod "my-release-signoz-query-service-0" is waiting to start: PodInitializing
But I can see some errors from the k8s otel agent:
2023-04-21T14:12:13.539Z	error	exporterhelper/queued_retry.go:175	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": "max elapsed time expired rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.90.13.186:4317: connect: connection refused\"", "dropped_items": 1056}
@Syed Muhammad Hassan, can we get an update on this?
We are unable to set up SigNoz. Kindly help us out.

Srikanth Chekuri

04/21/2023, 3:23 PM
What does describe pod show?

Goutham Sridhar

04/21/2023, 4:14 PM
All the pods show that the container was created and started, but the pods are still initializing.

Syed Muhammad Hassan

04/21/2023, 8:26 PM
I think there is a problem with the ClickHouse server:
connect: connection refused

Anil Kumar Bandrapalli

04/21/2023, 9:05 PM
How are we going to solve it?
Would you please help us with this?
@Srikanth Chekuri / @Syed Muhammad Hassan, could you please help me with this?

Prashant Shahi

04/25/2023, 8:26 AM
@Goutham Sridhar It looks like there is no ClickHouse instance pod (chi pod). Can you run the helm upgrade command with your override values?
It should spin up the chi pod if it was terminated for some reason.
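
The diagnosis above can be checked against the pod listing posted at the start of the thread. The Python sketch below parses that listing, reports which pods are stuck in an Init state, and looks for a pod whose name starts with `chi-`. That prefix is an assumption about how the ClickHouse operator names instance pods, and the helper names are hypothetical.

```python
from typing import Dict, List

# Pod listing copied from the first message in this thread.
LISTING = """
NAME                                                        READY   STATUS     RESTARTS   AGE
my-release-clickhouse-operator-9fc495f79-6dls2              2/2     Running    0          10m
my-release-k8s-infra-otel-agent-69448c6665-j8pr2            1/1     Running    0          12m
my-release-k8s-infra-otel-deployment-d59f89c4b-wjkqg        1/1     Running    0          10m
my-release-signoz-alertmanager-0                            0/1     Init:0/1   0          4m17s
my-release-signoz-frontend-9c69c799c-gbpht                  0/1     Init:0/1   0          4m56s
my-release-signoz-otel-collector-67d7b9f948-rk9hk           0/1     Init:0/1   0          4m56s
my-release-signoz-otel-collector-metrics-76d9c6b876-9ztzd   0/1     Init:0/1   0          4m56s
my-release-signoz-query-service-0                           0/1     Init:0/1   0          4m20s
my-release-zookeeper-0                                      1/1     Running    0          9m50s
"""


def parse_pods(listing: str) -> List[Dict[str, str]]:
    """Parse `kubectl get pods` text into dicts keyed by lower-cased column name.

    Simplification: assumes no column value contains spaces, which holds for
    the listing above but not for restart counts such as "1 (5m ago)".
    """
    lines = [line for line in listing.strip().splitlines() if line.strip()]
    header = [col.lower() for col in lines[0].split()]
    return [dict(zip(header, line.split())) for line in lines[1:]]


def stuck_in_init(pods: List[Dict[str, str]]) -> List[str]:
    """Return the names of pods whose status starts with "Init:"."""
    return [p["name"] for p in pods if p["status"].startswith("Init:")]


def has_chi_pod(pods: List[Dict[str, str]], prefix: str = "chi-") -> bool:
    """Return True if any pod name starts with the assumed ClickHouse prefix."""
    return any(p["name"].startswith(prefix) for p in pods)
```

On this listing, `stuck_in_init(parse_pods(LISTING))` returns the five SigNoz pods and `has_chi_pod(parse_pods(LISTING))` returns `False`. Only the operator pod is present, which matches the suggestion that the ClickHouse instance pod is missing.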