# support
g
Hi team, I tried installing the latest SigNoz version. I converted the otel-agent into a Deployment, but a few pods are stuck in the init state. Any suggestions?
NAME                                                        READY   STATUS     RESTARTS   AGE
my-release-clickhouse-operator-9fc495f79-6dls2              2/2     Running    0          10m
my-release-k8s-infra-otel-agent-69448c6665-j8pr2            1/1     Running    0          12m
my-release-k8s-infra-otel-deployment-d59f89c4b-wjkqg        1/1     Running    0          10m
my-release-signoz-alertmanager-0                            0/1     Init:0/1   0          4m17s
my-release-signoz-frontend-9c69c799c-gbpht                  0/1     Init:0/1   0          4m56s
my-release-signoz-otel-collector-67d7b9f948-rk9hk           0/1     Init:0/1   0          4m56s
my-release-signoz-otel-collector-metrics-76d9c6b876-9ztzd   0/1     Init:0/1   0          4m56s
my-release-signoz-query-service-0                           0/1     Init:0/1   0          4m20s
my-release-zookeeper-0                                      1/1     Running    0          9m50s
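In `kubectl get pods` output, a STATUS of `Init:0/1` means 0 of the pod's 1 init containers have completed. As a minimal sketch (Python, standard library only), the hypothetical helper `stuck_in_init` below scans a listing in the default column layout shown above and reports each pod still in that phase. It assumes STATUS is the third whitespace-separated column, and it deliberately does not match statuses like `Init:Error` or `Init:CrashLoopBackOff`.

```python
import re

# Matches kubectl's init-phase status, e.g. "Init:0/1" = 0 of 1 init containers done.
# Statuses such as "Init:Error" or "Init:CrashLoopBackOff" are not matched.
INIT_STATUS = re.compile(r"^Init:(\d+)/(\d+)$")


def stuck_in_init(get_pods_output: str) -> dict[str, tuple[int, int]]:
    """Return {pod_name: (completed, total)} for pods still running init containers.

    Assumes the default `kubectl get pods` columns:
    NAME READY STATUS RESTARTS AGE (whitespace-separated).
    """
    result: dict[str, tuple[int, int]] = {}
    for line in get_pods_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, status = fields[0], fields[2]
        match = INIT_STATUS.match(status)
        if match:
            result[name] = (int(match.group(1)), int(match.group(2)))
    return result


if __name__ == "__main__":
    sample = """\
NAME                                                        READY   STATUS     RESTARTS   AGE
my-release-clickhouse-operator-9fc495f79-6dls2              2/2     Running    0          10m
my-release-signoz-query-service-0                           0/1     Init:0/1   0          4m20s
my-release-zookeeper-0                                      1/1     Running    0          9m50s
"""
    for pod, (done, total) in stuck_in_init(sample).items():
        print(f"{pod}: {done}/{total} init containers completed")
```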
s
@Goutham Sridhar Please check your ClickHouse server. The pods are stuck in their init containers, which try to connect to the ClickHouse server. If the connection succeeds, the pod proceeds; otherwise it stays stuck in the init state.
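The behaviour described here, an init container that blocks until ClickHouse accepts connections, can be sketched as a retry loop. This is a hypothetical Python illustration, not the SigNoz chart's actual init script (which this thread does not show); it only checks that a TCP connection is accepted, which is a weaker test than a real health request.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True once a connection is accepted, False on timeout. Without a
    deadline, a loop like this would never exit while the target refuses
    connections, which from outside looks like a pod stuck in Init:0/1.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError,
            # or socket.gaierror for an unresolvable name) on failure.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_tcp("my-release-clickhouse", 8123, timeout_s=10)` would probe ClickHouse's default HTTP port; the service name `my-release-clickhouse` is an assumption based on the CHI name in the operator log, not a confirmed service name.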
a
@Syed Muhammad Hassan, my ClickHouse server is running: my-release-clickhouse-operator-9fc495f79-6dls
g
@Syed Muhammad Hassan, the ClickHouse operator is running fine, and I have checked its logs:
I0421 13:58:09.204356       1 rest_server.go:38] Starting metrics exporter at ':8888/metrics'
I0421 13:58:09.219345       1 exporter.go:318] Add explicitly found CHI tempsignoz/my-release-clickhouse with 1 hosts
I0421 13:58:09.219370       1 exporter.go:158] Added ClickHouseInstallation (tempsignoz/my-release-clickhouse): including hostnames into Exporter
I0421 13:58:06.978004       1 controller.go:497] Run():ClickHouseInstallation controller: starting workers number: 11
I0421 13:58:06.978031       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 1 out of 11
I0421 13:58:06.978052       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 2 out of 11
I0421 13:58:06.978067       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 3 out of 11
I0421 13:58:06.978088       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 4 out of 11
I0421 13:58:06.978110       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 5 out of 11
I0421 13:58:06.978161       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 6 out of 11
I0421 13:58:06.978182       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 7 out of 11
I0421 13:58:06.978198       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 8 out of 11
I0421 13:58:06.978322       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 9 out of 11
I0421 13:58:06.978343       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 10 out of 11
I0421 13:58:06.978360       1 controller.go:499] Run():ClickHouseInstallation controller: starting worker 11 out of 11
I0421 13:58:06.978388       1 controller.go:509] Run():ClickHouseInstallation controller: workers started
s
Check the init container logs; they will tell you what it is trying to do.
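Without `-c`, `kubectl logs <pod>` reads an app container rather than an init container, and while the pod is initializing that app container has not started yet. The init container therefore has to be named explicitly. As a sketch, the hypothetical helper below takes the JSON from `kubectl get pod <pod> -o json` and builds a `kubectl logs ... -c <init-container>` command for every init container that has not terminated with exit code 0. The init container name in the usage example is made up; the real names come from the pod's `spec.initContainers`.

```python
import json
import shlex


def pending_init_log_commands(pod_json: str) -> list[list[str]]:
    """Build `kubectl logs` argv lists for init containers that have not completed.

    `pod_json` is the output of `kubectl get pod <pod> -o json`. An init container
    counts as completed only if it terminated with exit code 0.
    """
    pod = json.loads(pod_json)
    name = pod["metadata"]["name"]
    namespace = pod["metadata"].get("namespace", "default")
    statuses = {s["name"]: s for s in pod.get("status", {}).get("initContainerStatuses", [])}

    commands = []
    for container in pod["spec"].get("initContainers", []):
        state = statuses.get(container["name"], {}).get("state", {})
        terminated = state.get("terminated")
        if terminated is not None and terminated.get("exitCode") == 0:
            continue  # this init container already finished successfully
        # -c selects the init container; without it kubectl reads an app container.
        commands.append(["kubectl", "logs", name, "-n", namespace, "-c", container["name"]])
    return commands


if __name__ == "__main__":
    # Trimmed, hypothetical pod JSON; "wait-for-clickhouse" is a made-up name.
    example = json.dumps({
        "metadata": {"name": "my-release-signoz-query-service-0", "namespace": "tempsignoz"},
        "spec": {"initContainers": [{"name": "wait-for-clickhouse"}]},
        "status": {"initContainerStatuses": [
            {"name": "wait-for-clickhouse", "state": {"running": {}}}
        ]},
    })
    for argv in pending_init_log_commands(example):
        print(shlex.join(argv))
```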
a
All the init pods show the same error:
Error from server (BadRequest): container "my-release-signoz-query-service" in pod "my-release-signoz-query-service-0" is waiting to start: PodInitializing
But I can see some errors from the k8s-infra otel-agent:
2023-04-21T14:12:13.539Z	error	exporterhelper/queued_retry.go:175	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": "max elapsed time expired rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.90.13.186:4317: connect: connection refused\"", "dropped_items": 1056}
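This agent error is consistent with the stuck pods: the agent's `otlp` exporter dials port 4317 (the default OTLP/gRPC port), presumably the SigNoz otel-collector service, and that collector is itself still in `Init:0/1`, so nothing accepts the connection. The hypothetical helper below extracts the dialed address and the dropped-item count from a log line in the format shown, assuming tab-separated fields with a trailing JSON object.

```python
import json
import re
from typing import NamedTuple, Optional

# Matches the dial target inside the gRPC error text,
# e.g. "dial tcp 10.0.0.1:4317: connect: connection refused".
DIAL_REFUSED = re.compile(r"dial tcp (\S+): connect: connection refused")


class ExportFailure(NamedTuple):
    exporter: str
    data_type: str
    endpoint: Optional[str]  # None if the error was not a refused dial
    dropped_items: int


def parse_export_failure(line: str) -> ExportFailure:
    """Parse a collector log line: tab-separated fields, JSON context last."""
    context = json.loads(line.rstrip("\n").split("\t")[-1])
    match = DIAL_REFUSED.search(context.get("error", ""))
    return ExportFailure(
        exporter=context.get("name", ""),
        data_type=context.get("data_type", ""),
        endpoint=match.group(1) if match else None,
        dropped_items=int(context.get("dropped_items", 0)),
    )


if __name__ == "__main__":
    # The agent log line from this thread, with its tab separators made explicit.
    sample = (
        "2023-04-21T14:12:13.539Z\terror\texporterhelper/queued_retry.go:175\t"
        "Exporting failed. No more retries left. Dropping data.\t"
        r'{"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": '
        r'"max elapsed time expired rpc error: code = Unavailable desc = connection error: '
        r'desc = \"transport: Error while dialing dial tcp 172.90.13.186:4317: connect: connection refused\"", '
        r'"dropped_items": 1056}'
    )
    print(parse_export_failure(sample))
```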
@Syed Muhammad Hassan, can we get an update on this?
We are unable to set up SigNoz. Kindly help us out.
s
What does `kubectl describe pod` show?
g
All the pods show that the container was created and started, but each pod is still initializing.
s
I think there is a problem with the ClickHouse server:
connect: connection refused
a
How do we solve it?
Would you please help us with this?
@Srikanth Chekuri / @Syed Muhammad Hassan, could you please help me with this?
p
@Goutham Sridhar It looks like there is no ClickHouse installation pod (chi pod). Can you run the `helm upgrade` command with your override values? It should spin up the chi pod if it was terminated for some reason.
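This diagnosis can be checked from the `kubectl get pods` listing earlier in the thread: the operator pod is running, but no ClickHouseInstallation pod appears. The Python sketch below flags that case and prints a `helm upgrade` command that re-applies the release. The `chi-` name prefix, the chart reference `signoz/signoz`, and the values file name `override-values.yaml` are assumptions; substitute your own release, chart, namespace, and override values.

```python
import shlex


def has_chi_pod(pod_names: list[str]) -> bool:
    """True if any pod looks like a ClickHouseInstallation pod (assumed 'chi-' prefix)."""
    return any(name.startswith("chi-") for name in pod_names)


def helm_upgrade_command(release: str, chart: str, namespace: str, values_file: str) -> str:
    """Build a `helm upgrade` command line that re-applies the release with override values."""
    argv = ["helm", "upgrade", release, chart, "--namespace", namespace, "-f", values_file]
    return shlex.join(argv)


if __name__ == "__main__":
    # Pod names taken from the listing earlier in this thread.
    pods = [
        "my-release-clickhouse-operator-9fc495f79-6dls2",
        "my-release-signoz-query-service-0",
        "my-release-zookeeper-0",
    ]
    if not has_chi_pod(pods):
        # Release and namespace come from the thread; chart and values file are assumptions.
        print(helm_upgrade_command("my-release", "signoz/signoz", "tempsignoz", "override-values.yaml"))
```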
n
@Goutham Sridhar Did you figure out a solution? If so, could you please let me know?
a
After running the Helm upgrade, we were able to run SigNoz.