# general
v
#signoz #support #installation I'm setting up SigNoz in AKS. I used the Helm chart below (`helm repo add signoz https://charts.signoz.io`), changed the namespace to `signoz`, and used our own standalone ClickHouse cluster setup along with ZooKeeper. Post-installation I'm observing the issue below in the OTel collector. The error is:
Copy code
PS D:\Signoz> kubectl logs po/signoz-otel-collector-65fbf66f4f-zdntv -n signoz
Defaulted container "signoz-otel-collector" out of: signoz-otel-collector, signoz-otel-collector-init (init)
2023-03-17T10:58:29.571Z info service/telemetry.go:111 Setting up own telemetry...
2023-03-17T10:58:29.572Z info service/telemetry.go:141 Serving Prometheus metrics {"address": "0.0.0.0:8888", "level": "Basic"}
2023-03-17T10:58:29.572Z info components/components.go:30 Stability level of component is undefined {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite", "stability": "Undefined"}
time="2023-03-17T10:58:29Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics ON CLUSTER cluster\n" component=clickhouse
time="2023-03-17T10:58:29Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 ON CLUSTER cluster (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n\t\t\tTTL toDateTime(timestamp_ms/1000) + INTERVAL 2592000 SECOND DELETE;\n" component=clickhouse
time="2023-03-17T10:58:29Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.distributed_samples_v2 ON CLUSTER cluster AS signoz_metrics.samples_v2 ENGINE = Distributed(\"cluster\", \"signoz_metrics\", samples_v2, cityHash64(metric_name, fingerprint));\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nALTER TABLE signoz_metrics.samples_v2 ON CLUSTER cluster MODIFY SETTING ttl_only_drop_parts = 1;\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nSET allow_experimental_object_type = 1\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.time_series_v2 ON CLUSTER cluster(\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tlabels String Codec(ZSTD(5))\n\t\t)\n\t\tENGINE = ReplacingMergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint)\n\t\t\tTTL toDateTime(timestamp_ms/1000) + INTERVAL 2592000 SECOND DELETE;\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.distributed_time_series_v2 ON CLUSTER cluster AS signoz_metrics.time_series_v2 ENGINE = Distributed(\"cluster\", signoz_metrics, time_series_v2, cityHash64(metric_name, fingerprint));\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nALTER TABLE signoz_metrics.time_series_v2 ON CLUSTER cluster DROP COLUMN IF EXISTS labels_object\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nALTER TABLE signoz_metrics.distributed_time_series_v2 ON CLUSTER cluster DROP COLUMN IF EXISTS labels_object\n" component=clickhouse
time="2023-03-17T10:58:30Z" level=info msg="Executing:\nALTER TABLE signoz_metrics.time_series_v2 ON CLUSTER cluster MODIFY SETTING ttl_only_drop_parts = 1;\n" component=clickhouse
2023-03-17T10:58:31.476Z info kube/client.go:101 k8s filtering {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "labelSelector": "", "fieldSelector": "spec.nodeName=aks-nodepool1-18518278-vmss00000f"}
2023-03-17T10:58:31.477Z info components/components.go:30 Stability level of component is undefined {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "stability": "Undefined"}
2023-03-17T10:58:31.676Z info clickhousetracesexporter/clickhouse_factory.go:146 Patching views {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
2023-03-17T10:58:32.786Z info clickhousetracesexporter/clickhouse_factory.go:116 Running migrations from path: {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "test": "/migrations"}
2023-03-17T10:58:41.160Z info clickhousetracesexporter/clickhouse_factory.go:128 Clickhouse Migrate finished {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
Error: cannot build pipelines: failed to create "clickhousetraces" exporter, in pipeline "traces": code: 62, message: Syntax error: failed at position 2290 (')') (line 43, col 3): ) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY (durationNano, timestamp) TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTING. Expected one of: table property (column, index, constraint) declaration, INDEX, CONSTRAINT, PROJECTION, PRIMARY KEY, column declaration, identifier
2023/03/17 10:58:41 application run finished with error: cannot build pipelines: failed to create "clickhousetraces" exporter, in pipeline "traces": code: 62, message: Syntax error: failed at position 2290 (')') (line 43, col 3): ) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY (durationNano, timestamp) TTL toDateTime(timestamp) + INTERVAL 604800 SECOND DELETE SETTING. Expected one of: table property (column, index, constraint) declaration, INDEX, CONSTRAINT, PROJECTION, PRIMARY KEY, column declaration, identifier
Need help understanding this at the earliest, please.
s
What’s the ClickHouse version?
v
@Srikanth Chekuri
Copy code
- name: clickhouse:19.6
  spec:
    containers:
      - name: clickhouse-pod
        image: clickhouse/clickhouse-server:22.3
s
What version of SigNoz are you running? The current version requires ClickHouse `22.8`.
v
@Srikanth Chekuri These are the SigNoz versions I'm using, from the latest Helm chart. This is the latest version of SigNoz.
s
Did you make any changes to charts?
v
No, apart from adding my external ClickHouse DB details and ZooKeeper details, nothing else @Srikanth Chekuri
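For context, the only chart values I changed were roughly these (I'm paraphrasing from our values override; key names may differ slightly by chart version, so check the chart's values.yaml):
Copy code
clickhouse:
  # disable the bundled ClickHouse, we run our own
  enabled: false
externalClickhouse:
  host: <our-clickhouse-host>
  cluster: cluster
  user: <user>
  password: <password>
  secure: false
  httpPort: 8123
  tcpPort: 9000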
s
That's what I meant to ask; you did add custom ClickHouse details. Each of our versions has a compatible ClickHouse version; for `v0.17.0` it is `22.8`, and you are using a lower version that we no longer support. Look at the tag here https://github.com/SigNoz/charts/blob/7bf7768df013dd5968c0b80de06ca3c3ab651f70/charts/signoz/values.yaml#L109
v
So I have to upgrade my ClickHouse version? @Srikanth Chekuri
s
Yes, please use version `22.8.8` or later and it should work.
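In the pod spec you shared earlier, that should just be bumping the image tag, something like:
Copy code
containers:
  - name: clickhouse-pod
    image: clickhouse/clickhouse-server:22.8.8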
v
Ok, will check this and get back. Thank you.
@Srikanth Chekuri Tried with 22.8.8. Now the otel collector and the otel collector metrics are both crashing with the errors below:
Copy code
PS D:\Signoz> kubectl logs po/signoz-otel-collector-57d5b59fdb-wcft9 -n signoz
Defaulted container "signoz-otel-collector" out of: signoz-otel-collector, signoz-otel-collector-init (init)
2023-03-19T02:53:25.026Z info service/telemetry.go:111 Setting up own telemetry...
2023-03-19T02:53:25.027Z info service/telemetry.go:141 Serving Prometheus metrics {"address": "0.0.0.0:8888", "level": "Basic"}
2023-03-19T02:53:26.206Z info clickhouselogsexporter/exporter.go:356 Running migrations from path: {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "test": "/logsmigrations"}
Error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: migration failed in line 0: CREATE TABLE IF NOT EXISTS signoz_logs.distributed_logs_atrribute_keys ON CLUSTER cluster AS signoz_logs.logs_atrribute_keys ENGINE = Distributed("cluster", "signoz_logs", logs_atrribute_keys, cityHash64(datatype)); (details: code: 60, message: Table signoz_logs.logs_atrribute_keys doesn't exist)
2023/03/19 02:53:27 application run finished with error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: clickhouse Migrate failed to run, error: migration failed in line 0: CREATE TABLE IF NOT EXISTS signoz_logs.distributed_logs_atrribute_keys ON CLUSTER cluster AS signoz_logs.logs_atrribute_keys ENGINE = Distribu
PS D:\Signoz> kubectl logs po/signoz-otel-collector-metrics-64668cd69f-rxssf -n signoz
Defaulted container "signoz-otel-collector-metrics" out of: signoz-otel-collector-metrics, signoz-otel-collector-metrics-init (init)
2023-03-19T02:56:14.708Z info service/telemetry.go:111 Setting up own telemetry...
2023-03-19T02:56:14.709Z info service/telemetry.go:141 Serving Prometheus metrics {"address": "0.0.0.0:8888", "level": "Basic"}
2023-03-19T02:56:14.709Z info components/components.go:30 Stability level of component is undefined {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite/hostmetrics", "stability": "Undefined"}
time="2023-03-19T02:56:14Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics ON CLUSTER cluster\n" component=clickhouse
time="2023-03-19T02:56:14Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 ON CLUSTER cluster (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n\t\t\tTTL toDateTime(timestamp_ms/1000) + INTERVAL 2592000 SECOND DELETE;\n" component=clickhouse
time="2023-03-19T02:56:14Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.distributed_samples_v2 ON CLUSTER cluster AS signoz_metrics.samples_v2 ENGINE = Distributed(\"cluster\", \"signoz_metrics\", samples_v2, cityHash64(metric_name, fingerprint));\n" component=clickhouse
2023/03/19 02:56:14 Error creating clickhouse client: code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
In the ClickHouse DB, only the metrics database is created.
s
It says migration failed. Share the full logs of the signoz-otel-collector
v
Those are the full logs of both the metrics collector and the main collector @Srikanth Chekuri
s
What's the exact output of `SELECT version()` for ClickHouse? Also, when you are using external ClickHouse, there should be a cluster called `cluster`.
v
Yes, the cluster name is `cluster`. I'll share the output as well.
@Srikanth Chekuri Below is the output for `SELECT version()`:
@Srikanth Chekuri Entire logs of the otel collector:
Copy code
PS D:\Signoz> kubectl logs po/signoz-otel-collector-86f4667d85-mbkmn -n signoz -f
Defaulted container "signoz-otel-collector" out of: signoz-otel-collector, signoz-otel-collector-init (init)
2023-03-20T01:12:45.019Z info service/telemetry.go:111 Setting up own telemetry...
2023-03-20T01:12:45.019Z info service/telemetry.go:141 Serving Prometheus metrics {"address": "0.0.0.0:8888", "level": "Basic"}
2023-03-20T01:12:45.019Z info components/components.go:30 Stability level of component is undefined {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite", "stability": "Undefined"}
time="2023-03-20T01:12:45Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics ON CLUSTER cluster\n" component=clickhouse
time="2023-03-20T01:12:45Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 ON CLUSTER cluster (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n\t\t\tTTL toDateTime(timestamp_ms/1000) + INTERVAL 2592000 SECOND DELETE;\n" component=clickhouse
time="2023-03-20T01:12:45Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.distributed_samples_v2 ON CLUSTER cluster AS signoz_metrics.samples_v2 ENGINE = Distributed(\"cluster\", \"signoz_metrics\", samples_v2, cityHash64(metric_name, fingerprint));\n" component=clickhouse
2023/03/20 01:12:45 Error creating clickhouse client: code: 60, message: Table signoz_metrics.samples_v2 doesn't exist
s
It says `signoz_metrics.samples_v2` doesn't exist, but the line above says otherwise, and I can't reproduce this locally. We don't officially support Windows yet, if you are running on Windows.
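You can also check what the migrations actually created, with something like:
Copy code
-- list the tables the collector migrations should have created
SELECT database, name, engine
FROM system.tables
WHERE database IN ('signoz_metrics', 'signoz_traces', 'signoz_logs');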
v
No, doing it in a Kubernetes cluster.
@Srikanth Chekuri After further analysis, it is observed that it is unable to execute a few queries. When I enter the queries manually, they execute fine. In the query logs they are printed with \n characters; after removing those and executing, it works.
We need assistance on this at the earliest.
s
You are seeing the `\n` because it is a log message. The actual query runs fine. It will look like this:
Copy code
CREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 ON CLUSTER cluster (
			metric_name LowCardinality(String),
			fingerprint UInt64 Codec(DoubleDelta, LZ4),
			timestamp_ms Int64 Codec(DoubleDelta, LZ4),
			value Float64 Codec(Gorilla, LZ4)
		)
		ENGINE = MergeTree
			PARTITION BY toDate(timestamp_ms / 1000)
			ORDER BY (metric_name, fingerprint, timestamp_ms)
			TTL toDateTime(timestamp_ms/1000) + INTERVAL 2592000 SECOND DELETE;
Which is a valid ClickHouse query. You can see the code here https://github.com/SigNoz/signoz-otel-collector/blob/dcf11134a7db5d3f3e55dd14d34d1b1decb77e65/exporter/clickhousemetricsexporter/clickhouse.go#L[…]6
v
Then why are a few queries getting executed and a few aren't?
s
That's not clear, because your ClickHouse is a black box to me; I have no idea how it is set up. The queries are run in order; it runs
Copy code
time="2023-03-20T01:12:45Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 ON CLUSTER cluster (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n\t\t\tTTL toDateTime(timestamp_ms/1000) + INTERVAL 2592000 SECOND DELETE;\n" component=clickhouse
and proceeds to the next one, which throws the error that `signoz_metrics.samples_v2` doesn't exist. The previous step didn't result in an error. Is your ClickHouse set up to support the distributed engines, and does the cluster named `cluster` exist?
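You can verify that with something like:
Copy code
-- ON CLUSTER DDL and the Distributed() engine need this cluster defined on every node
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster';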
v
Thank you @Srikanth Chekuri for your time. Indeed, the ClickHouse version was the problem. I deleted everything entirely and recreated it, and it started working as expected.
Hello @Srikanth Chekuri Today I reinstalled SigNoz. The databases got created, but the tables are not getting created; the logs are stuck here and the pods are getting crash-looped.
Please help
What are the supported ZooKeeper versions?
Hello @Srikanth Chekuri / All, when we shut down and start back the k8s cluster, below is the situation. Checking the ClickHouse logs, we get this error:
Copy code
2023.04.05 06:43:51.572343 [ 7 ] {} <Error> Application: DB::Exception: Suspiciously many (13 parts, 0.00 B in total) broken parts to remove while maximum allowed broken parts count is 10. You can change the maximum value with merge tree setting 'max_suspicious_broken_parts' in <merge_tree> configuration section or in table settings in .sql file (don't forget to return setting back to default value): Cannot attach table `signoz_metrics`.`samples_v2` from metadata file /var/lib/clickhouse/store/90f/90fde3c5-a131-4fd9-a85a-51a5cab5517e/samples_v2.sql from query ATTACH TABLE signoz_metrics.samples_v2 UUID '01c3f5a7-9d8b-45dc-9903-c11b8ae3bdd2' (`metric_name` LowCardinality(String), `fingerprint` UInt64 CODEC(DoubleDelta, LZ4), `timestamp_ms` Int64 CODEC(DoubleDelta, LZ4), `value` Float64 CODEC(Gorilla, LZ4)) ENGINE = MergeTree PARTITION BY toDate(timestamp_ms / 1000) ORDER BY (metric_name, fingerprint, timestamp_ms) TTL toDateTime(timestamp_ms / 1000) + toIntervalSecond(2592000) SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
When I looked up this error I could find fixes for a VM setup, but how do we fix it in Kubernetes? Please suggest at priority, as we are conducting load tests.
s
Did you force-shutdown it?
v
Normal Kubernetes cluster shutdown @Srikanth Chekuri
s
What was the storage type & class you were using?
v
Azure default storage class, type PVC @Srikanth Chekuri
Can you please let me know if there is any fix for this?
s
This could be solved with an init container in k8s, but our charts do not yet support a custom run command. Is the previous data important for you? The immediate fix is to get rid of the old data and start fresh with a slightly higher `max_suspicious_broken_parts_bytes` config. The long-term fix is to support custom commands for the init container in our charts.
v
Ok, for now how do I add the config and start the cluster? @Srikanth Chekuri Also, when can we expect the long-term resolution?
Because I'm not finding where to add it in the config maps.
s
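You can raise the limit in the ClickHouse server config with a merge_tree override, something like this (the exact file location depends on how your ClickHouse is deployed, and the values here are only illustrative; remember to revert them afterwards):
Copy code
<clickhouse>
    <merge_tree>
        <!-- temporarily allow more broken parts so the tables can attach -->
        <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
        <max_suspicious_broken_parts_bytes>10737418240</max_suspicious_broken_parts_bytes>
    </merge_tree>
</clickhouse>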
v
Hello @Srikanth Chekuri Thanks, after adding the line it worked. Please let me know when the long-term fix can be applied, i.e. a timeline.
@Srikanth Chekuri After the setup, when pushing logs, it is observed that the data in the UI is not persistent; sometimes the data is visible and sometimes it's not. Please help.
Copy code
E0417 03:21:52.994250 1 connection.go:105] connect():FAILED Ping(http://***:***@chi-signoz-clickhouse-cluster-0-0.signoz.svc.cluster.local:8123/). Err: dial tcp 10.33.0.19:8123 connect: connection refused
E0417 03:21:52.994306 1 connection.go:192] Exec():FAILED connect(http://***:***@chi-signoz-clickhouse-cluster-0-0.signoz.svc.cluster.local:8123/) for SQL: SYSTEM DROP DNS CACHE
W0417 03:21:52.994314 1 retry.go:52] exec()chi-signoz-clickhouse-cluster-0-0.signoz.svc.cluster.local:FAILED single try. No retries will be made for Applying sqls
I0417 03:21:53.007646 1 worker.go:299] signoz/signoz-clickhouse/8f95e18c-9af2-4d4f-a3cd-55794c2d22f6:IPs of the CHI [10.33.0.19]
That is the error from the clickhouse-operator.
s
What resources are given to ClickHouse? Was the ClickHouse cluster running when you saw these errors?
v
Yes @Srikanth Chekuri, ClickHouse was running, with 2 cores and 2 GB of RAM.
s
What is the amount of data you are inserting, i.e. the ingestion rate?
And what is the rough volume of data you are querying?
v
For now we are not querying anything; data ingestion is, let's say, 1k TPS or so.
@Srikanth Chekuri
s
The minimum recommended hardware for ClickHouse is 4 cores and 16 GB of RAM.
v
Ok, will keep that, check, and get back @Srikanth Chekuri
Even after the resource increase, I'm still unable to see the complete set of data that is pushed @Srikanth Chekuri
s
You could be a little more explicit when you say you are “unable to see the complete set of data that is pushed”. What data are you pushing? What is the volume? What is missing? What are you trying to do?
v
We are using the Go module of the OTel collector and directly writing logs to the collector. There are 30 applications whose logs are being written to the SigNoz otel collector, but I'm sometimes able to see 3 components and sometimes 4; it varies. This is the frontend UI I'm talking about. @Srikanth Chekuri
s
“there are 30 applications where logs are being written to signoz otel collector”
What is the volume of the logs emitted by these applications?
“but i’m sometimes able to see 3 components or sometimes 4 and it varies”
What components are you referring to here?
v
1. 3,000 log entries per second
2. The applications we are using to push logs
@Srikanth Chekuri
image.png
s
The `Services` tab data comes from the traces. If a service is not sending data for the period, it will not show up in the list. The items on the list are the complete set, but only of the services that sent data in the selected time range (top-right corner).
v
Ok, understood. We assumed the time interval was how often the data should be refreshed, hence the confusion. Thank you.