This message was deleted.
# support
s
This message was deleted.
s
Which version of SigNoz deployment is this? Do you see any error logs in query-service?
v
I don't see any errors in query-service.
s
How long does it keep spinning? How many services do you have? Would you be able to exec into ClickHouse and run some queries?
v
Yeah, I can log in to the ClickHouse DB.
It just keeps spinning
2024-02-26T13:22:53.933Z ERROR clickhouseReader/reader.go:4609 error while reading time series result write: write tcp 10.107.86.25357200 >172.20.141.219000: i/o timeout
2024-02-26T13:23:10.797Z ERROR clickhouseReader/reader.go:4609 error while reading time series result write: write tcp 10.107.86.25357260 >172.20.141.219000: i/o timeout
2024-02-26T13:23:10.801Z ERROR clickhouseReader/reader.go:4609 error while reading time series result write: write tcp 10.107.86.25360758 >172.20.141.219000: i/o timeout
2024-02-26T13:23:10.801Z INFO utils/time.go:12 func GetTimeSeriesResultV3 took 1m0.002184427s with args [SELECT A.`address` as `address`, A.`ts` as `ts`
, A.value * 100 / B.value as value FROM (SELECT address, ts, sum(rate_value) as value FROM (SELECT address, ts, If((value - lagInFrame(value, 1, 0) OVER rate_window) < 0, nan, If((ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window) >= 86400, nan, (value - lagInFrame(value, 1, 0) OVER rate_window) / (ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window))) as rate_value FROM(SELECT fingerprint, address, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 INNER JOIN (SELECT JSONExtractString(labels, 'address') as address, fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'signoz_external_call_latency_count' AND temporality IN ['Cumulative', 'Unspecified'] AND JSONExtractString(labels, 'service_name') IN ['app-settlement'] AND JSONExtractString(labels, 'status_code') IN ['STATUS_CODE_ERROR']) as filtered_time_series USING fingerprint WHERE metric_name = 'signoz_external_call_latency_count' AND timestamp_ms >= 1708951860000 AND timestamp_ms < 1708953720000 GROUP BY fingerprint, address,ts ORDER BY fingerprint, address ASC, ts) WINDOW rate_window as (PARTITION BY fingerprint, address ORDER BY fingerprint, address ASC, ts) ) WHERE isNaN(rate_value) = 0 GROUP BY GROUPING SETS ( (address, ts), (address) ) ORDER BY address ASC, ts) as A INNER JOIN (SELECT address, ts, sum(rate_value) as value FROM (SELECT address, ts, If((value - lagInFrame(value, 1, 0) OVER rate_window) < 0, nan, If((ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window) >= 86400, nan, (value - lagInFrame(value, 1, 0) OVER rate_window) / (ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window))) as rate_value FROM(SELECT fingerprint, address, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 INNER JOIN (SELECT JSONExtractString(labels, 'address') as address, 
fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'signoz_external_call_latency_count' AND temporality IN ['Cumulative', 'Unspecified'] AND JSONExtractString(labels, 'service_name') IN ['app-settlement']) as filtered_time_series USING fingerprint WHERE metric_name = 'signoz_external_call_latency_count' AND timestamp_ms >= 1708951860000 AND timestamp_ms < 1708953720000 GROUP BY fingerprint, address,ts ORDER BY fingerprint, address ASC, ts) WINDOW rate_window as (PARTITION BY fingerprint, address ORDER BY fingerprint, address ASC, ts) ) WHERE isNaN(rate_value) = 0 GROUP BY GROUPING SETS ( (address, ts), (address) ) ORDER BY address ASC, ts) as B ON A.
`address` = B.`address` AND A.`ts` = B.`ts`]
countIf(statusCode=2) as errorCount,
2024-02-26T13:33:35.137Z INFO utils/time.go:12 func GetTimeSeriesResultV3 took 418.083515ms with args [SELECT B.`ts` as `ts`
, ((B.value + C.value) / 2) / A.value as value FROM (SELECT ts, sum(rate_value) as value FROM (SELECT ts, If((value - lagInFrame(value, 1, 0) OVER rate_window) < 0, nan, If((ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window) >= 86400, nan, (value - lagInFrame(value, 1, 0) OVER rate_window) / (ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window))) as rate_value FROM(SELECT fingerprint, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 INNER JOIN (SELECT fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'signoz_latency_bucket' AND temporality IN ['Cumulative', 'Unspecified'] AND JSONExtractString(labels, 'status_code') != 'STATUS_CODE_ERROR' AND JSONExtractString(labels, 'le') = '1000' AND JSONExtractString(labels, 'service_name') = 'core-cam-http' AND JSONExtractString(labels, 'operation') IN ['HTTP POST route not found']) as filtered_time_series USING fingerprint WHERE metric_name = 'signoz_latency_bucket' AND timestamp_ms >= 1708951860000 AND timestamp_ms < 1708953720000 GROUP BY fingerprint, ts ORDER BY fingerprint, ts) WINDOW rate_window as (PARTITION BY fingerprint ORDER BY fingerprint, ts) ) WHERE isNaN(rate_value) = 0 GROUP BY ts ORDER BY ts) as B INNER JOIN (SELECT ts, sum(rate_value) as value FROM (SELECT ts, If((value - lagInFrame(value, 1, 0) OVER rate_window) < 0, nan, If((ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window) >= 86400, nan, (value - lagInFrame(value, 1, 0) OVER rate_window) / (ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window))) as rate_value FROM(SELECT fingerprint, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 INNER JOIN (SELECT fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'signoz_latency_bucket' AND temporality IN ['Cumulative', 'Unspecified'] AND 
JSONExtractString(labels, 'le') = '10000' AND JSONExtractString(labels, 'status_code') != 'STATUS_CODE_ERROR' AND JSONExtractString(labels, 'service_name') = 'core-cam-http' AND JSONExtractString(labels, 'operation') IN ['HTTP POST route not found']) as filtered_time_series USING fingerprint WHERE metric_name = 'signoz_latency_bucket' AND timestamp_ms >= 1708951860000 AND timestamp_ms < 1708953720000 GROUP BY fingerprint, ts ORDER BY fingerprint, ts) WINDOW rate_window as (PARTITION BY fingerprint ORDER BY fingerprint, ts) ) WHERE isNaN(rate_value) = 0 GROUP BY ts ORDER BY ts) as C ON B.
`ts` = C.`ts`
INNER JOIN (SELECT ts, sum(rate_value) as value FROM (SELECT ts, If((value - lagInFrame(value, 1, 0) OVER rate_window) < 0, nan, If((ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window) >= 86400, nan, (value - lagInFrame(value, 1, 0) OVER rate_window) / (ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window))) as rate_value FROM(SELECT fingerprint, toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts, max(value) as value FROM signoz_metrics.distributed_samples_v2 INNER JOIN (SELECT fingerprint FROM signoz_metrics.time_series_v2 WHERE metric_name = 'signoz_latency_count' AND temporality IN ['Cumulative', 'Unspecified'] AND JSONExtractString(labels, 'service_name') = 'core-cam-http' AND JSONExtractString(labels, 'operation') IN ['HTTP POST route not found']) as filtered_time_series USING fingerprint WHERE metric_name = 'signoz_latency_count' AND timestamp_ms >= 1708951860000 AND timestamp_ms < 1708953720000 GROUP BY fingerprint, ts ORDER BY fingerprint, ts) WINDOW rate_window as (PARTITION BY fingerprint ORDER BY fingerprint, ts) ) WHERE isNaN(rate_value) = 0 GROUP BY ts ORDER BY ts) as A ON C.
`ts` = A.`ts`]
2024-02-27T00:20:48.114Z INFO clickhouseReader/reader.go:2464 SELECT id, status, ttl, cold_storage_ttl FROM ttl_status WHERE table_name = ? ORDER BY created_at DESC signoz_traces.signoz_error_index_v2
2024-02-27T00:20:48.329Z DEBUG clickhouseReader/reader.go:2558 Parsing TTL from: MergeTree PARTITION BY toDate(timestamp) PRIMARY KEY (serviceName, hasError, toStartOfHour(timestamp), name) ORDER BY (serviceName, hasError, toStartOfHour(timestamp), name, timestamp) TTL toDateTime(timestamp) + toIntervalSecond(1296000) SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
2024-02-27T03:06:18.905Z INFO clickhouseReader/reader.go:1221 SELECT COUNT(*) as numTotal FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = true
2024-02-27T03:06:18.937Z INFO clickhouseReader/reader.go:1232 SELECT COUNT(*) as numTotal FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = false
2024-02-27T03:08:06.918Z INFO clickhouseReader/reader.go:1221 SELECT COUNT(*) as numTotal FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = true
2024-02-27T03:08:06.930Z INFO clickhouseReader/reader.go:1232 SELECT COUNT(*) as numTotal FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp >= @timestampL AND timestamp <= @timestampU AND hasError = false
2024-02-27T03:27:03.062Z ERROR clickhouseReader/reader.go:853 Error in processing sql query: write: write tcp 10.107.86.25348690 >172.20.141.219000: i/o timeout
2024-02-27T03:41:13.609Z ERROR clickhouseReader/reader.go:853 Error in processing sql query: write: write tcp 10.107.86.25341776 >172.20.141.219000: i/o timeout
2024-02-27T03:54:40.873Z ERROR clickhouseReader/reader.go:853 Error in processing sql query: write: write tcp 10.107.86.25358618 >172.20.141.219000: i/o timeout
posthog 2024/02/27 03:57:45 ERROR: sending request - Post "https://app.posthog.com/batch/": read tcp 10.107.86.25332882 >104.22.58.181443: read: connection reset by peer
2024-02-27T04:41:21.721Z ERROR clickhouseReader/reader.go:853 Error in processing sql query: write: write tcp 10.107.86.25359732 >172.20.141.219000: i/o timeout
I do see some errors in query-service.
s
It says i/o timeout. What are the resources given to ClickHouse?
v
Default values, but it can go up to 5-6 cores.
s
How much data are you ingesting? And what is the current CPU and memory usage of clickhouse pods?
v
It's consuming around 700 cores as of now.
Memory usage is around 6 GB.
How do we disable S3 once it's deployed?
The pods crash if I disable S3.
The last option is to uninstall and reinstall.
Deleting the ClickHouse tables and running Helm fixes the issue for some time, and then the services go into spinning mode again.
s
That's not a way to solve the issue. We first need to understand what's the reason behind this. How much data are you ingesting?
Share the output of this
SELECT
    serviceName,
    count()
FROM signoz_traces.distributed_top_level_operations
GROUP BY serviceName
What are your table TTL policies?
SHOW CREATE TABLE signoz_traces.signoz_index_v2
v
SHOW CREATE TABLE signoz_traces.signoz_index_v2
Query id: 86614b46-690a-4e88-b8ba-827046d9a858
CREATE TABLE signoz_traces.signoz_index_v2
(
    `timestamp` DateTime64(9) CODEC(DoubleDelta, LZ4),
    `traceID` FixedString(32) CODEC(ZSTD(1)),
    `spanID` String CODEC(ZSTD(1)),
    `parentSpanID` String CODEC(ZSTD(1)),
    `serviceName` LowCardinality(String) CODEC(ZSTD(1)),
    `name` LowCardinality(String) CODEC(ZSTD(1)),
    `kind` Int8 CODEC(T64, ZSTD(1)),
    `durationNano` UInt64 CODEC(T64, ZSTD(1)),
    `statusCode` Int16 CODEC(T64, ZSTD(1)),
    `externalHttpMethod` LowCardinality(String) CODEC(ZSTD(1)),
    `externalHttpUrl` LowCardinality(String) CODEC(ZSTD(1)),
    `component` LowCardinality(String) CODEC(ZSTD(1)),
    `dbSystem` LowCardinality(String) CODEC(ZSTD(1)),
    `dbName` LowCardinality(String) CODEC(ZSTD(1)),
    `dbOperation` LowCardinality(String) CODEC(ZSTD(1)),
    `peerService` LowCardinality(String) CODEC(ZSTD(1)),
    `events` Array(String) CODEC(ZSTD(2)),
    `httpMethod` LowCardinality(String) CODEC(ZSTD(1)),
    `httpUrl` LowCardinality(String) CODEC(ZSTD(1)),
    `httpCode` LowCardinality(String) CODEC(ZSTD(1)),
    `httpRoute` LowCardinality(String) CODEC(ZSTD(1)),
    `httpHost` LowCardinality(String) CODEC(ZSTD(1)),
    `msgSystem` LowCardinality(String) CODEC(ZSTD(1)),
    `msgOperation` LowCardinality(String) CODEC(ZSTD(1)),
    `hasError` Bool CODEC(T64, ZSTD(1)),
    `tagMap` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `gRPCMethod` LowCardinality(String) CODEC(ZSTD(1)),
    `gRPCCode` LowCardinality(String) CODEC(ZSTD(1)),
    `rpcSystem` LowCardinality(String) CODEC(ZSTD(1)),
    `rpcService` LowCardinality(String) CODEC(ZSTD(1)),
    `rpcMethod` LowCardinality(String) CODEC(ZSTD(1)),
    `responseStatusCode` LowCardinality(String) CODEC(ZSTD(1)),
    `stringTagMap` Map(String, String) CODEC(ZSTD(1)),
    `numberTagMap` Map(String, Float64) CODEC(ZSTD(1)),
    `boolTagMap` Map(String, Bool) CODEC(ZSTD(1)),
    `resourceTagsMap` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX idx_service serviceName TYPE bloom_filter GRANULARITY 4,
    INDEX idx_name name TYPE bloom_filter GRANULARITY 4,
    INDEX idx_kind kind TYPE minmax GRANULARITY 4,
    INDEX idx_duration durationNano TYPE minmax GRANULARITY 1,
    INDEX idx_httpCode httpCode TYPE set(0) GRANULARITY 1,
    INDEX idx_hasError hasError TYPE set(2) GRANULARITY 1,
    INDEX idx_tagMapKeys mapKeys(tagMap) TYPE bloom_filter(0.01) GRANULARITY 64,
    INDEX idx_tagMapValues mapValues(tagMap) TYPE bloom_filter(0.01) GRANULARITY 64,
    INDEX idx_httpRoute httpRoute TYPE bloom_filter GRANULARITY 4,
    INDEX idx_httpUrl httpUrl TYPE bloom_filter GRANULARITY 4,
    INDEX idx_httpHost httpHost TYPE bloom_filter GRANULARITY 4,
    INDEX idx_httpMethod httpMethod TYPE bloom_filter GRANULARITY 4,
    INDEX idx_timestamp timestamp TYPE minmax GRANULARITY 1,
    INDEX idx_rpcMethod rpcMethod TYPE bloom_filter GRANULARITY 4,
    INDEX idx_responseStatusCode responseStatusCode TYPE set(0) GRANULARITY 1,
    INDEX idx_resourceTagsMapKeys mapKeys(resourceTagsMap) TYPE bloom_filter(0.01) GRANULARITY 64,
    INDEX idx_resourceTagsMapValues mapValues(resourceTagsMap) TYPE bloom_filter(0.01) GRANULARITY 64,
    PROJECTION timestampSort ( SELECT * ORDER BY timestamp )
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)
PRIMARY KEY (serviceName, hasError, toStartOfHour(timestamp), name)
ORDER BY (serviceName, hasError, toStartOfHour(timestamp), name, timestamp)
TTL toDateTime(timestamp) + toIntervalSecond(1296000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
1 row in set. Elapsed: 0.003 sec.
chi-signoz-release-clickhouse-cluster-0-0-0.chi-signoz-release-clickhouse-cluster-0-0.platform.svc.cluster.local :)
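For reference, the TTL clause in the output above is expressed in seconds (`toIntervalSecond(1296000)`). A minimal Python sketch of converting it to days; the variable names are illustrative and not part of SigNoz or ClickHouse:

```python
from datetime import timedelta

# TTL value copied from the SHOW CREATE TABLE output: toIntervalSecond(1296000)
ttl_seconds = 1296000

# timedelta normalizes the value into whole days plus leftover seconds
ttl = timedelta(seconds=ttl_seconds)
ttl_days = ttl.days

print(f"TTL is {ttl_days} days, with {ttl.seconds} seconds left over")
```

With `ttl_only_drop_parts = 1` and daily partitions (`PARTITION BY toDate(timestamp)`), this retention is applied by dropping whole parts once all of their rows have expired.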
s
Run this and see how long the query takes to complete.
SELECT
    serviceName,
    toStartOfInterval(timestamp, toIntervalSecond(60)) AS ts,
    quantile(0.5)(durationNano) AS value
FROM signoz_traces.distributed_signoz_index_v2
WHERE ((timestamp >= '1709013204000000000') AND (timestamp <= '1709015036000000000'))
GROUP BY serviceName, ts
ORDER BY serviceName, ts
v
558 rows in set. Elapsed: 0.025 sec. Processed 59.00 thousand rows, 1.00 MB (2.33 million rows/s., 39.63 MB/s.) Peak memory usage: 4.12 MiB.
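To put the result above in context, the two bounds in the `WHERE` clause are Unix epoch values in nanoseconds. A small Python sketch, assuming only the standard library, that converts them to UTC and computes the width of the queried window (the helper name `ns_to_utc` is made up for illustration):

```python
from datetime import datetime, timezone

# Bounds copied from the WHERE clause; both are nanoseconds since the Unix epoch.
start_ns = 1709013204000000000
end_ns = 1709015036000000000


def ns_to_utc(ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ns // 1_000_000_000, tz=timezone.utc)


start = ns_to_utc(start_ns)
end = ns_to_utc(end_ns)
window_seconds = (end - start).total_seconds()

print(start.isoformat(), end.isoformat(), window_seconds)
```

The window works out to roughly half an hour, so the 0.025 sec elapsed time covers about 30 minutes of trace data (59.00 thousand rows).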
s
Did you purge clickhouse in the last 1-2 hours?
v
Maybe yesterday.
While trying all the steps, I have purged a couple of times.
Once purged, it works for some time and then stops.
s
I don't see anything wrong based on the outputs you shared.
When this happens, there might be a frontend error that keeps the spinner from going away. Please check whether that's the case.
v
Restarted the frontend pods, no luck.
s
No, I meant a JavaScript error. Do you see the spinner now?
v
Yeah, the spinner is still there.
It's the same in different browsers.
s
Would you be able to join a huddle now?
v
Sure.
s
Open the network tab and share which requests these are.
The queries ran quickly enough not to time out. What does your setup look like? Did you make any custom changes to the SigNoz deployment? I am also waiting on the huddle if you would prefer that.
v
Not sure how a huddle works. Do you have to send an invite?
s
I did send an invite. You joined for a moment and then left.
SHOW CREATE TABLE signoz_traces.top_level_operations