Slackbot
04/18/2023, 10:21 PM
Srikanth Chekuri
04/19/2023, 1:07 AM
0.18.1. We have the client set up with the default number of connections, around 10 or 15. If you have long-running queries that don’t complete in a reasonable time, other requests may time out. We could make the number of connections configurable, but that won’t solve the issue entirely since, eventually, ClickHouse will throw a TOO_MANY_SIMULTANEOUS_QUERIES error. Can you help us understand your queries, the time range, and the amount of data you are querying?
Al
04/19/2023, 7:22 PM
SELECT quantile(0.99)(durationNano) as p99, avg(durationNano) as avgDuration, count(*) as numCalls FROM signoz_traces.distributed_signoz_index_v2 WHERE serviceName = 'blah' AND name IN ['Elasticsearch DELETE', 'Elasticsearch HEAD', 'Elasticsearch POST', 'Elasticsearch POST
<-- this list has 950 additional entries and fails with Max query size exceeded:
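A rough way to check whether a generated query like the one above is close to the limit: ClickHouse rejects query text longer than its max_query_size setting, so measuring the query in bytes gives a quick answer. This is only a sketch. The helper names are made up for illustration, the 262144-byte limit is the default as far as I know and may be set differently on your server, and the span names you pass in are whatever your UI or script generates.

```python
from typing import List

# Assumed default of ClickHouse's max_query_size setting, in bytes.
# Check the value on your own server; it may have been changed.
DEFAULT_MAX_QUERY_SIZE = 262144


def quote(value: str) -> str:
    """Render a Python string as a single-quoted SQL string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_query(service: str, span_names: List[str]) -> str:
    """Rebuild the query from the message above (hypothetical helper)."""
    names = ", ".join(quote(n) for n in span_names)
    return (
        "SELECT quantile(0.99)(durationNano) AS p99, "
        "avg(durationNano) AS avgDuration, count(*) AS numCalls "
        "FROM signoz_traces.distributed_signoz_index_v2 "
        "WHERE serviceName = " + quote(service) + " AND name IN [" + names + "]"
    )


def query_size_bytes(query: str) -> int:
    """Size of the query text in bytes, which is what the limit applies to."""
    return len(query.encode("utf-8"))


def exceeds_limit(query: str, limit: int = DEFAULT_MAX_QUERY_SIZE) -> bool:
    """True if the query text is larger than the given limit."""
    return query_size_bytes(query) > limit
```

Calling exceeds_limit(build_query('blah', names)) with the real list of span names tells you whether the IN list alone is enough to trip the limit. If it is, the options are a shorter filter or a higher max_query_size.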
Where is this query invoked from?
Al
04/19/2023, 9:30 PM
SELECT
fingerprint,
max(value) AS value,
toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), INTERVAL 60 SECOND) as ts,
http_url,
http_status_code
FROM
signoz_metrics.distributed_samples_v2 GLOBAL
INNER JOIN (
SELECT
JSONExtractString(distributed_time_series_v2.labels, 'http_url') as http_url,
JSONExtractString(distributed_time_series_v2.labels, 'http_status_code') as http_status_code,
fingerprint
FROM
signoz_metrics.distributed_time_series_v2
WHERE
metric_name = 'httpcheck_status'
) as filtered_time_series USING fingerprint
WHERE
metric_name = 'httpcheck_status'
AND toDateTime(intDiv(timestamp_ms, 1000)) BETWEEN {{.start_datetime}} AND {{.end_datetime}}
GROUP BY
http_url,
http_status_code,
fingerprint,
ts
ORDER BY
http_url,
http_status_code,
fingerprint,
ts
• 1 panel has the following:
SELECT
toStartOfInterval(timestamp, toIntervalMinute(1)) AS interval,
peerService AS peer_service,
serviceName,
httpCode,
toFloat64(count()) AS value
FROM signoz_traces.distributed_signoz_index_v2
WHERE stringTagMap['k8s.namespace.name'] = {{.namespace}}
AND (peer_service != '')
AND (httpCode != '')
AND (httpCode NOT LIKE '2%%')
AND timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
GROUP BY (peerService, serviceName, httpCode, interval)
ORDER BY (httpCode, interval) ASC
Srikanth Chekuri
04/20/2023, 1:11 AM
1. The problem occurs consistently if I extend the date range to 1 week, which surprises me given retention settings of Metrics: 7 days, Traces: 1 day, Logs: 1 day.
What are the memory resources given to ClickHouse? Loading 1 week of data and ordering it requires a lot of memory.
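To put a rough number on that: the first panel query groups by fingerprint and 60-second bucket, so the result it has to aggregate and sort grows with (number of series) x (number of buckets in the range). A minimal sketch of the arithmetic, where the function name and the series count are hypothetical and each fingerprint is assumed to map to a single http_url / http_status_code pair:

```python
def max_result_rows(num_series: int, days: int, step_seconds: int = 60) -> int:
    """Upper bound on grouped rows for a per-series, fixed-step panel query.

    num_series is a made-up input here; use the real number of
    httpcheck_status time series from your own setup.
    """
    buckets = days * 24 * 3600 // step_seconds  # time buckets in the range
    return num_series * buckets


# One series over one week at a 60-second step:
print(max_result_rows(1, 7))    # 10080
# A hypothetical 500 series over the same week:
print(max_result_rows(500, 7))  # 5040000
```

This only counts the grouped output rows. The number of raw samples ClickHouse scans before grouping depends on the scrape interval and is not estimated here.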
Al
04/20/2023, 1:47 PM
metadata.name: chi-signoz-clickhouse-cluster-0-0
resources.requests.cpu: '1'
resources.requests.memory: 6000Mi
Here is a week's worth of memory and CPU usage for ClickHouse.
Srikanth Chekuri
04/21/2023, 8:15 AM
Al
04/22/2023, 1:44 PM
Al
04/22/2023, 11:40 PM
Al
04/25/2023, 5:24 PM
Srikanth Chekuri
04/25/2023, 5:26 PM
Al
04/25/2023, 5:32 PM
Al
04/28/2023, 10:29 PM
NAME                                                CPU(cores)   MEMORY(bytes)
chi-signoz-clickhouse-cluster-0-0-0                 1177m        5825Mi
signoz-otel-collector-b87bf5d54-qpsx9               2878m        966Mi
signoz-otel-collector-metrics-7bdb76c7fd-fjs6g      842m         1320Mi
signoz-alertmanager-0                               2m           23Mi
signoz-clickhouse-operator-6dd75c99f8-wz4sf         2m           52Mi
signoz-frontend-595d64465b-qf777                    1m           11Mi
signoz-k8s-infra-otel-agent-dr4sl                   42m          126Mi
signoz-k8s-infra-otel-deployment-7d4857ff7c-h2q6n   2m           66Mi
signoz-query-service-0                              10m          145Mi
signoz-zookeeper-0                                  5m           390Mi
Hi @Srikanth Chekuri, I have all of the above pods running on a single node. I tried adding a second node but ran into trouble with connections being refused between pods running on different nodes.
What can I safely move onto separate nodes and still have everything function, in order to improve the performance of the SigNoz UI?
Al
05/03/2023, 8:54 PM
Al
05/23/2023, 2:33 PM
Srikanth Chekuri
05/23/2023, 2:34 PM
Srikanth Chekuri
05/24/2023, 2:53 AM
Al
05/25/2023, 9:28 PM
Srikanth Chekuri
05/26/2023, 9:09 AM