# general

Hi, I set up SigNoz for k8s monitoring via OTel. I want to connect directly to ClickHouse so I can create custom queries. I checked the `signoz_metrics` db and found `distributed_time_series_v4` (assuming this is what I should use for getting metrics). Where is the metric value stored?
It exists in the `samples_v4` table, with column name `value`.
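For example, a quick way to peek at the raw rows (a minimal sketch, assuming the `signoz_metrics` database name from above):

```sql
-- Peek at a few raw sample rows; each row is (series fingerprint,
-- timestamp in ms, metric value). Database name taken from the thread.
SELECT fingerprint, unix_milli, value
FROM signoz_metrics.distributed_samples_v4
LIMIT 5;
```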
Is there a reason why the table doesn't contain labels?
Storing the labels with each measurement would be redundant and make storage usage unnecessarily high.
So would joining on fingerprint give the correct result? Or is `unix_milli` required?
Joining on fingerprint will give the correct result. The `unix_milli` on the `samples` table indicates when the measurement was produced.
Awesome, thank you!
Oh, one last thing: for logs, is this good enough?
```sql
SELECT timestamp, body
FROM distributed_logs
WHERE arrayElement(
        resources_string_value,
        indexOf(resources_string_key, 'k8s.container.name')
    ) LIKE 'myapp%'
ORDER BY timestamp DESC
LIMIT 5;
```
yes
For metrics, here is my query:
```sql
SELECT ts.fingerprint, ts.metric_name, samples.unix_milli, samples.value
FROM distributed_time_series_v4 AS ts
JOIN distributed_samples_v4 AS samples
    ON ts.fingerprint = samples.fingerprint
WHERE ts.metric_name = 'container_cpu_utilization'
  AND JSONExtractString(ts.labels, 'k8s_container_name') LIKE 'myapp%'
ORDER BY samples.unix_milli DESC
LIMIT 10;
```
I thought consecutive `samples.unix_milli` values would differ by the scrape interval, but something seems wrong.
The `time_series_v4` table can contain duplicates; you should take care of that in the query. Why are you writing your own queries? Does the query builder not support what you are trying to achieve?
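For example, one way to handle the duplicates (a sketch, not necessarily the query SigNoz itself uses) is to deduplicate the time series table in a subquery before joining:

```sql
-- Sketch: deduplicate distributed_time_series_v4 on fingerprint before
-- joining, since the table can contain multiple rows per series.
SELECT ts.fingerprint, ts.metric_name, samples.unix_milli, samples.value
FROM (
    SELECT DISTINCT fingerprint, metric_name, labels
    FROM distributed_time_series_v4
    WHERE metric_name = 'container_cpu_utilization'
      AND JSONExtractString(labels, 'k8s_container_name') LIKE 'myapp%'
) AS ts
JOIN distributed_samples_v4 AS samples
    ON ts.fingerprint = samples.fingerprint
ORDER BY samples.unix_milli DESC
LIMIT 10;
```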
I need to do some custom logic and use it like an API, so I can use the ClickHouse API directly.
Would `distributed_time_series_v4_1day` contain metrics for the present day?
Yes. `distributed_time_series_v4` attempts to have one row for each unique time series, `distributed_time_series_v4_6hrs` one row per 6 hours, and `distributed_time_series_v4_1day` one row per day. That way, when we query a large duration, we can reduce the amount of data read, which means faster queries.