# support
v
Hello team, I'm using SigNoz v0.11.4. Currently, from our exporter, we've only enabled logs and have set the retention period for logs to 4 hours. Still, I can see there's never a cleanup in the ClickHouse storage. Why is that? Any help on this is appreciated. Thanks
p
@Vaibhav Sharma There are known issues with setting the retention period in earlier versions: https://github.com/SigNoz/signoz/issues/1982. We have made improvements on this in the latest release, 0.15.0, so we suggest you upgrade to it. Also, there are some best practices for setting the retention period that we recommend following: https://signoz.io/docs/userguide/retention-period/#recommendations-for-setting-retention-period
v
Ohh got it.. is it stable enough @Pranay? We've already done a few rollbacks because of this 😅
p
Rollbacks because of upgrading to newer versions, or because of issues in setting the retention period?
v
Didn't find any issues on the collector side though... just in the SigNoz frontend.. sometimes login, adding members, etc.
anyway I'll try this newer version
v
@Vaibhav Sharma What’s the oldest log that you see? It is expected that there’s a delay of a few hours in deleting expired logs.
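If it helps, a couple of quick checks (just a sketch; the second query assumes the timestamp column is epoch nanoseconds, as in the default SigNoz log schema):

-- shows the TTL expression currently applied to the logs table
SHOW CREATE TABLE signoz_logs.logs;

-- oldest stored log (epoch nanoseconds, assuming the default schema)
SELECT min(timestamp) FROM signoz_logs.logs;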
v
I think now it has choked the PV.. so no storage left and no new logs since then.. I'll get back to you guys once I've tried the new version and cleaned up the storage
Hey team, I was able to try the newer version but it's still using the retention period that was set in the previous version. Any way to fix this?
it's still showing
Your last call to change retention period to 3 hr is pending. This may take some time.
v
Looks like you upgraded while the retention was being set. You can connect to the SQLite DB and clear the TTL status table. If you are using Docker, follow the steps below (a rough Kubernetes equivalent is sketched after them). Connect to query-service:
docker exec -it query-service sh
Run the following:
# install sqlite
apk update
apk add sqlite

# open sqlite with signoz.db
sqlite3 /var/lib/signoz/signoz.db

# (sqlite shell) check existing ttl status
select * from ttl_status;

# delete all rows of ttl_status
DELETE FROM ttl_status;
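If SigNoz is running on Kubernetes instead (the Longhorn volumes suggest it is), a rough equivalent is below; the namespace and pod name follow a default Helm install and are assumptions, so adjust them to your cluster:

# connect to the query-service pod (namespace and pod name are placeholders)
kubectl -n platform exec -it <release>-signoz-query-service-0 -- sh

# then run the same steps as above inside the pod
apk update && apk add sqlite
sqlite3 /var/lib/signoz/signoz.db "DELETE FROM ttl_status;"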
v
yeah, recreating the query-service PV fixed it
thanks @Vishal Sharma
Hey team, on my ClickHouse cluster, inside /var/lib/clickhouse/data I can see:
4.0K	signoz_logs/distributed_logs/
4.0K	signoz_logs/distributed_logs_atrribute_keys/
4.0K	signoz_logs/distributed_logs_resource_keys/
4.0K	signoz_logs/distributed_usage/
5.6G	signoz_logs/logs/
56.5M	signoz_logs/logs_atrribute_keys/
56.6M	signoz_logs/logs_resource_keys/
44.0K	signoz_logs/schema_migrations/
52.0K	signoz_logs/usage/
4.0K	signoz_metrics/distributed_samples_v2/
4.0K	signoz_metrics/distributed_time_series_v2/
4.0K	signoz_metrics/distributed_usage/
586.7M	signoz_metrics/samples_v2/
1.1M	signoz_metrics/time_series_v2/
136.0K	signoz_metrics/usage/
12.0K	signoz_traces/dependency_graph_minutes/
4.0K	signoz_traces/distributed_dependency_graph_minutes/
4.0K	signoz_traces/distributed_durationSort/
4.0K	signoz_traces/distributed_signoz_error_index_v2/
4.0K	signoz_traces/distributed_signoz_index_v2/
4.0K	signoz_traces/distributed_signoz_spans/
4.0K	signoz_traces/distributed_top_level_operations/
4.0K	signoz_traces/distributed_usage/
4.0K	signoz_traces/distributed_usage_explorer/
12.0K	signoz_traces/durationSort/
44.0K	signoz_traces/schema_migrations/
12.0K	signoz_traces/signoz_error_index/
12.0K	signoz_traces/signoz_error_index_v2/
12.0K	signoz_traces/signoz_index/
12.0K	signoz_traces/signoz_index_v2/
12.0K	signoz_traces/signoz_spans/
12.0K	signoz_traces/top_level_operations/
12.0K	signoz_traces/usage/
12.0K	signoz_traces/usage_explorer/
but my Longhorn volume shows the volume usage at ~43G. Any idea where the space is being consumed?
I can see my logs are around
5.6G	signoz_logs/logs/
v
@Vaibhav Sharma Are you only sending logs to signoz?
v
yeah
Also, I ran this query to check the table sizes:
select
    parts.*,
    columns.compressed_size,
    columns.uncompressed_size,
    columns.ratio
from (
    select database,
        table,
        formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
        formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
        sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
    from system.columns
    group by database, table
) columns right join (
    select database,
           table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active
    group by database, table
) parts on ( columns.database = parts.database and columns.table = parts.table )
order by parts.bytes_size desc;
Got this result
{bytes_size="68423835",compressed_size="63.23 MiB",disk_size="65.25 MiB",engine="MergeTree",parts.database="system",parts.table="part_log",primary_keys_size="840.00 B",rows="1596835",uncompressed_size="326.39 MiB"}
{bytes_size="995111",compressed_size="960.42 KiB",disk_size="971.79 KiB",engine="ReplacingMergeTree",parts.database="signoz_metrics",parts.table="time_series_v2",primary_keys_size="672.00 B",rows="40684",uncompressed_size="14.29 MiB"}
{bytes_size="354826146",compressed_size="333.96 MiB",disk_size="338.39 MiB",engine="MergeTree",parts.database="signoz_metrics",parts.table="samples_v2",primary_keys_size="204.63 KiB",rows="77402315",uncompressed_size="1.87 GiB"}
{bytes_size="176816430",compressed_size="168.21 MiB",disk_size="168.63 MiB",engine="MergeTree",parts.database="system",parts.table="trace_log",primary_keys_size="5.28 KiB",rows="7280583",uncompressed_size="2.36 GiB"}
{bytes_size="21097",compressed_size="0.00 B",disk_size="20.60 KiB",engine="MergeTree",parts.database="signoz_metrics",parts.table="usage",primary_keys_size="880.00 B",rows="143",uncompressed_size="0.00 B"}
{bytes_size="1068",compressed_size="0.00 B",disk_size="1.04 KiB",engine="ReplacingMergeTree",parts.database="signoz_logs",parts.table="logs_resource_keys",primary_keys_size="308.00 B",rows="20",uncompressed_size="0.00 B"}
{bytes_size="611",compressed_size="0.00 B",disk_size="611.00 B",engine="MergeTree",parts.database="signoz_traces",parts.table="schema_migrations",primary_keys_size="16.00 B",rows="38",uncompressed_size="0.00 B"}
{bytes_size="4902665829",compressed_size="4.52 GiB",disk_size="4.57 GiB",engine="MergeTree",parts.database="signoz_logs",parts.table="logs",primary_keys_size="542.61 KiB",rows="101928910",uncompressed_size="68.48 GiB"}
{bytes_size="34701229",compressed_size="33.01 MiB",disk_size="33.09 MiB",engine="MergeTree",parts.database="system",parts.table="query_views_log",primary_keys_size="612.00 B",rows="685032",uncompressed_size="533.24 MiB"}
{bytes_size="19513660",compressed_size="17.66 MiB",disk_size="18.61 MiB",engine="MergeTree",parts.database="system",parts.table="asynchronous_metric_log",primary_keys_size="85.83 KiB",rows="42447592",uncompressed_size="622.59 MiB"}
{bytes_size="304",compressed_size="0.00 B",disk_size="304.00 B",engine="MergeTree",parts.database="signoz_logs",parts.table="schema_migrations",primary_keys_size="16.00 B",rows="8",uncompressed_size="0.00 B"}
{bytes_size="165299941",compressed_size="156.66 MiB",disk_size="157.64 MiB",engine="MergeTree",parts.database="system",parts.table="query_log",primary_keys_size="964.00 B",rows="1253270",uncompressed_size="1.59 GiB"}
{bytes_size="19349383",compressed_size="17.57 MiB",disk_size="18.45 MiB",engine="MergeTree",parts.database="system",parts.table="metric_log",primary_keys_size="198.00 B",rows="75928",uncompressed_size="233.51 MiB"}
{bytes_size="7969",compressed_size="0.00 B",disk_size="7.78 KiB",engine="MergeTree",parts.database="signoz_logs",parts.table="usage",primary_keys_size="440.00 B",rows="49",uncompressed_size="0.00 B"}
{bytes_size="884",compressed_size="0.00 B",disk_size="884.00 B",engine="ReplacingMergeTree",parts.database="signoz_logs",parts.table="logs_atrribute_keys",primary_keys_size="260.00 B",rows="16",uncompressed_size="0.00 B"}
v
@Prashant Shahi Any insights on this?
v
Also, I see that the size of /var/lib/clickhouse/store is 11.1G and /var/lib/clickhouse/data/signoz_logs/logs/ is 9.8G. Is this expected?
p
v
Yeah, that's what Longhorn says in the documentation.. we have two replicas, so that's why it's around 40GB.. but the problem is, for ~3-4GB of logs data in the ClickHouse table, I can see that /var/lib/clickhouse/data/signoz_logs/logs/ is ~10GB and /var/lib/clickhouse/store is another 10GB... which I'm not sure is expected..
Ideally, does that mean that for 3-4GB of my logs data I have to spend 20GB of my volume? 😞
@Vishal Sharma @Prashant Shahi any idea?
v
@nitya-signoz Any ideas?
p
cc @Srikanth Chekuri
n
Hey Vaibhav, will get back to you on this today.
@Vaibhav Sharma can you share the output of this query?
SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) AS rows,
    count() AS part_count
FROM system.parts
WHERE (active = 1) AND (database LIKE '%') AND (table LIKE '%')
GROUP BY
    database,
    table
ORDER BY size DESC;
v
• {compressed="1.62 GiB",database="system",part_count="50",rows="13005622",table="query_log",uncompressed="16.46 GiB"} • {compressed="171.51 MiB",database="system",part_count="8",rows="290304628",table="asynchronous_metric_log",uncompressed="4.21 GiB"} • {compressed="2.20 GiB",database="signoz_metrics",part_count="48",rows="529436516",table="samples_v2",uncompressed="12.82 GiB"} • {compressed="128.98 MiB",database="system",part_count="7",rows="519359",table="metric_log",uncompressed="1.60 GiB"} • {compressed="64.80 KiB",database="signoz_metrics",part_count="3",rows="506",table="usage",uncompressed="104.10 KiB"} • {compressed="481.00 B",database="signoz_traces",part_count="1",rows="38",table="schema_migrations",uncompressed="646.00 B"} • {compressed="214.00 B",database="signoz_logs",part_count="2",rows="8",table="logs_atrribute_keys",uncompressed="134.00 B"} • {compressed="26.58 GiB",database="signoz_logs",part_count="77",rows="660353741",table="logs",uncompressed="440.63 GiB"} • {compressed="376.66 MiB",database="system",part_count="9",rows="7384984",table="query_views_log",uncompressed="5.65 GiB"} • {compressed="282.00 B",database="signoz_logs",part_count="2",rows="10",table="logs_resource_keys",uncompressed="252.00 B"} • {compressed="175.00 B",database="signoz_logs",part_count="1",rows="8",table="schema_migrations",uncompressed="136.00 B"} • {compressed="1.66 GiB",database="system",part_count="10",rows="71738317",table="trace_log",uncompressed="23.52 GiB"} • {compressed="1.08 MiB",database="signoz_metrics",part_count="9",rows="46612",table="time_series_v2",uncompressed="16.43 MiB"} • {compressed="47.63 KiB",database="signoz_logs",part_count="1",rows="359",table="usage",uncompressed="75.53 KiB"} • {compressed="722.23 MiB",database="system",part_count="39",rows="17566619",table="part_log",uncompressed="3.68 GiB"}
Right now I've set the retention period to 6 hours.. also, the size of /var/lib/clickhouse/store/ is 35GB and the size of /var/lib/clickhouse/data is in the attached image.
n
I think it’s correct now
{compressed="26.58 GiB",database="signoz_logs",part_count="77",rows="660353741",table="logs",uncompressed="440.63 GiB"}
you have ingested 440.63 GiB of logs, which is compressed down to 26.58 GiB
v
Thanks @nitya-signoz, yeah, more or less it's equivalent... also, I don't understand the /store/ usage.. it's almost equal to the amount of compressed logs data.
Any idea on this? The documentation doesn't cover it 😕
n
Can you share the size of the store folder? Also, try reducing the values of these three settings: database_catalog_unused_dir_hide_timeout_sec, database_catalog_unused_dir_rm_timeout_sec, and database_catalog_unused_dir_cleanup_period_sec (see https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#database_catalog_unused_dir_rm_timeout_sec). Let me know if it makes any difference.
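For reference, here's a minimal sketch of one way to lower them via a config.d drop-in; the file name and values are only examples, and the path assumes a plain clickhouse-server install (with the ClickHouse operator on Kubernetes you'd set the same keys through the cluster config spec instead):

# write an override with lower timers (example values, not recommendations)
cat > /etc/clickhouse-server/config.d/unused-dir-cleanup.xml <<'EOF'
<clickhouse>
    <database_catalog_unused_dir_hide_timeout_sec>300</database_catalog_unused_dir_hide_timeout_sec>
    <database_catalog_unused_dir_rm_timeout_sec>600</database_catalog_unused_dir_rm_timeout_sec>
    <database_catalog_unused_dir_cleanup_period_sec>300</database_catalog_unused_dir_cleanup_period_sec>
</clickhouse>
EOF
# clickhouse-server typically needs a restart to pick these up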
v
it's 35GB @nitya-signoz
sure... let me try these.. thanks a lot @nitya-signoz
n
Interesting, I will dig more into this. But let me know if the changes help.
Also, you can configure the TTL of system tables like query_log etc. by following this guide: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk
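As a rough example of what that looks like for a single table (the 3-day interval is arbitrary; per the guide, setting the TTL in config.xml is more durable, since ClickHouse can recreate these tables):

-- cap the query log at ~3 days (example value); repeat for trace_log, part_log, etc.
ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 3 DAY DELETE;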
s
@Vaibhav Sharma Is your question about what is in /store vs /data? The store folder is the actual folder where the data exists on disk; /data just symlinks to it. Here is the ls output, for example:
bash-5.1# ls -la
total 0
drwxr-x---   10 root     root           320 Feb  1 04:14 .
drwxr-x---    7 root     root           224 Jan 27 11:14 ..
lrwxr-xr-x    1 root     root            67 Jan 27 11:14 distributed_samples_v2 -> /var/lib/clickhouse/store/bdc/bdc499a7-46de-4d4a-83c9-5b93ecd22571/
lrwxr-xr-x    1 root     root            67 Jan 27 11:14 distributed_time_series_v2 -> /var/lib/clickhouse/store/3d3/3d3efb61-7420-4f25-b4e1-68233f80583a/
lrwxr-xr-x    1 root     root            67 Jan 27 11:14 distributed_usage -> /var/lib/clickhouse/store/302/30266d0f-479a-416b-b860-8aa7bee594d2/
lrwxr-xr-x    1 root     root            67 Jan 27 11:14 samples_v2 -> /var/lib/clickhouse/store/be5/be543c25-5024-4a06-a834-ed83f28a085a/
lrwxr-xr-x    1 root     root            67 Feb  1 04:14 t -> /var/lib/clickhouse/store/50e/50ef3ef0-570b-45d5-bc87-be76c9ee6945/
lrwxr-xr-x    1 root     root            67 Feb  1 03:58 temp -> /var/lib/clickhouse/store/374/3749a590-aa97-45f8-912c-322caf0a91a1/
lrwxr-xr-x    1 root     root            67 Jan 27 11:14 time_series_v2 -> /var/lib/clickhouse/store/752/752cfc52-dc9e-4104-a674-e27954850567/
lrwxr-xr-x    1 root     root            67 Jan 27 11:14 usage -> /var/lib/clickhouse/store/6da/6da5eb5b-9f1a-44d1-91ec-d70fb44f9e53/
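One rough way to sanity-check whether the space in store/ still maps to live tables (just a sketch, run inside the ClickHouse container; the UUID directory names are specific to each install):

# biggest entries under store/ (each second-level dir is a table UUID)
du -sh /var/lib/clickhouse/store/*/* | sort -rh | head

# the UUIDs ClickHouse still knows about, to compare against the directories above
clickhouse-client --query "SELECT database, name, uuid FROM system.tables WHERE database LIKE 'signoz%'"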
v
Ahhh... yeah... I don't know.. I've seen this once when I was trying to figure it out but didn't pay much attention to this correlation... thanks a lot @Srikanth Chekuri.... now it makes some sense
n
That’s great 👍, can you share the values you changed?