# support
d
Hi Signoz team, I've set up Signoz in a Docker Swarm cluster and configured S3 as per your documentation. I did this after I already had a lot of data; the Clickhouse folder had filled up my disk. It then wrote a lot of data to S3 and removed a significant chunk of data locally. Over the next days and weeks, I noticed two things:
1. Data is being written to S3 πŸ‘
2. My local disk is filling up again πŸ‘Ž
So it seems as if it writes data to S3 but never deletes it from the local disk? What am I missing here?
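A quick way to check whether data is actually leaving the local disk is to group active parts by disk in system.parts; this is a suggested query, not from the original thread (the disk name s3 matches the cold-storage config shown further down):

-- How much active data sits on each ClickHouse disk (local "default" vs. "s3")
SELECT
    disk_name,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
    count() AS part_count
FROM system.parts
WHERE active
GROUP BY disk_name
ORDER BY sum(bytes_on_disk) DESC;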
s
Please share the full setting details. What is the disk TTL and the move-to-S3 TTL?
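One way to pull the current TTL settings for all SigNoz tables in one go is to read engine_full from system.tables, which includes the TTL expressions; a sketch, not something requested in the thread:

-- List every logs/traces table that has a move-to-volume TTL, with its full engine definition
SELECT
    database,
    name,
    engine_full
FROM system.tables
WHERE database IN ('signoz_logs', 'signoz_traces')
  AND engine_full LIKE '%TO VOLUME%';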
d
@Srikanth Chekuri My clickhouse-storage.xml looks like this:
Copy code
<?xml version="1.0"?>
<clickhouse>
<storage_configuration>
    <disks>
        <default>
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </default>
        <s3>
            <type>s3</type>
            <!-- For S3 cold storage,
                    if region is us-east-1, endpoint can be https://<bucket-name>.s3.amazonaws.com
                    if region is not us-east-1, endpoint should be https://<bucket-name>.s3-<region>.amazonaws.com
                For GCS cold storage,
                    endpoint should be https://storage.googleapis.com/<bucket-name>/data/
                -->
            <endpoint>https://redacted.s3.eu-central-1.amazonaws.com/data</endpoint>
            <access_key_id>REDACTED</access_key_id>
            <secret_access_key>redacted</secret_access_key>
            <!-- In case of S3, uncomment the below configuration in case you want to read
                AWS credentials from the Environment variables if they exist. -->
            <!-- <use_environment_credentials>true</use_environment_credentials> -->
            <!-- In case of GCS, uncomment the below configuration, since GCS does
                not support batch deletion and result in error messages in logs. -->
            <!-- <support_batch_delete>false</support_batch_delete> -->
        </s3>
    </disks>
    <policies>
        <tiered>
            <volumes>
                <default>
                    <disk>default</disk>
                </default>
                <s3>
                    <disk>s3</disk>
                    <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert>
                </s3>
            </volumes>
        </tiered>
    </policies>
</storage_configuration>
</clickhouse>
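Whether ClickHouse actually picked up both disks and the tiered policy from this file can be checked against the system tables; a small sketch assuming the disk and policy names above:

-- Volumes of the 'tiered' policy, in priority order, and the disks they contain
SELECT
    policy_name,
    volume_name,
    volume_priority,
    disks
FROM system.storage_policies
WHERE policy_name = 'tiered';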
The TTL settings look like this (I don't send metrics to Signoz): [screenshot]
s
Please share the output of the table size query.
Copy code
SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) AS rows,
    count() AS part_count
FROM system.parts
WHERE (active = 1) AND (database LIKE '%') AND (table LIKE '%')
GROUP BY
    database,
    table
ORDER BY size DESC;
I want to understand which table is contributing more.
d
Thanks for the query, I wanted to know that myself! Here is the output:
Copy code
β”Œβ”€database───────┬─table───────────────────────┬─compressed─┬─uncompressed─┬─compr_rate─┬───────rows─┬─part_count─┐
β”‚ signoz_logs    β”‚ logs                        β”‚ 38.27 GiB  β”‚ 2.69 TiB     β”‚      71.95 β”‚ 1725735808 β”‚        197 β”‚
β”‚ signoz_traces  β”‚ durationSort                β”‚ 34.41 GiB  β”‚ 300.88 GiB   β”‚       8.75 β”‚  356399377 β”‚        204 β”‚
β”‚ signoz_traces  β”‚ signoz_index_v2             β”‚ 32.35 GiB  β”‚ 371.90 GiB   β”‚       11.5 β”‚  356404139 β”‚        199 β”‚
β”‚ signoz_traces  β”‚ signoz_spans                β”‚ 18.34 GiB  β”‚ 515.21 GiB   β”‚       28.1 β”‚  258399827 β”‚        121 β”‚
β”‚ system         β”‚ trace_log                   β”‚ 1.51 GiB   β”‚ 21.31 GiB    β”‚      14.11 β”‚   70086728 β”‚         11 β”‚
β”‚ signoz_traces  β”‚ span_attributes             β”‚ 1.35 GiB   β”‚ 4.30 GiB     β”‚       3.19 β”‚  147147633 β”‚         15 β”‚
β”‚ system         β”‚ query_log                   β”‚ 844.54 MiB β”‚ 5.12 GiB     β”‚        6.2 β”‚    7942158 β”‚          7 β”‚
β”‚ system         β”‚ part_log                    β”‚ 777.36 MiB β”‚ 5.24 GiB     β”‚        6.9 β”‚   10549790 β”‚          8 β”‚
β”‚ system         β”‚ metric_log                  β”‚ 541.64 MiB β”‚ 2.81 GiB     β”‚       5.32 β”‚    3197418 β”‚         12 β”‚
β”‚ system         β”‚ asynchronous_metric_log     β”‚ 523.67 MiB β”‚ 11.16 GiB    β”‚      21.82 β”‚  747384979 β”‚         14 β”‚
β”‚ system         β”‚ query_views_log             β”‚ 322.89 MiB β”‚ 4.26 GiB     β”‚      13.49 β”‚    4216055 β”‚         12 β”‚
β”‚ signoz_traces  β”‚ dependency_graph_minutes_v2 β”‚ 224.87 MiB β”‚ 332.37 MiB   β”‚       1.48 β”‚      76907 β”‚         77 β”‚
β”‚ signoz_traces  β”‚ dependency_graph_minutes    β”‚ 148.13 MiB β”‚ 218.67 MiB   β”‚       1.48 β”‚      49627 β”‚         47 β”‚
β”‚ signoz_traces  β”‚ signoz_error_index_v2       β”‚ 8.02 MiB   β”‚ 201.45 MiB   β”‚      25.13 β”‚     173922 β”‚         44 β”‚
β”‚ signoz_logs    β”‚ tag_attributes              β”‚ 20.04 KiB  β”‚ 372.53 KiB   β”‚      18.59 β”‚        689 β”‚          4 β”‚
β”‚ signoz_traces  β”‚ usage_explorer              β”‚ 16.88 KiB  β”‚ 36.35 KiB    β”‚       2.15 β”‚       2040 β”‚         38 β”‚
β”‚ signoz_logs    β”‚ usage                       β”‚ 10.27 KiB  β”‚ 15.40 KiB    β”‚        1.5 β”‚         73 β”‚          3 β”‚
β”‚ signoz_traces  β”‚ usage                       β”‚ 10.22 KiB  β”‚ 15.35 KiB    β”‚        1.5 β”‚         73 β”‚          3 β”‚
β”‚ signoz_traces  β”‚ span_attributes_keys        β”‚ 4.51 KiB   β”‚ 7.83 KiB     β”‚       1.74 β”‚        418 β”‚          4 β”‚
β”‚ signoz_traces  β”‚ top_level_operations        β”‚ 3.68 KiB   β”‚ 7.74 KiB     β”‚        2.1 β”‚        219 β”‚          4 β”‚
β”‚ signoz_logs    β”‚ logs_resource_keys          β”‚ 842.00 B   β”‚ 1.78 KiB     β”‚       2.17 β”‚         72 β”‚          2 β”‚
β”‚ signoz_traces  β”‚ schema_migrations           β”‚ 684.00 B   β”‚ 986.00 B     β”‚       1.44 β”‚         58 β”‚          1 β”‚
β”‚ signoz_logs    β”‚ logs_attribute_keys         β”‚ 322.00 B   β”‚ 276.00 B     β”‚       0.86 β”‚ … (output truncated)
Additional info: clickhouse is running in a container in a Docker Swarm environment - basically your Docker Swarm deployment from the repo. /var/lib/clickhouse is mounted to a volume. The size of the volume is 46.1 GiB, the disk size is 75 GiB, and 13 GiB are currently free. 2 days ago, below 10 GiB were free on the disk, and I played again with those retention settings so that some of the data gets moved to S3. That freed up about 20 GiB, to roughly 30 GiB free, which has now been filling up again. On S3, there are about 154 GiB of data.
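One way to cross-check those numbers from inside ClickHouse (including the keep_free_space_bytes reservation from the storage config) is system.disks; a suggested query, not part of the thread:

-- Free/total space per ClickHouse disk, plus the reserved headroom
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS reserved
FROM system.disks;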
s
Your logs and traces are both contributing to the problem. You have 2 months on disk before data moves to S3; the move to S3 only happens after a log record's TTL has passed 2 months. So it seems to be working fine to me. Did you notice any major change in how fast your disk gets filled?
d
So, logs previously had a three-day move-to-S3 TTL as well. I've just changed it to 2 months in an effort to not get this error message below the save button. However, even changing from three days to 2 months resulted in data being removed from the local disk. That doesn't really make a lot of sense, so my conclusion is that there is more data on the local disk than there should be. Can we somehow check the oldest data that is still on disk?
s
Can you share the output of
show create table signoz_logs.logs_v2
and
show create table signoz_traces.signoz_index_v2
d
I don't have logs_v2; I'm still on v0.48.1. For `signoz_logs.logs` I have this TTL output:
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(10368000), toDateTime(timestamp / 1000000000) + toIntervalSecond(5184000) TO VOLUME 's3'
and for signoz_traces.signoz_index_v2 it is
TTL toDateTime(timestamp) + toIntervalSecond(10368000), toDateTime(timestamp) + toIntervalSecond(259200) TO VOLUME 's3'
Both seem to be consistent with the UI (2 months and three days respectively).
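For reference, converting those TTL intervals from seconds into days (plain arithmetic, not from the thread) gives the split described in the next message:

-- 86400 seconds per day
SELECT
    10368000 / 86400 AS delete_after_days,       -- 120 days total retention (logs and traces)
    5184000 / 86400  AS logs_move_to_s3_days,    -- 60 days (2 months) before logs move to the s3 volume
    259200 / 86400   AS traces_move_to_s3_days;  -- 3 days before traces move to the s3 volume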
s
As you can see from the TTL output, the TTL got updated, and that's the reason why the disk is filling up. In total: 120 days retention, with a move to S3 after 60 days for logs and after 3 days for traces.
You can verify by reading the row with the min timestamp from the table.
Copy code
SELECT min(timestamp) FROM signoz_logs.logs
This will give the epoch in nanoseconds.
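To get a human-readable timestamp directly, the nanosecond epoch can be converted inside the query; a small variant of the query above:

-- Oldest log row as a DateTime instead of a nanosecond epoch
SELECT toDateTime(intDiv(min(timestamp), 1000000000)) AS oldest_log
FROM signoz_logs.logs;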
d
This explanation obviously makes sense. But how do you explain the screenshot? The arrow marks the time when I changed the logs move-to-S3 TTL from 3 days to 2 months.
s
What were the prior settings for traces?
d
Traces had not been changed at the time marked by the arrow.
I tried to change it to 1 month, but that seems to have been unsuccessful, judging both from the UI and from the TTL definition on the table.
The minimum entry for logs is from September 16, 2024 (that's about when the system was set up). The minimum entry in signoz_traces.signoz_index_v2 is from September 18, 2024.
That (the minimum entry in traces) seems to confirm my suspicion, or not?
Does that minimum timestamp query actually make sense for determining which entries are still on the local disk vs. which ones are on S3?
s
No, it doesn't tell you anything about S3. I just wanted to see how long it has been running.
I currently don't have any idea how to explain the marked spike without more context.
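If it helps to narrow this down, which parts still sit on the local disk versus the s3 disk can be read from system.parts; a suggested query, not from the thread (disk_name should be 'default' or 's3' per the storage config, and the partition value corresponds to a day, assuming SigNoz's default daily partitioning):

-- Per table and per disk: total size and the oldest/newest day partition
SELECT
    database,
    table,
    disk_name,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    min(partition) AS oldest_partition,
    max(partition) AS newest_partition
FROM system.parts
WHERE active AND database IN ('signoz_logs', 'signoz_traces')
GROUP BY database, table, disk_name
ORDER BY database, table, disk_name;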
d
Okay, thanks for the insights so far. I will keep an eye on this and get back if I have additional info