Hi, I have 2 problems with our Signoz that need he...
# support
m
Hi, I have 2 problems with our SigNoz that I need help with:
1. The traces retention period process is stuck while I change the setting, is there any way to force stop and reset the retention period?
2. I tried to update Signoz from 0.11.4 to 0.12.0 running using docker-compose.yaml file, but I got an error on otel-collector: "Error: cannot build pipelines: failed to create "clickhouselogsexporter" exporter, in pipeline "logs": cannot configure clickhouse logs exporter: failed to create database, err: code: 170, message: Requested cluster 'cluster' not found." Did I miss any steps for upgrading SigNoz?
p
@Prashant Shahi should be able to help for point 2
m
Thank you @Pranay. Hi @Prashant Shahi, I'd appreciate it if you could help.
p
1. The traces retention period process is stuck while I change the setting, is there any way to force stop and reset the retention period?
The time needed to apply a TTL depends on the amount of data, due to limitations of ClickHouse. How long has it been since the TTL operation got stuck? If the query-service container was not restarted, the TTL operation should either succeed or fail. If your query-service container was restarted after the TTL operation was initiated, the status checks might be out of sync with the actual TTL set in ClickHouse.
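To see what ClickHouse itself reports, independent of the status shown in the UI, one option is to read the table definitions directly. This is only a minimal sketch: the container name clickhouse and the database name signoz_traces are assumptions to adjust for your setup, and show_trace_ttl is a hypothetical helper name.

```shell
# Hypothetical helper: print the CREATE statements of tables that carry a TTL clause.
# Assumptions: ClickHouse container is named "clickhouse", traces live in "signoz_traces".
show_trace_ttl() {
  container="${1:-clickhouse}"
  database="${2:-signoz_traces}"
  docker exec "$container" clickhouse-client --query \
    "SELECT name, create_table_query FROM system.tables WHERE database = '$database' AND create_table_query LIKE '%TTL%' FORMAT Vertical"
}
```

Running show_trace_ttl with no arguments uses the assumed defaults; pass the container and database names explicitly if yours differ.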
2. I tried to update Signoz from 0.11.4 to 0.12.0 running using docker-compose.yaml file, but I got an error on otel-collector.
It's likely that you have not run git checkout to the v0.12.0 release tag. There is an additional clickhouse-cluster.xml file which includes ClickHouse cluster information. https://github.com/SigNoz/signoz/tree/develop/deploy/docker/clickhouse-setup
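As a rough sketch of that upgrade path, assuming the repository is cloned at ./signoz and Docker Compose v2 is installed as "docker compose" (on older installs the command is docker-compose); upgrade_signoz is a hypothetical helper name:

```shell
# Hypothetical helper: check out a release tag and bring the stack up from that tree.
# Assumptions: repo cloned at ./signoz, Compose v2 available as "docker compose".
upgrade_signoz() (
  repo_dir="${1:-./signoz}"
  tag="${2:-v0.12.0}"
  cd "$repo_dir" || exit 1
  git fetch --tags || exit 1
  # git refuses the checkout if local edits would be overwritten
  git checkout "$tag" || exit 1
  docker compose -f deploy/docker/clickhouse-setup/docker-compose.yaml up -d
)
```

The body runs in a subshell, so the cd does not change your current directory. If you have hand-edited files under deploy/, stash or commit them before the checkout.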
m
It's been a while, and I don't remember exactly how long, but I just noticed the storage usage of ClickHouse getting bigger, and when I opened the settings page, I saw the retention period change still ongoing. I already restarted all the containers, but the process has not stopped.
Yes, I didn't check out the v0.12.0 release tag, but I did manually add the clickhouse-cluster.xml file to the directory and to docker-compose.yaml, and the error still occurs.
p
There are multiple changes in docker-compose.yaml, like the Zookeeper dependency, clickhouse-cluster.xml volume mounting, etc. Make sure the YAML is in sync with, if not identical to, the one in develop. https://github.com/SigNoz/signoz/blob/develop/deploy/docker/clickhouse-setup/docker-compose.yaml
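One way to check that, sketched under the assumption that your clone lives at ./signoz and its remote is named origin (compose_drift is a hypothetical helper name):

```shell
# Hypothetical helper: show how the local compose setup differs from upstream develop.
# Assumptions: repo cloned at ./signoz, remote named "origin".
compose_drift() (
  repo_dir="${1:-./signoz}"
  cd "$repo_dir" || exit 1
  git fetch origin develop || exit 1
  # Compares the working tree against origin/develop for this directory only
  git diff origin/develop -- deploy/docker/clickhouse-setup/
)
```

A file that exists in develop but is missing locally shows up as deleted in the output. Untracked files you added by hand are not listed by git diff, so check those separately with git status.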
It's been a while, and I don't remember exactly how long, but I just noticed the storage usage of ClickHouse getting bigger, and when I opened the settings page, I saw the retention period change still ongoing. I already restarted all the containers, but the process has not stopped.
Can you share a screenshot? If you are sure that it is stuck, you might want to delete the TTL operation checks from SQLite and apply the TTL again. cc @Vishal Sharma @nitya-signoz
m
Hi @Prashant Shahi, my bad, I forgot to add the clickhouse-cluster.xml. The upgrade to v0.12.0 is now successful. Thank you.
p
@moronmon How long has it been since you updated the retention period? And how much trace data do you have?
m
I forget exactly, I think over a month has passed. Right now the data is almost 120GB.
I changed the retention before the data reached 100GB.
v
@moronmon Applying a TTL takes very long once the data size becomes huge. For now you can delete the TTL status entries, which might have become stale after you restarted query-service. Are you using Docker or k8s?
m
I run it using Docker. Could you help me with how to delete the TTL?
v
Connect to query-service
docker exec -it query-service sh
Run the following:
# install sqlite
apk update
apk add sqlite

# open sqlite with signoz.db
sqlite3 /var/lib/signoz/signoz.db

# (sqlite shell) check existing ttl status
select * from ttl_status;

# (sqlite shell) delete all ttl status
DELETE from ttl_status where 1=1;

# (sqlite shell) verify ttl status deletion
select * from ttl_status;
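
Since the DELETE cannot be undone, a more cautious variant of the steps above is to take a copy of the database first and run the statements non-interactively. This is only a sketch: it assumes the container is named query-service, that sqlite3 has already been installed in it with apk as shown above, and that /tmp inside the container is writable; clear_ttl_status is a hypothetical helper name.

```shell
# Hypothetical helper: back up signoz.db, then clear ttl_status without an interactive shell.
# Assumptions: container "query-service", sqlite3 already installed inside it.
clear_ttl_status() {
  container="${1:-query-service}"
  db="/var/lib/signoz/signoz.db"
  # .backup writes a consistent copy even while the database is in use
  docker exec "$container" sqlite3 "$db" ".backup /tmp/signoz.db.bak" || return 1
  # Copy the backup out of the container to the current directory on the host
  docker cp "$container:/tmp/signoz.db.bak" ./signoz.db.bak || return 1
  # Delete all TTL status rows, then print the remaining row count
  docker exec "$container" sqlite3 "$db" "DELETE FROM ttl_status; SELECT count(*) FROM ttl_status;"
}
```

After it finishes, ./signoz.db.bak on the host holds the state from before the delete, in case the entries need to be inspected later.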
m
Thank you @Vishal Sharma it works. thank you @Prashant Shahi for your help
will get back to you if there's anything
p
👍
k
ClickHouse is using 600GB+ and I have also deleted ttl_status from query-service. I am still unable to update the retention days.
Can anyone help me out?
Now the otel-collector containers are restarting again and again, with this error: Error creating clickhouse client: code: 159, message: Watching task /clickhouse/task_queue/ddl/query-0000003161 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background