
Hima Vyas

05/09/2022, 7:15 AM
After upgrading to 0.8.0 in Kubernetes, I am following https://signoz.io/docs/operate/upgrade/#kubernetes. My namespace is signoz. I have run the command:

```
kubectl -n signoz run -i -t signoz-migrate --image=signoz/migrate:0.8 \
  -- -host=my-release-clickhouse -port=9000 -userName=admin -password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9
```

signoz-migrate is created with an error and status CrashLoopBackOff. The command

```
kubectl -n signoz logs -f signoz_migrate
```

is returning "pod not found" for me though.

Prashant Shahi

05/09/2022, 7:25 AM
Oh, there seems to be a minor typo. Can you run the following?

```
kubectl -n platform logs -f signoz-migrate
```
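If in doubt about the exact pod name, listing the pods in the namespace first also works (standard kubectl, shown here for illustration):

```
kubectl -n signoz get pods
```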

Hima Vyas

05/09/2022, 7:26 AM
Getting this error:

```
127.0.0.1 9000 default
2022/05/09 07:25:48 dial tcp 127.0.0.1:9000: connect: connection refused
```

I removed the ClickHouse config from the command as I haven't changed any config, so I assumed it would use the defaults.

Prashant Shahi

05/09/2022, 7:27 AM
That is needed, as we set the default user `admin` with the default password `27ff0399-0d3a-4bd8-919d-17c2181e6fb9`.

Hima Vyas

05/09/2022, 7:27 AM
Okay, but host and port could be removed, right?

Prashant Shahi

05/09/2022, 7:39 AM
port? yes
host? nope.

Hima Vyas

05/09/2022, 7:39 AM
Yes, only port, my bad. I have changed the host and kept admin and the password as they are. The migration seems to have started.

Prashant Shahi

05/09/2022, 7:40 AM
That's great! Do let us know when it completes.

Hima Vyas

05/09/2022, 7:40 AM
Sure, thanks Prashant!
QQ: If the migration gets interrupted in the middle, will it be able to continue the rest from where it stopped?

Prashant Shahi

05/09/2022, 8:14 AM
It should ideally run without any problem. In case of any interruption, you can run the script again with some additional flags if required. cc @Vishal Sharma, who should be able to answer that query.

Hima Vyas

05/09/2022, 8:22 AM
Once the migration is done, will there be 2 replicas of the same data on the persistent volume?

Prashant Shahi

05/09/2022, 8:23 AM
No, it will drop old data.

Hima Vyas

05/09/2022, 11:57 AM
```
Writing 128512 rows
ServiceName: xxxx
Migrated till: 2022-05-09 02:55:46.2034198 +0000 UTC
TimeNano: 1652064946203419800
_________**********************************_________
Writing 172796 rows
ServiceName: xxxx
Migrated till: 2022-05-09 02:51:56.695223079 +0000 UTC
TimeNano: 1652064716695223079
```

So this is how the migration happens, in chunks. Now, my migration job was interrupted and I had to create a new signoz-migrate pod and restart the process. It restarted from the beginning. How is data managed here? That is, how can I check how much data is yet to be migrated?

Vishal Sharma

05/09/2022, 12:18 PM
You need to pass two flags to specify the start point:
• `-service=[yourServiceName]`: restarts the migration starting with the service `yourServiceName` after it has failed.
• `-timeNano=[timeStampinNano]`: timestamp in nanoseconds after which the migration needs to be restarted.
Can you please tell me if you passed these params?
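Putting the two together, a restart command would look roughly like this (a sketch only; the host, credentials, service name, and timestamp must match your own setup and failure point):

```
kubectl -n signoz run -i -t signoz-migrate --image=signoz/migrate:0.8 \
  -- -host=my-release-clickhouse -port=9000 -userName=admin \
  -password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9 \
  -service=yourServiceName -timeNano=1652064946203419800
```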

Hima Vyas

05/09/2022, 12:19 PM
I haven't passed these params. I have run the below command:

```
kubectl -n signoz run -i -t signoz-migrate --image=signoz/migrate:0.8 \
  -- -host=x.x.x.x -port=9000 -userName=admin -password=27ff0399-0d3a-4bd8-919d-17c2181e6fb9
```

Vishal Sharma

05/09/2022, 12:20 PM
If you don’t pass these params, the migration script will restart from the beginning, resulting in duplicate copies of the data in the new tables.

Hima Vyas

05/09/2022, 12:20 PM
Okay, got it. I will pass these params. How can I check the old and new tables in ClickHouse?
And when does the old data get removed? After the script has finished for all the services?

Vishal Sharma

05/09/2022, 12:25 PM
Data only gets removed after the script has completed running for all the services (starting with the service and timestamp, if you have passed any).
To check the old and new tables, connect to ClickHouse with the below commands:

```
kubectl -n platform exec -i --tty pod/chi-signoz-cluster-0-0-0 -- bash
```
```
clickhouse-client
```

Old table name: `default.signoz_index`
New table names: `signoz_traces.signoz_index_v2` and `signoz_traces.signoz_spans`
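A simple way to compare the two is a row count on each table (an illustrative check, assuming the default table names above):

```sql
-- Rough progress check: compare row counts between the old and new index tables.
SELECT count() FROM default.signoz_index;
SELECT count() FROM signoz_traces.signoz_index_v2;
```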

Hima Vyas

05/09/2022, 12:30 PM
Okay, thanks!
So, after the migration, `default.signoz_index` would not be required at all, right?

Vishal Sharma

05/09/2022, 12:41 PM
Yes, that table will be dropped.
Also, can you please share the reason why the migration failed? Was it due to some issue with the script?
h

Hima Vyas

05/09/2022, 12:43 PM
It got terminated because of a network issue on my end.
Plus, I have 30+ services and 80 GB of data with limited RAM, so I may not be able to run the migration in one go. `service` and `timeNano` are helpful for me in that case.

Vishal Sharma

05/09/2022, 12:47 PM
Also, the migration is sorted by service (alphabetically) and timestamp (descending). So let’s say the script failed at service `D`; then you can safely assume that services `A`-`C` were successful.
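Given that ordering, a query along these lines could help locate the restart point (a sketch, assuming `timestamp` is a DateTime64 column in the new table):

```sql
-- Hypothetical helper: for each service already present in the new table, find
-- the oldest migrated timestamp; migration runs newest-first within a service,
-- so this approximates the "migrated till" point per service.
SELECT
    serviceName,
    min(timestamp) AS migratedTill,
    toUnixTimestamp64Nano(min(timestamp)) AS timeNano
FROM signoz_traces.signoz_index_v2
GROUP BY serviceName
ORDER BY serviceName ASC;
```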

Hima Vyas

05/09/2022, 12:48 PM
Got it! This was helpful, thanks.
QQ: Is it okay if I delete records manually from the table after a service is completed? I mean, would it affect the script execution?
v

Vishal Sharma

05/09/2022, 1:11 PM
Yes, you can delete tables manually. You can pass `-dropOldTable=false` to prevent the script from deleting old tables automatically. Also, you need to delete `default.signoz_error_index` too.
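For reference, the manual cleanup would then be along these lines (only after the migrated data has been verified, since drops are irreversible):

```sql
-- Manual cleanup of the old tables once the migration is verified.
DROP TABLE IF EXISTS default.signoz_index;
DROP TABLE IF EXISTS default.signoz_error_index;
```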

Hima Vyas

05/09/2022, 1:12 PM
Okay, understood! 🙌
@Vishal Sharma Is there a way to remove duplicates after the migration? I have duplicates in the new table.

Vishal Sharma

05/11/2022, 7:19 AM
@Hima Vyas You can run:

```sql
OPTIMIZE TABLE signoz_traces.signoz_index_v2 DEDUPLICATE;
OPTIMIZE TABLE signoz_traces.durationSort DEDUPLICATE;
OPTIMIZE TABLE signoz_traces.signoz_spans DEDUPLICATE;
```

Hima Vyas

05/11/2022, 7:19 AM
I have tried this, but it doesn't seem to be working. I can see duplicates, but the deduplicate query isn't working directly; it gives 0 rows affected.

```sql
SELECT
    *,
    count() AS cnt
FROM signoz_traces.signoz_index_v2
WHERE (serviceName = 'vm') AND (timestamp >= '2022-05-09 14:14:14.000000000') AND (timestamp < '2022-05-09 23:23:23.000000000')
GROUP BY *
HAVING cnt > 1
ORDER BY timestamp ASC
LIMIT 5
```

I am using this query to check duplicates. I am seeing duplicates only for the time window of the older version [0.7.5]; there aren't duplicates after the new version.

Vishal Sharma

05/11/2022, 7:53 AM
Couldn’t find a solution via Google search. I am checking ClickHouse GitHub issues; I will create an issue if I can’t find a solution.

Hima Vyas

05/11/2022, 8:35 AM
Thank you, I'll let you know if I have any findings.

Vishal Sharma

05/11/2022, 9:10 AM
The query

```sql
OPTIMIZE TABLE signoz_traces.signoz_index_v2 FINAL DEDUPLICATE;
```

works well, but the issue is that while migrating, the key-value pairs of tags are inserted in a different order each time, which causes the rows to differ. We could try

```sql
OPTIMIZE TABLE signoz_traces.signoz_index_v2 FINAL DEDUPLICATE BY spanID;
```

but it wants spanID to be the only column in ORDER BY; there’s an issue created on this: https://github.com/ClickHouse/ClickHouse/issues/34032

Using

```sql
OPTIMIZE TABLE signoz_traces.signoz_index_v2 FINAL DEDUPLICATE BY spanID, timestamp, serviceName, name, hasError;
```

worked for me.

Hima Vyas

05/11/2022, 9:12 AM
Okay, let me try this.

Vishal Sharma

05/11/2022, 9:17 AM
Use

```sql
OPTIMIZE TABLE signoz_traces.durationSort FINAL DEDUPLICATE BY spanID, timestamp, durationNano;
```

to remove duplicates from the `durationSort` table, and

```sql
OPTIMIZE TABLE signoz_traces.signoz_spans FINAL DEDUPLICATE;
```

to remove duplicates from the `signoz_spans` table.

Hima Vyas

05/11/2022, 9:50 AM
```sql
OPTIMIZE TABLE signoz_traces.signoz_index_v2 FINAL DEDUPLICATE BY spanID, timestamp, serviceName, name, hasError;
```

This is also returning 0 rows for me. Sample query to find duplicates:

```sql
SELECT
  *,
  count() AS cnt
FROM signoz_traces.signoz_index_v2
WHERE (serviceName = 'at') AND (timestamp >= '2022-05-09 00:00:00.000000000') AND (timestamp < '2022-05-09 01:01:01.000000000')
GROUP BY *
HAVING cnt > 1
ORDER BY timestamp ASC
LIMIT 5
```

The response shows rows duplicated across all of the below columns:

```
timestamp, traceID, spanID, parentSpanID, serviceName, name, kind, durationNano,
statusCode, externalHttpMethod, externalHttpUrl, component, dbSystem, dbName,
dbOperation, peerService, events, httpMethod, httpUrl, httpCode, httpRoute,
httpHost, msgSystem, msgOperation, hasError, tagMap, gRPCMethod, gRPCCode, cnt
```

Vishal Sharma

05/11/2022, 9:57 AM
@Hima Vyas

```sql
OPTIMIZE TABLE signoz_traces.signoz_index_v2 FINAL DEDUPLICATE BY spanID, timestamp, serviceName, name, hasError;
```

won’t return any rows. How many rows are duplicates? The above optimize query worked well for me.

Hima Vyas

05/11/2022, 10:08 AM
Around 170083776 rows.

Vishal Sharma

05/11/2022, 10:10 AM
Did the `signoz_spans` duplicates get removed with the below query?

```sql
OPTIMIZE TABLE signoz_traces.signoz_spans FINAL DEDUPLICATE;
```

Hima Vyas

05/11/2022, 10:24 AM
No, that also didn't work, but I think ClickHouse might still be working in the background on this query given such a large amount of data. I will check after some time whether the duplicates have been removed.

Vishal Sharma

05/11/2022, 10:25 AM
Yes, OPTIMIZE works in the background. Maybe you can monitor the process by checking whether the number of duplicates is decreasing.
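One way to watch that could be counting the surplus rows sharing a spanID (an illustrative query, assuming spanID uniquely identifies a span):

```sql
-- Monitoring sketch: total surplus rows that share a spanID.
-- Rerun periodically; the number should shrink as OPTIMIZE progresses.
SELECT sum(cnt - 1) AS duplicate_rows
FROM
(
    SELECT spanID, count() AS cnt
    FROM signoz_traces.signoz_index_v2
    GROUP BY spanID
    HAVING cnt > 1
);
```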

Hima Vyas

05/11/2022, 10:26 AM
Yes, I am checking that for signoz_index_v2 and I can see some decrease. I will monitor this and let you know. Thanks @Vishal Sharma for the help, appreciate it!

Vishal Sharma

05/12/2022, 4:27 AM
@Hima Vyas Were you able to remove the duplicates?

Hima Vyas

05/12/2022, 4:47 AM
Yes, the duplicates were removed after a few hours. For the large table the query was timing out, so I updated the receive_timeout parameter as well.
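For anyone hitting the same timeout, ClickHouse settings can be passed to the client at startup; raising receive_timeout might look like this (600 seconds is an arbitrary illustration):

```
clickhouse-client --receive_timeout=600
```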
Thanks again for the help Vishal!