#general

oluchi orji

02/16/2023, 12:23 PM
Hello SigNoz team, I noticed that after a while our Services UI goes blank. We have set up retention with an S3 bucket. What could be wrong?

Ankit Nayan

02/16/2023, 12:39 PM
How many replicas of query-service are defined? It should be 1.
Do the services appear and then disappear, or have they never been seen since adding S3?
@oluchi orji

oluchi orji

02/16/2023, 12:47 PM
Hello @Ankit Nayan, thanks for your response. 1. We have just one replica of the SigNoz query service. 2. They disappear, and they reappear after we uninstall and reinstall SigNoz (the S3 setup and annotation are added in the values.yaml file).

Ankit Nayan

02/16/2023, 12:56 PM
Okay... can you share the ClickHouse logs? My guess is that if the S3 connection fails, ClickHouse doesn't show any data. Also, can you check whether the size of the data in S3 is increasing?

oluchi orji

02/16/2023, 12:56 PM
one second, let me do the checks 👇 @Ankit Nayan
worker.go:445:dropReplicas():start:infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:drop replicas based on AP
I0205 00:16:26.531186       1 worker.go:462] worker.go:462:dropReplicas():end:infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:processed replicas: 0
I0205 00:16:26.531219       1 worker.go:419] includeStopped():infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:add CHI to monitoring
I0205 00:16:26.802933       1 worker.go:485] infra/signoz-clickhouse/9ca4c129-c258-425d-80b1-a956508a0752:IPs of the CHI [*****]
I0205 00:16:26.815881       1 worker.go:489] infra/signoz-clickhouse/342fa60b-416a-4027-ae25-6de4bca505b7:Update users IPS
I0205 00:16:27.042605       1 worker.go:505] markReconcileComplete():infra/signoz-clickhouse/e9e59dca-39f7-4444-91e9-5fb092c9daa1:reconcile completed
I0215 20:17:43.965089       1 controller.go:309] infra/signoz-clickhouse:endpointsInformer.UpdateFunc: IP ASSIGNED: []v1.EndpointSubset{
  v1.EndpointSubset{
    Addresses: []v1.EndpointAddress{
      v1.EndpointAddress{
        IP: "172.********",
        Hostname: "",
        NodeName: &"ip-*******l",
        TargetRef: nil,
      },
    },
    NotReadyAddresses: nil,
    Ports: []v1.EndpointPort{
      v1.EndpointPort{
        Name: "http",
        Port: 8123,
        Protocol: "TCP",
        AppProtocol: nil,
      },
      v1.EndpointPort{
        Name: "tcp",
        Port: 9000,
        Protocol: "TCP",
        AppProtocol: nil,
      },
    },
  },
}
I0215 20:17:44.020501       1 worker.go:299] infra/signoz-clickhouse/f48fbf51-ff72-45f1-abd8-96a17e4f8191:IPs of the CHI [*******]
I0215 20:17:44.026758       1 worker.go:303] infra/signoz-clickhouse/9afb9ed0-a38e-44a2-a57d-598971239d44:Update users IPS
I0215 20:17:44.035005       1 worker.go:1645] updateConfigMap():infra/signoz-clickhouse/9afb9ed0-a38e-44a2-a57d-598971239d44:Update ConfigMap infra/chi-signoz-clickhouse-common-usersd

Ankit Nayan

02/16/2023, 1:21 PM
This does not have much useful information.
Can you grep by `s3`?
Also, can you check the size of the S3 bucket to see whether it is receiving data?

oluchi orji

02/16/2023, 1:22 PM
Checking ... @Ankit Nayan
No useful info came up with `s3` except the following @Ankit Nayan
{e899fee7-1eea-4e3f-b6dc-6e7bd6141071} <Error> TCPHandler: Code: 243. DB::Exception: Cannot reserve 1.00 MiB, not enough space. (NOT_ENOUGH_SPACE), Stack trace (when copying this message, always include the lines below):

Ankit Nayan

02/16/2023, 1:32 PM
How much space is left on the disk?
cc: @Prashant Shahi what's the default config? Maybe we want to change the ClickHouse defaults for better operation at scale.
@oluchi orji any idea how much data you were trying to ingest?

oluchi orji

02/16/2023, 1:41 PM
One second, checking now @Ankit Nayan

Ankit Nayan

02/16/2023, 1:41 PM
Also, this message is temporary; it goes away once heavy ingestion is over. Can you check the time of the error?

oluchi orji

02/16/2023, 1:42 PM
The error occurred an hour ago.
About 10 GB is still left @Ankit Nayan

Ankit Nayan

02/16/2023, 1:49 PM
https://github.com/SigNoz/signoz/issues/2272 might be related. I will let @Srikanth Chekuri dive deeper into the issue

oluchi orji

02/16/2023, 1:50 PM
Alright @Ankit Nayan, thank you for your time!

Srikanth Chekuri

02/16/2023, 2:02 PM
@oluchi orji Can you share your S3 configuration? Our retention is currently based on the span timestamp, and only then is the data moved to cold storage. However, you need the data to move based on disk availability. Did you configure the `move_factor`? What is your approximate ingestion rate?

Prashant Shahi

02/16/2023, 2:03 PM
{e899fee7-1eea-4e3f-b6dc-6e7bd6141071} <Error> TCPHandler: Code: 243. DB::Exception: Cannot reserve 1.00 MiB, not enough space. (NOT_ENOUGH_SPACE), Stack trace (when copying this message, always include the lines below):
I have seen this error occur when there is not enough storage on the ClickHouse storage PVC, i.e. the `/var/lib/clickhouse` mount.
But yeah, do share your S3 configuration, so that we can have a look at it.

oluchi orji

02/16/2023, 2:08 PM
my default cold storage setup @Prashant Shahi
clickhouse:
  cloud: aws
  installCustomStorageClass: false
  persistence:
    size: 30Gi
  # Cold storage configuration
  coldStorage:
    enabled: true
    defaultKeepFreeSpaceBytes: "10485760"
S3 config:
{
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutBucketVersioning",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

Srikanth Chekuri

02/16/2023, 2:12 PM
`defaultKeepFreeSpaceBytes` is used to reserve some free space on any disk, but that doesn’t move the data. What was your `move_factor`?

oluchi orji

02/16/2023, 2:13 PM
`move_factor`, is that a value in the values.yaml file?

Srikanth Chekuri

02/16/2023, 2:18 PM
I see this is unavailable in our charts, but I believe you could override it. I think that’s the reason you are not seeing services. Your disk is filling up, but the default retention (7 days) is based on the timestamp of the span, so data will not move for a week. And since you haven’t set up any `move_factor` (i.e. the fraction of free disk space that should always exist; when free space drops below this threshold, ClickHouse moves data to cold storage), nothing moves the data earlier as the disk fills.
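To make the `move_factor` threshold concrete, here is a minimal sketch of the arithmetic. It assumes the threshold is simply the total disk size multiplied by `move_factor`, and it uses the 30Gi `persistence.size` from the values.yaml shared earlier in the thread. The function names and the 0.25 and 0.5 factors are hypothetical, chosen only for illustration.

```python
GIB = 1024 ** 3


def move_threshold_bytes(disk_size_bytes: int, move_factor: float) -> int:
    """Free-space level below which data would start moving to the next volume.

    Assumption: threshold = total disk size * move_factor.
    """
    if not 0.0 <= move_factor <= 1.0:
        raise ValueError("move_factor must be between 0 and 1")
    return int(disk_size_bytes * move_factor)


def should_move(free_bytes: int, disk_size_bytes: int, move_factor: float) -> bool:
    """True when free space has dropped below the move threshold."""
    return free_bytes < move_threshold_bytes(disk_size_bytes, move_factor)


if __name__ == "__main__":
    disk = 30 * GIB  # persistence.size: 30Gi from the thread
    for factor in (0.1, 0.25, 0.5):  # 0.25 and 0.5 are hypothetical alternatives
        threshold = move_threshold_bytes(disk, factor)
        print(f"move_factor={factor}: moves start below {threshold / GIB:.1f} GiB free")
```

Under this model, a larger `move_factor` makes ClickHouse start moving data while more of the hot disk is still free, which leaves more headroom during bursts of ingestion.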

oluchi orji

02/16/2023, 2:21 PM
Okay, thank you @Srikanth Chekuri, I will look up information on how to override the `move_factor`.

Srikanth Chekuri

02/16/2023, 2:24 PM
@Prashant Shahi how can @oluchi orji add the `move_factor` for volumes in our charts https://github.com/SigNoz/charts/blob/f0f467bdfb34f464c4bb14f699a038db16332be4/cha[…]ickhouse/templates/clickhouse-instance/clickhouse-instance.yaml? I am not sure if this can be done with override.yaml.

Prashant Shahi

02/16/2023, 3:27 PM
It would not be possible right now with override.yaml, except maybe by using the `clickhouse.files` configuration.
@Srikanth Chekuri isn't the `move_factor` set to `0.1` by default?
Shouldn't that be sufficient?

Srikanth Chekuri

02/16/2023, 3:43 PM
That’s why I was asking for the ingestion rate. If the rate is high, data gets dropped before the background task can move it. I wanted them to try a higher `move_factor` and test it.
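The race between ingestion and the background move can be sketched with a rough estimate. This is a simplified model, not how ClickHouse schedules moves: it assumes constant rates, and every number in the example is hypothetical, since the thread never reports the actual ingestion rate or move throughput.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def seconds_until_full(free_bytes: float,
                       ingest_bytes_per_sec: float,
                       move_bytes_per_sec: float = 0.0) -> float:
    """Estimate how long the hot disk lasts once moves have started.

    Returns float('inf') when the background move keeps up with ingestion.
    """
    net_fill_rate = ingest_bytes_per_sec - move_bytes_per_sec
    if net_fill_rate <= 0:
        return float("inf")
    return free_bytes / net_fill_rate


if __name__ == "__main__":
    # Hypothetical: 3 GiB free when moves begin, 5 MiB/s in, 2 MiB/s moved out.
    remaining = seconds_until_full(3 * GIB, 5 * MIB, 2 * MIB)
    print(f"disk would fill in about {remaining / 60:.0f} minutes")
```

If the estimate comes out short, a higher `move_factor` starts the move earlier and gives the background task more time to catch up.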

oluchi orji

02/16/2023, 3:44 PM
Hello @Srikanth Chekuri, how do I check the ingestion rate? Is it a kubectl command, or do I have to SSH into the ClickHouse pods?

Srikanth Chekuri

02/16/2023, 3:53 PM
Yeah, you can get the relevant info by querying ClickHouse. Let me share a query that outputs the span count per time interval.
Can you exec into ClickHouse and share the output of this?
SELECT
    toStartOfInterval(timestamp, toIntervalMinute(10)) AS time,
    count() AS count
FROM signoz_traces.signoz_index_v2
GROUP BY time
ORDER BY time ASC

oluchi orji

02/16/2023, 4:03 PM
not found
@Srikanth Chekuri
/ $ SELECT
sh: SELECT: not found
/ $     toStartOfInterval(timestamp, toIntervalMinute(10)) AS time,
sh: syntax error: unexpected word (expecting ")")
/ $     count() AS count
/ $ FROM signoz_traces.signoz_index_v2
sh: FROM: not found
/ $ GROUP BY time
sh: GROUP: not found
/ $ ORDER BY time ASC
sh: ORDER: not found
/ $

Prashant Shahi

02/16/2023, 4:07 PM
@oluchi orji you will have to execute it using `clickhouse client`.
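As a sketch of what that could look like, the helper below assembles a `kubectl exec` command that passes the query to `clickhouse client --query` instead of typing SQL into the pod's shell. The `infra` namespace is taken from the operator logs earlier in the thread; the pod name is a hypothetical placeholder and the helper name is made up, so substitute the real pod from `kubectl get pods`.

```python
import shlex

# The query shared above, passed as a single --query argument.
QUERY = """
SELECT
    toStartOfInterval(timestamp, toIntervalMinute(10)) AS time,
    count() AS count
FROM signoz_traces.signoz_index_v2
GROUP BY time
ORDER BY time ASC
""".strip()


def clickhouse_exec_argv(namespace: str, pod: str, query: str) -> list[str]:
    """Build the argv for running a query through clickhouse client in a pod."""
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "clickhouse", "client", "--query", query,
    ]


if __name__ == "__main__":
    # Hypothetical pod name; the real one depends on the installation.
    argv = clickhouse_exec_argv("infra", "chi-signoz-clickhouse-cluster-0-0-0", QUERY)
    print(shlex.join(argv))  # shell-quoted command, ready to paste
```

Building the command as a list keeps the multi-line query as one argument, so the shell never tries to interpret `SELECT` or `FROM` as commands.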

oluchi orji

02/16/2023, 4:07 PM
I thought as much @Prashant Shahi, thanks

Alejandro Decchi

05/11/2023, 8:47 PM
How can I drop data since a specific day?

Prashant Shahi

05/14/2023, 4:59 PM
@Srikanth Chekuri can you please look into this?