# support
g
Hello folks! We're evaluating Signoz and when trying to deploy it on a kubernetes cluster we get the migration pods failing in a loop with this:
Copy code
{"L":"info","timestamp":"2025-06-02T20:47:22.555Z","C":"signozschemamigrator/main.go:91","M":"Running migrations in sync mode","dsn":"<tcp://posta:postapw@clickhouse-posta.clickhouse.svc.cluster.local:9000>","replication":false,"cluster-name":"cluster"}
{"L":"info","timestamp":"2025-06-02T20:47:22.555Z","C":"signozschemamigrator/main.go:104","M":"Up migrations","versions":[]}
{"L":"info","timestamp":"2025-06-02T20:47:22.555Z","C":"signozschemamigrator/main.go:117","M":"Down migrations","versions":[]}
{"L":"info","timestamp":"2025-06-02T20:47:22.555Z","C":"signozschemamigrator/main.go:127","M":"Parsed DSN","optsError":"json: unsupported type: func(context.Context, string) (net.Conn, error)"}
{"L":"info","timestamp":"2025-06-02T20:47:22.556Z","C":"signozschemamigrator/main.go:133","M":"Opened connection"}
Error: failed to bootstrap migrations: failed to create dbs
failed to create dbs
code: 516, message: posta: Authentication failed: password is incorrect, or there is no user with such name.
Usage:
  signoz-schema-migrator sync [flags]

Flags:
      --down string   Down migrations to run, comma separated. Must provide down migrations explicitly to run
  -h, --help          help for sync
      --up string     Up migrations to run, comma separated. Leave empty to run all up migrations

Global Flags:
      --cluster-name string   Cluster name to use while running migrations (default "cluster")
      --dev                   Development mode
      --dsn string            Clickhouse DSN
      --replication           Enable replication
The values.yaml we're using for testing is as follows:
Copy code
global:
  storageClass: vsan

clickhouse:
  enabled: false

externalClickhouse:
  host: clickhouse-posta.clickhouse.svc.cluster.local
  cluster: cluster
  user: "posta"
  password: "postapw"
  secure: false
  httpPort: 8123
  tcpPort: 9000
I can confirm that the ClickHouse user can connect from anywhere, create databases, tables, etc. I even tried pre-creating the databases, but no luck. Can someone shed some light on what we're missing here? Thank you!
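For reference, the DSN in that log is an ordinary URL, so the credentials the migrator is handed can be double-checked with standard URL parsing. A minimal Python sketch (this is not the migrator's actual code, which is written in Go):

```python
# Sketch only: parse the migrator DSN to confirm which credentials it carries.
from urllib.parse import urlsplit

dsn = "tcp://posta:postapw@clickhouse-posta.clickhouse.svc.cluster.local:9000"
parts = urlsplit(dsn)

print(parts.username)  # posta
print(parts.password)  # postapw
print(parts.hostname)  # clickhouse-posta.clickhouse.svc.cluster.local
print(parts.port)      # 9000
```

If these fields look right, the code 516 error points at something other than the DSN itself, such as where the connection is coming from.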
g
Hi Gutemberg. I don't work for the team, but reading your question I wanted to check a detail: when you wrote that you can create DBs, is it with the user "posta"?
g
Hi @Gil thanks for the reply
Yeah, I'm able to connect with that user from any machine (not just localhost) and create databases, tables, etc.
g
I don't know if the sync and async scripts use your custom user. Maybe they use the default one.
g
That failure is on the sync container
the async one is just looping, logging that it's waiting for the sync to complete
g
In the source code, in the Dockerfile, I can see this:
Copy code
cat Dockerfile 
# use a minimal debian image
FROM debian:bookworm-slim

# define arguments and default values
ARG TARGETOS TARGETARCH
ARG USER_UID=10001

# create a non-root user for running the migrator
USER ${USER_UID}

# copy the binaries from the multi-stage build
COPY .build/${TARGETOS}-${TARGETARCH}/signoz-schema-migrator /signoz-schema-migrator

# run the binary as the entrypoint and pass the default dsn as a flag
ENTRYPOINT [ "/signoz-schema-migrator" ]
CMD ["--dsn", "<tcp://localhost:9000>"]
So the image creates a non-root user to run the migrator
g
ok but that is for the container itself, should be fine
the problem is that it's not connecting with the user/password I'm passing to it as a parameter 😕
g
I'm not familiar with Go, but I tried to find where the connection happens in the source code to see how it's handled.
I have no more time to look into that, though. I was just waiting for an update to finish 😉
g
sure, thanks for trying.
I was hoping someone from SigNoz would reply here. We're trying to evaluate the product, but so far it's been a no-go. If we deploy it with its embedded ZooKeeper and ClickHouse it works, but that's a complete non-starter: we're not going to maintain another ClickHouse deployment here.
g
I'm also evaluating Signoz. I noticed some odd behavior with certain metrics, but it seems their team is understaffed, as I haven't received a response to my question in two days. It's a promising stack, but it doesn't feel fully mature yet. Regarding avoiding ZooKeeper: I'm managing a complex but small-scale environment. Initially, I tried running Signoz without ZooKeeper, as I'm confident I don't need a clustered DB for the short to medium term. However, this failed, and a team member informed me that non-ZooKeeper installations aren't currently supported. On my end, my client prohibits using Docker/Kubernetes. I rewrote their "pure Linux" installation, but my pull request has been pending for three to four weeks now.
g
ZooKeeper is for ClickHouse, not for SigNoz.
And yeah, I had the same feeling. We're coming from Datadog. Perhaps we should look at other options.
g
Yes that's why I'm talking about a clustered DB
And the last part is to prove my point about the understaffing
oh and nice z1000 by the way
g
Hahha, it was a ZX6R
g
My bad 😉 really fun bike too
g
😄
n
Hey @Gutemberg Ribeiro, sorry for the delayed response. What versions of SigNoz and ClickHouse are you running?
g
Hey Nagesh, no problem
For ClickHouse, we have the one from the operator:
clickhouse/clickhouse-server:23.8
SigNoz is whatever was latest on the Helm charts as of yesterday:
Copy code
- signoz version: 'v0.85.3'
- otel-collector version: 'v0.111.42'
n
Thanks. Can you make sure you're running ClickHouse version 24.1.2? Also, you must whitelist the IP address range used for your environment (e.g. the Kubernetes node IPs) manually in the SigNoz values file:
Copy code
clickhouse:
  allowedNetworkIps:
    - "192.173.0.0/16"
g
I can upgrade it. But by default the chart whitelists all the RFC 1918 private subnet CIDRs. As I mentioned, I can connect from other machines on the network, even remotely from my machine over VPN. It is not a networking issue.
n
could you please share your clickhouse config here if possible?
g
sure
Copy code
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"

metadata:
  name: "posta"

spec:
  defaults:
    templates: 
      dataVolumeClaimTemplate: default
      podTemplate: clickhouse:19.6
  configuration:
    users:
      posta/password: "postapw"
      posta/profile: default
      posta/grants/query: 
        - "GRANT ALL ON *.*"
    zookeeper:
      nodes:
      - host: zookeeper.zoons.svc.cluster.local
        port: 2181
    clusters:
      - name: cluster
        layout:
          shardsCount: 1
          replicasCount: 3

  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
    podTemplates:
      - name: clickhouse:19.6
        spec:
          containers:
            - name: clickhouse-pod
              image: clickhouse/clickhouse-server:23.8
s
@Gil please create a discussion / issue on SigNoz repo and I will answer it. The channel now attracts so many kinds of support requests that it has become hard to manage. I will either link to docs with answer or answer your questions on GitHub.
I am going through the thread to understand @Gutemberg Ribeiro's issue
g
@Srikanth Chekuri A discussion about what?
s
You mentioned you haven't received a response to something. I was suggesting you create a GitHub issue/discussion with all the questions you have, and I will respond there.
g
About my technical issue, @Nagesh Bansal is on it (thanks). About my proposed doc improvement, I already talked about it. I had to check where, but it's here: https://github.com/SigNoz/signoz-web/pull/1350
s
@Nagesh Bansal and I are colleagues (sorry, I didn't mention that). I had a chat with him about several open questions (I guess including yours). Instead of having someone relay the answers, I can respond directly. Either way, I will let him answer.
@Gutemberg Ribeiro can you share how did you create this user?
g
Which user?
the posta user?
s
Yes
g
You can see it being created in the ClickHouseInstallation resource itself.
s
Will you be able to share the contents of users.xml and generated-users.xml under /etc/clickhouse-server/users.d on the ClickHouse server?
g
It's using the same operator you guys use when deploying the built-in ClickHouse instance
but ok, let me find them
Copy code
# cat chop-generated-users.xml 
<clickhouse>
    <users>
        <clickhouse_operator>
            <access_management>1</access_management>
            <named_collection_control>1</named_collection_control>
            <networks>
                <ip>10.42.1.116</ip>
            </networks>
            <password_sha256_hex>716b36073a90c6fe1d445ac1af85f4777c5b7a155cea359961826a030513e448</password_sha256_hex>
            <profile>clickhouse_operator</profile>
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
        </clickhouse_operator>
        <default>
            <networks>
                <host_regexp>(chi-posta-[^.]+\d+-\d+|clickhouse\-posta)\.clickhouse\.svc\.cluster\.local$</host_regexp>
                <ip>::1</ip>
                <ip>127.0.0.1</ip>
                <ip>10.42.1.157</ip>
                <ip>10.42.1.158</ip>
                <ip>10.42.1.159</ip>
            </networks>
            <profile>default</profile>
            <quota>default</quota>
        </default>
        <posta>
            <grants>
                <query>GRANT ALL ON *.*</query>
            </grants>
            <networks>
                <host_regexp>(chi-posta-[^.]+\d+-\d+|clickhouse\-posta)\.clickhouse\.svc\.cluster\.local$</host_regexp>
                <ip>::1</ip>
                <ip>127.0.0.1</ip>
            </networks>
            <password_sha256_hex>0753ebbfaa69a64a7a6006a33afb6d81b222970a2fd3bd586fb1eb0b37854482</password_sha256_hex>
            <profile>default</profile>
            <quota>default</quota>
        </posta>
    </users>
</clickhouse>
Copy code
# cat users.xml 
<clickhouse>
    <!-- See also the files in users.d directory where the settings can be overridden. -->

    <!-- Profiles of settings. -->
    <profiles>
        <!-- Default settings. -->
        <default>
        </default>

        <!-- Profile that allows only read queries. -->
        <readonly>
            <readonly>1</readonly>
        </readonly>
    </profiles>

    <!-- Users and ACL. -->
    <users>
        <!-- If user name was not specified, 'default' user is used. -->
        <default>
            <!-- See also the files in users.d directory where the password can be overridden.

                 Password could be specified in plaintext or in SHA256 (in hex format).

                 If you want to specify password in plaintext (not recommended), place it in 'password' element.
                 Example: <password>qwerty</password>.
                 Password could be empty.

                 If you want to specify SHA256, place it in 'password_sha256_hex' element.
                 Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
                 Restrictions of SHA256: impossibility to connect to ClickHouse using MySQL JS client (as of July 2019).

                 If you want to specify double SHA1, place it in 'password_double_sha1_hex' element.
                 Example: <password_double_sha1_hex>e395796d6546b1b65db9d665cd43f0e858dd4303</password_double_sha1_hex>

                 If you want to specify a previously defined LDAP server (see 'ldap_servers' in the main config) for authentication,
                  place its name in 'server' element inside 'ldap' element.
                 Example: <ldap><server>my_ldap_server</server></ldap>

                 If you want to authenticate the user via Kerberos (assuming Kerberos is enabled, see 'kerberos' in the main config),
                  place 'kerberos' element instead of 'password' (and similar) elements.
                 The name part of the canonical principal name of the initiator must match the user name for authentication to succeed.
                 You can also place 'realm' element inside 'kerberos' element to further restrict authentication to only those requests
                  whose initiator's realm matches it.
                 Example: <kerberos />
Example: <kerberos><realm>EXAMPLE.COM</realm></kerberos>

                 How to generate decent password:
                 Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
                 In first line will be password and in second - corresponding SHA256.

                 How to generate double SHA1:
                 Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
                 In first line will be password and in second - corresponding double SHA1.
            -->
            <password></password>

            <!-- List of networks with open access.

                 To open access from everywhere, specify:
                    <ip>::/0</ip>

                 To open access only from localhost, specify:
                    <ip>::1</ip>
                    <ip>127.0.0.1</ip>

                 Each element of list has one of the following forms:
                 <ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
                     2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
<host> Hostname. Example: server01.clickhouse.com.
                     To check access, DNS query is performed, and all received addresses compared to peer address.
                 <host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.clickhouse\.com$
                     To check access, DNS PTR query is performed for peer address and then regexp is applied.
                     Then, for result of PTR query, another DNS query is performed and all received addresses compared to peer address.
                     Strongly recommended that regexp is ends with $
                 All results of DNS requests are cached till server restart.
            -->
            <networks>
                <ip>::/0</ip>
            </networks>

            <!-- Settings profile for user. -->
            <profile>default</profile>

            <!-- Quota for user. -->
            <quota>default</quota>

            <!-- User can create other users and grant rights to them. -->
            <!-- <access_management>1</access_management> -->
        </default>
    </users>

    <!-- Quotas. -->
    <quotas>
        <!-- Name of quota. -->
        <default>
            <!-- Limits for time interval. You could specify many intervals with different limits. -->
            <interval>
                <!-- Length of interval. -->
                <duration>3600</duration>

                <!-- No limits. Just calculate resource usage for time interval. -->
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</clickhouse>
There are no customizations here
it is the vanilla Clickhouse operator
and a very simple ClickHouseInstallation resource.
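As an aside, the `password_sha256_hex` entries in these files are just the SHA-256 of the plaintext password in hex, per the recipe in the users.xml comment above. A quick Python equivalent, using the password from the values.yaml purely for illustration:

```python
import hashlib

# "postapw" is the externalClickhouse password from the values.yaml above
digest = hashlib.sha256("postapw".encode()).hexdigest()
print(digest)  # 64 hex chars; compare with the <password_sha256_hex> for posta
```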
s
Copy code
<posta>
    <networks>
        <host_regexp>(chi-posta-[^.]+\d+-\d+|clickhouse\-posta)\.clickhouse\.svc\.cluster\.local$</host_regexp>
        <ip>::1</ip>
        <ip>127.0.0.1</ip>
    </networks>
    <!-- ... -->
</posta>
I can upgrade it. But by default the chart whitelist all RFC private subnets CIDRs. As I mentioned, I can connect on other machines on the network. Even remotely from my machine over VPN. It is not a networking issue.
Something doesn't add up, because the posta user has such restrictive access.
g
image.png
connecting from my machine to that deployment in Kubernetes
s
What is the
localhost:50372
?
g
a local SSH tunnel
not that it's actually running on my machine
s
Then it doesn't mean it's not a network issue right?
g
in other words, kubectl port-forward is being used pointing to the IP
no, it does mean that there is no networking issue
I'm port forwarding the IP of the kubernetes service
not the pod itself
s
That's the point, right? In this case the SigNoz migrator pod is connecting with a user which is not allowed to connect from outside. The only allowed networks are localhost or the other shards.
g
what do you mean?
it is allowed to connect
I am connecting over external network
s
Try connecting from any other pod in your setup with the posta user and see if it succeeds.
g
the only reason you are seeing "localhost" there is that I'm on a completely external network and I need to port-forward the service
s
Forget about connecting from your machine for a moment. The following is the snippet from the xml you shared for the posta user:
Copy code
<posta>
            <grants>
                <query>GRANT ALL ON *.*</query>
            </grants>
            <networks>
                <host_regexp>(chi-posta-[^.]+\d+-\d+|clickhouse\-posta)\.clickhouse\.svc\.cluster\.local$</host_regexp>
                <ip>::1</ip>
                <ip>127.0.0.1</ip>
            </networks>
            <password_sha256_hex>0753ebbfaa69a64a7a6006a33afb6d81b222970a2fd3bd586fb1eb0b37854482</password_sha256_hex>
            <profile>default</profile>
            <quota>default</quota>
        </posta>
This ClickHouse user has very restrictive access. With this user, only its own localhost or the other shards can connect.
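To illustrate, the host_regexp from chop-generated-users.xml can be checked directly. In the sketch below the migrator pod hostname is hypothetical, but any hostname outside the clickhouse namespace fails the check the same way:

```python
import re

# host_regexp for the posta user, copied from chop-generated-users.xml
pattern = re.compile(
    r"(chi-posta-[^.]+\d+-\d+|clickhouse\-posta)\.clickhouse\.svc\.cluster\.local$"
)

# A ClickHouse replica's own hostname satisfies the allowed-networks rule...
print(bool(pattern.search("chi-posta-cluster-0-0.clickhouse.svc.cluster.local")))   # True

# ...but a (hypothetical) migrator pod hostname in another namespace does not,
# which would surface as the code 516 "Authentication failed" error.
print(bool(pattern.search("signoz-schema-migrator-abc12.signoz.svc.cluster.local")))  # False
```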
g
I have other applications using that same user right now for testing, within the same Kubernetes cluster in other pods, and it just works
only Signoz is not working
everything else works
s
The code is simple, and it's not even specific to SigNoz. It's just the clickhouse-go client connecting to an external service with the DSN at bootstrap. Let's check one thing: can you do a helm template render and share the manifest generated for signoz-schema-migrator?
g
I can, but I'll have to leave for a couple of hours now. Have a doctor's appointment. Will paste the output here ASAP.