# support
l
Has anyone successfully implemented the SigNoz community edition running on ECS Fargate? I was running on EC2 and am trying to migrate; it's going okay, but I'm looking for config tips. I don't want to do ECS on EC2.
v
Does this work for you? https://signoz.io/docs/install/ecs/
l
that is what I am trying to work off of, adapting it to ECS Fargate, since my Terraform code can already do that and I don't want to write new code for doing it the ECS-on-EC2 style. My current issue seems to be figuring out how to get the clickhouse container running so the signoz container is happy. See example logs from the signoz container below:
signoz-container logs, July 02, 2025 at 14:49 (UTC-4:00), newest first:
{"level":"fatal","timestamp":"2025-07-02T18:49:13.480Z","caller":"query-service/main.go:144","msg":"Failed to create signoz","error":"unable to open database file: no such file or directory","stacktrace":"main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"}
{"timestamp":"2025-07-02T18:49:13.47915072Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/signoz.New","file":"/home/runner/work/signoz/signoz/pkg/signoz/signoz.go","line":73},"msg":"starting signoz","version":"","variant":"community","commit":"","branch":"","go":"go1.23.10","timestamp":""}
{"timestamp":"2025-07-02T18:49:13.479337464Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/var/lib/signoz/signoz.db"}
{"timestamp":"2025-07-02T18:49:13.47941071Z","level":"ERROR","code":{"function":"github.com/prometheus/prometheus/promql.NewActiveQueryTracker","file":"/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/promql/query_logger.go","line":137},"msg":"Failed to create directory for logging active queries","logger":"github.com/SigNoz/signoz/pkg/prometheus/clickhouseprometheus"}
{"timestamp":"2025-07-02T18:49:13.479920498Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlmigrator.(*migrator).Migrate","file":"/home/runner/work/signoz/signoz/pkg/sqlmigrator/migrator.go","line":43},"msg":"starting sqlstore migrations","logger":"github.com/SigNoz/signoz/pkg/sqlmigrator","dialect":"sqlite"}
[Deprecated] flag --config is deprecated for passing prometheus config. The flag will be used for passing the entire SigNoz config. More details can be found at https://github.com/SigNoz/signoz/issues/6805.
[Deprecated] flag --flux-interval is deprecated and scheduled for removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --flux-interval-for-trace-detail is deprecated and scheduled for complete removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --cluster is deprecated and scheduled for removal. Please use SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_CLUSTER instead.
I think I'm pretty close, just need some guidance
v
Does this exist: /var/lib/signoz/signoz.db? It looks like you are trying to read the sqlite file from that path.
l
im not sure yet. I am trying to use an EFS volume for all the containers, deploying on a cluster in a single ECS service. I can get the service and task definition to create with my Terraform, but I'm having trouble getting the config to work together via the EFS volume and pointing all the mountpoints at it
i have all the container definition basics set up, including the dependencies, but it's hard to tell where the main failure point is. The logs I posted above are the most information I have, so I think I'm not understanding something correctly about the clickhouse portion
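For reference, the shared-EFS part of an ECS Fargate task definition looks roughly like this (a minimal sketch; the volume name, file system ID, and mount path are placeholders, not values from this thread):

```json
{
  "family": "signoz",
  "requiresCompatibilities": ["FARGATE"],
  "volumes": [
    {
      "name": "signoz-efs",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0123456789abcdef0",
        "rootDirectory": "/",
        "transitEncryption": "ENABLED"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "signoz",
      "mountPoints": [
        { "sourceVolume": "signoz-efs", "containerPath": "/var/lib/signoz" }
      ]
    }
  ]
}
```

Each container that needs the shared state mounts the same sourceVolume at its own containerPath.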
v
{
  "level": "fatal",
  "timestamp": "2025-07-02T18:49:13.480Z",
  "caller": "query-service/main.go:144",
  "msg": "Failed to create signoz",
  "error": "unable to open database file: no such file or directory",
  "stacktrace": "main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"
}
This tells me signoz is not able to open sqlite. Looks like a sqlite error and not a clickhouse error. Try setting SIGNOZ_SQLSTORE_SQLITE_PATH to something inside your EFS to make it work. E.g. if EFS is mounted at /mnt, try setting the env variable to /mnt/signoz.db
l
ok thank you will try something like that shortly
okay, tried something like that and I can see in the logs that it looks a little happier, but it's still basically bombing out; that didn't seem to be the key thing. where else should I check?
signoz-container logs, July 03, 2025 at 10:58 (UTC-4:00):
{"timestamp":"2025-07-03T14:58:28.386477134Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/signoz.New","file":"/home/runner/work/signoz/signoz/pkg/signoz/signoz.go","line":73},"msg":"starting signoz","version":"","variant":"community","commit":"","branch":"","go":"go1.23.10","timestamp":""}
{"timestamp":"2025-07-03T14:58:28.386680605Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/mnt/efs/signoz.db"}
{"timestamp":"2025-07-03T14:58:28.386799713Z","level":"ERROR","code":{"function":"github.com/prometheus/prometheus/promql.NewActiveQueryTracker","file":"/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/promql/query_logger.go","line":137},"msg":"Failed to create directory for logging active queries","logger":"github.com/SigNoz/signoz/pkg/prometheus/clickhouseprometheus"}
{"timestamp":"2025-07-03T14:58:28.38755219Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlmigrator.(*migrator).Migrate","file":"/home/runner/work/signoz/signoz/pkg/sqlmigrator/migrator.go","line":43},"msg":"starting sqlstore migrations","logger":"github.com/SigNoz/signoz/pkg/sqlmigrator","dialect":"sqlite"}
{"level":"fatal","timestamp":"2025-07-03T14:58:28.387Z","caller":"query-service/main.go:144","msg":"Failed to create signoz","error":"unable to open database file: no such file or directory","stacktrace":"main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"}
@Vibhu Pandey resurrecting this thread somewhat. Made significant progress, just need a little more guidance. I think I have the config, zookeeper, and clickhouse containers workable; now I'm trying to add in the signoz container itself, and have a few questions. 1. The environment variables I'm passing from the ECS task definition don't seem to be picked up by signoz. I saw these types of logs initially and added the variables to the task def, but afterwards I'm still seeing those logs, suggesting that there is a default config somewhere I'm missing and my values aren't getting picked up/overridden. So I need help understanding why. See logs here:
signoz-container logs, July 10, 2025 at 13:32 (UTC-4:00):
[Deprecated] flag --config is deprecated for passing prometheus config. The flag will be used for passing the entire SigNoz config. More details can be found at https://github.com/SigNoz/signoz/issues/6805.
[Deprecated] flag --flux-interval is deprecated and scheduled for removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --flux-interval-for-trace-detail is deprecated and scheduled for complete removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --cluster is deprecated and scheduled for removal. Please use SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_CLUSTER instead.
and here's my config:
environment = [
    { 
      "name": "SIGNOZ_ALERTMANAGER_PROVIDER",
      "value": "signoz" 
    },
    { 
      "name": "SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_DSN",
      "value": "<tcp://clickhouse:9000>" 
    },
    { 
      "name": "SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_CLUSTER",
      "value": "cluster" 
    },
    { 
      "name": "SIGNOZ_SQLSTORE_SQLITE_PATH",         
      "value": "/var/lib/signoz/signoz.db" 
    },
    { 
      "name": "SIGNOZ_TELEMETRYSTORE_PROVIDER",                             
      "value": "clickhouse" 
    },
    { 
      "name": "SIGNOZ_ANALYTICS_ENABLED",                   
      "value": "true" 
    },
    {
      "name": "SIGNOZ_QUERIER_FLUX__INTERVAL",
      "value": "5m"
    },
  ]
2. Secondly, I'm not sure how to resolve the error regarding "Failed to create directory for logging active queries"; I'm guessing it leads into the other error, "Failed to create signoz, unable to open database file", but one thing at a time
signoz-container logs, July 10, 2025 at 13:32 (UTC-4:00), newest first:
{"level":"fatal","timestamp":"2025-07-10T17:32:17.350Z","caller":"query-service/main.go:144","msg":"Failed to create signoz","error":"unable to open database file: no such file or directory","stacktrace":"main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"}
{"timestamp":"2025-07-10T17:32:17.348940656Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/signoz.New","file":"/home/runner/work/signoz/signoz/pkg/signoz/signoz.go","line":73},"msg":"starting signoz","version":"","variant":"community","commit":"","branch":"","go":"go1.23.10","timestamp":""}
{"timestamp":"2025-07-10T17:32:17.34928235Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/var/lib/signoz/signoz.db"}
{"timestamp":"2025-07-10T17:32:17.3494089Z","level":"ERROR","code":{"function":"github.com/prometheus/prometheus/promql.NewActiveQueryTracker","file":"/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/promql/query_logger.go","line":137},"msg":"Failed to create directory for logging active queries","logger":"github.com/SigNoz/signoz/pkg/prometheus/clickhouseprometheus"}
would much appreciate some guidance on this. I'm very excited because I believe I am very close, just need some help getting over the finish line. @Nagesh Bansal and maybe @Srikanth Chekuri as well; apologies for the multiple tags, I just can't contain my excitement. Thanks!
v
Set SIGNOZ_PROMETHEUS_ACTIVE__QUERY__TRACKER_ENABLED to false. This will get rid of the "Failed to create directory for logging active queries" error.
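As an entry in the task definition's environment list, that would be something like (a sketch in the same shape as the config posted earlier in the thread):

```json
{
  "name": "SIGNOZ_PROMETHEUS_ACTIVE__QUERY__TRACKER_ENABLED",
  "value": "false"
}
```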
{ 
      "name": "SIGNOZ_SQLSTORE_SQLITE_PATH",         
      "value": "/var/lib/signoz/signoz.db" 
    },
Help me understand this. Does this path exist on your underlying volume?
l
yes it does. I co-opted the config fetcher container, and since I am using an EFS volume, I'm handling whatever pre-setup I need from there before clickhouse, signoz, and the rest of the containers come up. So I'm creating that path initially. I will try as you suggest, but my concern is that it won't work (see my first point above): I am setting these vars, but am still seeing logs in the signoz container telling me to use the new env var conventions (which I am), so the old ones must be getting used somewhere
v
Can you send your entire task definition? Especially the args.
l
yes one moment
just the signoz container, or all of it?
v
let's start with signoz container
l
👍
im creating the task/container definitions with a terraform module. It is mostly a 1:1 comparison and easily readable, is that ok? or do you want traditional json?
signoz cont task def
and I did try recreating it with SIGNOZ_PROMETHEUS_ACTIVE__QUERY__TRACKER_ENABLED set to false and no change; it's not picking up those env vars from the ecs task def for some reason, not sure what I'm doing wrong
v
You can get rid of this, it has been deprecated:
command = ["--config=/root/config/prometheus.yml"]
The rest of the warnings can be ignored, not that big an issue
l
okay I was wondering about that, are the ecs docs just too old?
v
{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/var/lib/signoz/signoz.db"}
This indicates that SIGNOZ_SQLSTORE_SQLITE_PATH is being picked up correctly. But when signoz tries to do something, it realizes nothing exists at /var/lib/signoz/signoz.db. You need to make sure that /var/lib/signoz is present and the container has permissions to create a signoz.db file.
I see you are mounting it at /signoz-setup/var/lib/signoz. Doesn't it make sense to specify SIGNOZ_SQLSTORE_SQLITE_PATH=/signoz-setup/var/lib/signoz/signoz.db?
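Concretely, the mount point and the env variable have to agree on the path; something like the following sketch, where the volume name is a placeholder:

```json
"mountPoints": [
  { "sourceVolume": "signoz-efs", "containerPath": "/signoz-setup/var/lib/signoz" }
],
"environment": [
  { "name": "SIGNOZ_SQLSTORE_SQLITE_PATH", "value": "/signoz-setup/var/lib/signoz/signoz.db" }
]
```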
l
good call out, I've been trying a lot of things to get this working and must have forgotten about that. Will try and update shortly. I can confirm that the path is present on the EFS volume mount; if this doesn't work I will look into whether it's a permissions issue.
v
Let me know. Rooting for you 🔥
l
ty for your help im very excited to feel so close 🤞
more logs
definitely made a difference, see logs
v
Nicee now 8080 seems to be occupied
l
hmmm, i've been pretty careful to follow patterns of https://signoz.io/docs/install/ecs/
so the other containers in the task def I have right now aren't overriding it somehow...
i think my ALB target group health check is set to that currently, could that be the issue?
v
Zookeeper admin listens on 8080....
l
image.png
v
Ahhhhh all of them are running as sidecars?
l
i think i am using the zookeeper healthcheck path from the ALB right now to keep it alive for the moment...
Below is a single, all-in-one ECS task definition JSON that includes every SigNoz component in one task:
v
Yup and I'll have to apologize for that. It seems the docs might not be correct. Here is what needs to be changed in zookeeper:
{
  "name": "zookeeper-1",
  "image": "bitnami/zookeeper:3.7.1",
  "cpu": 512,
  "memory": 512,
  "memoryReservation": 512,
  "essential": true,
  "portMappings": [
    {
      "containerPort": 2181,
      "hostPort": 2181,
      "protocol": "tcp"
    },
    {
      "containerPort": 2888,
      "hostPort": 2888,
      "protocol": "tcp"
    },
    {
      "containerPort": 3888,
      "hostPort": 3888,
      "protocol": "tcp"
    },
    {
      "containerPort": 9141,
      "hostPort": 9141,
      "protocol": "tcp"
    }
  ],
  "environment": [
    {
      "name": "ALLOW_ANONYMOUS_LOGIN",
      "value": "yes"
    },
    {
      "name": "ZOO_SERVER_ID",
      "value": "1"
    },
    {
      "name": "ZOO_ENABLE_PROMETHEUS_METRICS",
      "value": "yes"
    },
    {
      "name": "ZOO_AUTOPURGE_INTERVAL",
      "value": "1"
    },
    {
      "name": "ZOO_PROMETHEUS_METRICS_PORT_NUMBER",
      "value": "9141"
    },
    {
      "name": "ZOO_ADMIN_SERVER_PORT_NUMBER",
      "value": "3181"
    }
  ],
  "healthCheck": {
    "command": [
      "CMD-SHELL",
      "curl -s -m 2 <http://localhost:3181/commands/ruok> | grep error | grep null"
    ],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 30
  },
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/aws/ecs/<LOG_GROUP>",
      "awslogs-region": "<AWS_REGION>",
      "awslogs-stream-prefix": "zookeeper"
    }
  }
}
2 changes:
1. One env variable added: { "name": "ZOO_ADMIN_SERVER_PORT_NUMBER", "value": "3181" }
2. Healthcheck modified: ["CMD-SHELL", "curl -s -m 2 http://localhost:3181/commands/ruok | grep error | grep null"]
l
okay will try that real quick
think we are getting much closer
v
Yes Yes everything should come up 💪
l
hooray!
oh heck yeah we are looking decent now!
v
Are you able to open the UI?
l
checking now
no luck yet, but it's likely due to my own infra network settings; I just have to make sure those are all correct. It will take me a few mins to remember all the things to check, but I'm confident I can do that and get the rest of the way there. The task is staying up and all the containers are staying up and healthy too, so overall feeling very good right now!
v
Let me know 🎉
n
Hey @Lucas Thompson, Thanks for trying to run Signoz on ECS, gained some valuable insights from the conversation and will ensure that we incorporate the changes.
l
yeah when I'm done I'm willing to connect and share more of what I learned, things i had to change
final question: what do I need to do to ensure I can connect to the frontend/UI via HTTPS? I have a domain name r53 record already, and I think the networking security groups should all be fine, but I think Signoz needs to know somehow?
not sure what the equivalent of values.yaml is in my situation, as opposed to the k8s docs: https://signoz.io/docs/tutorial/setting-up-tls-for-signoz/
v
You can do TLS termination at your ALB Lucas.
Just add a certificate at the ALB layer
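In Terraform this is just an HTTPS listener with an ACM certificate forwarding to the SigNoz target group, roughly like the sketch below (resource names are placeholders, not values from this thread):

```hcl
resource "aws_lb_listener" "signoz_https" {
  load_balancer_arn = aws_lb.signoz.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.wildcard.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.signoz.arn
  }
}
```

SigNoz itself can keep serving plain HTTP; the ALB terminates TLS.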
l
i do have a cert on the alb, it is a working wildcard cert pattern that works with other ecs --> target groups setups in our env
and the r53 record should fit under that, so I don't think its that
I think I'm good now, might have been DNS just having to figure it out, but I can reach the UI!
just a few more questions. can I get some guidance on where the clickhouse host is setting itself? I can't seem to track down where the 10.1.0.2:53 is coming from in this otel collector error log
{
  "level": "error",
  "timestamp": "2025-07-14T17:23:17.974Z",
  "caller": "opamp/server_client.go:143",
  "msg": "Failed to connect to the server: %v",
  "component": "opamp-server-client",
  "error": "dial tcp: lookup signoz on 10.1.0.2:53: no such host",
  "stacktrace": "github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func2\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:143\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).tryConnectOnce.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:232\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).tryConnectOnce\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:253\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:282\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:326\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:412\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/clientcommon.go:208"
}
for context here is my current otel collector config
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  awscloudwatch:
    region: us-east-1
    imds_endpoint: http://169.254.169.254/latest/
    logs:
      poll_interval: 1m
      groups:
        named:
          /ecs/kh-ecs:
          /ecs/keycloak:
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector
  hostmetrics:
    collection_interval: 30s  # Frequency of metrics collection.
    scrapers:
      cpu: {}
      load:
        cpu_average: false
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
      paging: {}
      processes: {}
  syslog:
    tcp:
      listen_address: "0.0.0.0:54527"
    protocol: rfc3164
    location: UTC
    operators:
      - type: move
        from: attributes.message
        to: body
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system, ec2]
    system:
      hostname_sources: [os]
    timeout: 2s
  resource/env:
    attributes:
    - key: deployment.environment
      value: develop
      action: upsert
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite, signozclickhousemetrics
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id
      - name: service.version
      - name: browser.platform
      - name: browser.mobile
      - name: k8s.cluster.name
      - name: k8s.node.name
      - name: k8s.namespace.name
      - name: host.name
      - name: host.type
      - name: container.name
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/signoz_traces
    low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
    use_new_schema: true
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/signoz_metrics
    disable_v2: true
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/signoz_metrics
    disable_v2: true
  signozclickhousemetrics:
    dsn: tcp://clickhouse:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/signoz_logs
    timeout: 10s
    use_new_schema: true
  # debug: {}
  otlp:
    endpoint: "127.0.0.1:4317"   # Your SigNoz collector endpoint.
    tls:
      insecure: true
service:
  telemetry:
    logs:
      encoding: json
  extensions:
    - health_check
    - pprof
  pipelines:
    traces:
      receivers: [otlp]
      processors: [signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite, signozclickhousemetrics]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus, signozclickhousemetrics]
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, resource/env]
      exporters: [otlp]
    logs:
      receivers: [otlp,awscloudwatch, syslog]
      processors: [batch]
      exporters: [clickhouselogsexporter]
@Vibhu Pandey given that you stand to gain from some of the stuff I have figured out with signoz on ecs fargate, would one of your support team be willing to meet with me today to get this over the finish line? I'm still struggling to get the otel-collector talking to the clickhouse container
v
This is AWS VPC's DNS resolver address (10.1.0.2:53). It says the collector is unable to find the signoz container. There must be another config in the collector (something that has opamp in it)
l
also seeing logs like this too, is this because of the first error above? collector can't find signoz, so it can't create the clickhouse db's?
{
  "level": "error",
  "ts": "2025-07-15T14:02:45.887Z",
  "caller": "service@v0.128.0/service.go:189",
  "msg": "error found during service initialization",
  "resource": {
    "service.instance.id": "fb079bc5-0001-40e9-885f-f039f1cd3c73",
    "service.name": "/signoz-otel-collector",
    "service.version": "dev"
  },
  "error": "failed to build pipelines: failed to create \"clickhouselogsexporter\" exporter for data type \"logs\": cannot configure clickhouse logs exporter: code: 81, message: Database signoz_logs does not exist",
  "stacktrace": "<http://go.opentelemetry.io/collector/service.New.func1|go.opentelemetry.io/collector/service.New.func1>\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:189\ngo.opentelemetry.io/collector/service.New\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:220\ngo.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:197\ngo.opentelemetry.io/collector/otelcol.(*Collector).Run\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:312\ngithub.com/SigNoz/signoz-otel-collector/signozcol.(*WrappedCollector).Run.func1\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/signozcol/collector.go:103"
}
also here is a log from the signoz container saying it can't reach clickhouse
{
  "level": "error",
  "timestamp": "2025-07-15T14:02:16.171Z",
  "caller": "app/server.go:175",
  "msg": "failed to preload metrics metadata",
  "error": "dial tcp: lookup clickhouse on 10.1.0.2:53: no such host",
  "stacktrace": "<http://github.com/SigNoz/signoz/pkg/query-service/app.NewServer|github.com/SigNoz/signoz/pkg/query-service/app.NewServer>\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/server.go:175\nmain.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:147\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"
}
v
Right, I think if you replace tcp://clickhouse:9000 with tcp://localhost:9000, things should work? Since they all run as sidecars?
l
I did actually try just that this morning
exporters:
  clickhousetraces:
    datasource: tcp://localhost:9000/signoz_traces
    low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
    use_new_schema: true
  clickhousemetricswrite:
    endpoint: tcp://localhost:9000/signoz_metrics
    disable_v2: true
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://localhost:9000/signoz_metrics
    disable_v2: true
  signozclickhousemetrics:
    dsn: tcp://localhost:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://localhost:9000/signoz_logs
    timeout: 10s
    use_new_schema: true
and those errors i just pasted above were from that attempt
v
Were you able to find the opamp file?
l
not sure I'm understanding. Are you referring to the manager-config.yaml? If so, I am grabbing it and setting it up like the flags on the otel-collector are looking for, and it has this as its content:
server_endpoint: ws://signoz:4320/v1/opamp
v
Yup, let's change this to localhost also, instead of signoz
l
okay i will try that
okay will try this?
server_endpoint: ws://localhost:4320/v1/opamp
v
Yess
l
okay, from the signoz-container now seeing a new error, but seems like progress
{"level":"error","timestamp":"2025-07-15T14:29:58.841Z","caller":"opamp/opamp_server.go:117","msg":"Failed to find or create agent","agentID":"01980e7e-0da0-79c0-bd92-95e71836c201","error":"cannot create agent without orgId","errorVerbose":"cannot create agent without orgId\ngithub.com/SigNoz/signoz/pkg/query-service/app/opamp/model.(*Agents).FindOrCreateAgent\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/opamp/model/agents.go:91\ngithub.com/SigNoz/signoz/pkg/query-service/app/opamp.(*Server).OnMessage\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/opamp/opamp_server.go:115\ngithub.com/open-telemetry/opamp-go/server.(*server).handleWSConnection\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/server/serverimpl.go:253\nruntime.goexit\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/asm_amd64.s:1700","stacktrace":"github.com/SigNoz/signoz/pkg/query-service/app/opamp.(*Server).OnMessage\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/opamp/opamp_server.go:117\ngithub.com/open-telemetry/opamp-go/server.(*server).handleWSConnection\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/server/serverimpl.go:253"}
and on the otel-collector logs side seeing opamp log still
{
  "level": "error",
  "timestamp": "2025-07-15T14:29:58.841Z",
  "caller": "opamp/server_client.go:146",
  "msg": "Server returned an error response: %v",
  "component": "opamp-server-client",
  "": "",
  "stacktrace": "<http://github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func3|github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func3>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:146\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).processErrorResponse\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:247\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:170\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/wsreceiver.go:94"
}
as well as more clickhouse pipelines/exporter issues, but assuming we will get to those afterwards
although I am checking one thing out; while I am doing that, can you clarify if all the xml files are needed from deploy/common/clickhouse?
ah interesting, now trying localhost the otel-collector logs are saying "connection refused"
{
  "level": "fatal",
  "timestamp": "2025-07-15T14:53:41.592Z",
  "caller": "signozotelcollector/main.go:79",
  "msg": "failed to run service:",
  "error": "failed to start collector service: failed to start : failed to start with noop config: collector failed to restart: failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": failed to create clickhouse client: dial tcp 127.0.0.1:9000: connect: connection refused",
  "stacktrace": "main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozotelcollector/main.go:79\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"
}
the main one I still don't understand is the opamp log above
@Vibhu Pandey any thoughts on the opamp issues still?
@Vibhu Pandey you around for a few mins to troubleshoot?
schema logs
mainly, now I'm not understanding why the schema migrator isn't creating the db's
i think this is the key
code: 701, message: Requested cluster 'cluster' not found
for some reason the schema migrator is having trouble finding a cluster named 'cluster'?
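That error means the ClickHouse server itself has no cluster named "cluster" defined; in the Docker/VM deploys this normally comes from ClickHouse's own server config (the XML files mentioned above, e.g. a cluster definition file). A minimal single-node sketch of what the schema migrator expects, assuming clickhouse and zookeeper are reachable on localhost, would be:

```xml
<clickhouse>
    <remote_servers>
        <cluster>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster>
    </remote_servers>
    <zookeeper>
        <node index="1">
            <host>localhost</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>
```

If a file like this isn't mounted into the clickhouse container's config directory, anything passing --cluster=cluster will fail with code 701.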