# support
l
Has anyone successfully implemented the SigNoz community edition running on ECS Fargate? I was running on EC2 and am trying to migrate; it's going okay, but I'm looking for config tips. I don't want to do ECS on EC2.
v
Does this work for you? https://signoz.io/docs/install/ecs/
l
that is what I am trying to work off of, adapting it to ECS Fargate, since my Terraform code can already do that and I don't want to write new code for doing it the ECS-on-EC2 style. My current issue seems to be figuring out how to get the clickhouse container running so the signoz container is happy. See example logs from the signoz container below:
signoz-container logs, July 02, 2025 at 14:49 (UTC-4:00), newest first:
{"level":"fatal","timestamp":"2025-07-02T18:49:13.480Z","caller":"query-service/main.go:144","msg":"Failed to create signoz","error":"unable to open database file: no such file or directory","stacktrace":"main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"}
{"timestamp":"2025-07-02T18:49:13.47915072Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/signoz.New","file":"/home/runner/work/signoz/signoz/pkg/signoz/signoz.go","line":73},"msg":"starting signoz","version":"","variant":"community","commit":"","branch":"","go":"go1.23.10","timestamp":""}
{"timestamp":"2025-07-02T18:49:13.479337464Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/var/lib/signoz/signoz.db"}
{"timestamp":"2025-07-02T18:49:13.47941071Z","level":"ERROR","code":{"function":"github.com/prometheus/prometheus/promql.NewActiveQueryTracker","file":"/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/promql/query_logger.go","line":137},"msg":"Failed to create directory for logging active queries","logger":"github.com/SigNoz/signoz/pkg/prometheus/clickhouseprometheus"}
{"timestamp":"2025-07-02T18:49:13.479920498Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlmigrator.(*migrator).Migrate","file":"/home/runner/work/signoz/signoz/pkg/sqlmigrator/migrator.go","line":43},"msg":"starting sqlstore migrations","logger":"github.com/SigNoz/signoz/pkg/sqlmigrator","dialect":"sqlite"}
[Deprecated] flag --config is deprecated for passing prometheus config. The flag will be used for passing the entire SigNoz config. More details can be found at https://github.com/SigNoz/signoz/issues/6805.
[Deprecated] flag --flux-interval is deprecated and scheduled for removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --flux-interval-for-trace-detail is deprecated and scheduled for complete removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --cluster is deprecated and scheduled for removal. Please use SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_CLUSTER instead.
I think I'm pretty close, just need some guidance
v
Does this exist: /var/lib/signoz/signoz.db? It looks like you are trying to read the sqlite file from that path.
l
im not sure yet. I am trying to use an EFS volume for all the containers, deploying on a cluster in a single ECS service. I can get the service and task definition to create with my Terraform, but I'm having trouble getting the config to work together via the EFS volume and pointing all the mountpoints at it
i have all the container definition basics set up, including the dependencies, but it's hard to tell where the main failure point is. The logs I posted above are the most information I have, so I think I'm not understanding something correctly about the clickhouse portion
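For reference, the shared-EFS part of an ECS Fargate task definition looks roughly like this (a minimal sketch; the volume name, file system ID, and mount path are placeholders, not values from this thread):

```json
{
  "family": "signoz",
  "requiresCompatibilities": ["FARGATE"],
  "volumes": [
    {
      "name": "signoz-efs",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0123456789abcdef0",
        "rootDirectory": "/",
        "transitEncryption": "ENABLED"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "signoz",
      "mountPoints": [
        { "sourceVolume": "signoz-efs", "containerPath": "/var/lib/signoz" }
      ]
    }
  ]
}
```

Each container that needs the shared state mounts the same sourceVolume at its own containerPath.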
v
{
  "level": "fatal",
  "timestamp": "2025-07-02T18:49:13.480Z",
  "caller": "query-service/main.go:144",
  "msg": "Failed to create signoz",
  "error": "unable to open database file: no such file or directory",
  "stacktrace": "main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"
}
This tells me signoz is not able to open sqlite. Looks like a sqlite error and not a clickhouse error. Try setting SIGNOZ_SQLSTORE_SQLITE_PATH to something inside your EFS to make it work. E.g. if EFS is mounted at /mnt, try setting the env variable to /mnt/signoz.db
l
ok thank you will try something like that shortly
okay, tried something like that and I can see in the logs that it looks a little happier, but it's still basically bombing out; that didn't seem to be the key thing. where else should I check?
signoz-container logs, July 03, 2025 at 10:58 (UTC-4:00):
{"timestamp":"2025-07-03T14:58:28.386477134Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/signoz.New","file":"/home/runner/work/signoz/signoz/pkg/signoz/signoz.go","line":73},"msg":"starting signoz","version":"","variant":"community","commit":"","branch":"","go":"go1.23.10","timestamp":""}
{"timestamp":"2025-07-03T14:58:28.386680605Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/mnt/efs/signoz.db"}
{"timestamp":"2025-07-03T14:58:28.386799713Z","level":"ERROR","code":{"function":"github.com/prometheus/prometheus/promql.NewActiveQueryTracker","file":"/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/promql/query_logger.go","line":137},"msg":"Failed to create directory for logging active queries","logger":"github.com/SigNoz/signoz/pkg/prometheus/clickhouseprometheus"}
{"timestamp":"2025-07-03T14:58:28.38755219Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlmigrator.(*migrator).Migrate","file":"/home/runner/work/signoz/signoz/pkg/sqlmigrator/migrator.go","line":43},"msg":"starting sqlstore migrations","logger":"github.com/SigNoz/signoz/pkg/sqlmigrator","dialect":"sqlite"}
{"level":"fatal","timestamp":"2025-07-03T14:58:28.387Z","caller":"query-service/main.go:144","msg":"Failed to create signoz","error":"unable to open database file: no such file or directory","stacktrace":"main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"}
@Vibhu Pandey resurrecting this thread somewhat. Made significant progress, just need a little more guidance. I think I have the config, zookeeper, and clickhouse containers workable; now I'm trying to add in the signoz container itself, and have a few questions. 1. The environment variables I'm passing from the ECS task definition don't seem to be picked up by signoz. I saw these types of logs initially and added the variables to the task def, but afterwards I'm still seeing those logs, suggesting that there is a default config somewhere I'm missing and my values aren't getting picked up/overridden. So I need help understanding why. See logs here:
signoz-container logs, July 10, 2025 at 13:32 (UTC-4:00):
[Deprecated] flag --config is deprecated for passing prometheus config. The flag will be used for passing the entire SigNoz config. More details can be found at https://github.com/SigNoz/signoz/issues/6805.
[Deprecated] flag --flux-interval is deprecated and scheduled for removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --flux-interval-for-trace-detail is deprecated and scheduled for complete removal. Please use SIGNOZ_QUERIER_FLUX__INTERVAL instead.
[Deprecated] flag --cluster is deprecated and scheduled for removal. Please use SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_CLUSTER instead.
and here's my config:
environment = [
    { 
      "name": "SIGNOZ_ALERTMANAGER_PROVIDER",
      "value": "signoz" 
    },
    { 
      "name": "SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_DSN",
      "value": "<tcp://clickhouse:9000>" 
    },
    { 
      "name": "SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_CLUSTER",
      "value": "cluster" 
    },
    { 
      "name": "SIGNOZ_SQLSTORE_SQLITE_PATH",         
      "value": "/var/lib/signoz/signoz.db" 
    },
    { 
      "name": "SIGNOZ_TELEMETRYSTORE_PROVIDER",                             
      "value": "clickhouse" 
    },
    { 
      "name": "SIGNOZ_ANALYTICS_ENABLED",                   
      "value": "true" 
    },
    {
      "name": "SIGNOZ_QUERIER_FLUX__INTERVAL",
      "value": "5m"
    },
  ]
2. Secondly, I'm not sure how to resolve the error regarding "Failed to create directory for logging active queries"; I'm guessing it leads into the other error, "Failed to create signoz, unable to open database file", but one thing at a time
signoz-container logs, July 10, 2025 at 13:32 (UTC-4:00), newest first:
{"level":"fatal","timestamp":"2025-07-10T17:32:17.350Z","caller":"query-service/main.go:144","msg":"Failed to create signoz","error":"unable to open database file: no such file or directory","stacktrace":"main.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:144\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"}
{"timestamp":"2025-07-10T17:32:17.348940656Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/signoz.New","file":"/home/runner/work/signoz/signoz/pkg/signoz/signoz.go","line":73},"msg":"starting signoz","version":"","variant":"community","commit":"","branch":"","go":"go1.23.10","timestamp":""}
{"timestamp":"2025-07-10T17:32:17.34928235Z","level":"INFO","code":{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/var/lib/signoz/signoz.db"}
{"timestamp":"2025-07-10T17:32:17.3494089Z","level":"ERROR","code":{"function":"github.com/prometheus/prometheus/promql.NewActiveQueryTracker","file":"/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/promql/query_logger.go","line":137},"msg":"Failed to create directory for logging active queries","logger":"github.com/SigNoz/signoz/pkg/prometheus/clickhouseprometheus"}
would much appreciate some guidance on this. I'm very excited because I believe I am very close, just need some help getting over the finish line. @Nagesh Bansal and maybe @Srikanth Chekuri as well; apologies for the multiple tags, I just can't contain my excitement. Thanks!
v
Set SIGNOZ_PROMETHEUS_ACTIVE__QUERY__TRACKER_ENABLED to false. This will get rid of the "Failed to create directory for logging active queries" error.
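As an entry in the task definition's environment list, that would be something like (a sketch in the same shape as the config posted earlier in the thread):

```json
{
  "name": "SIGNOZ_PROMETHEUS_ACTIVE__QUERY__TRACKER_ENABLED",
  "value": "false"
}
```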
{ 
      "name": "SIGNOZ_SQLSTORE_SQLITE_PATH",         
      "value": "/var/lib/signoz/signoz.db" 
    },
Help me understand this. Does this path exist on your underlying volume?
l
yes it does. I co-opted the config fetcher container, and since I am using an EFS volume, I'm handling whatever pre-setup I need from there before clickhouse, signoz, and the rest of the containers come up. So I'm creating that path initially. I will try as you suggest, but my concern is that it won't work (see my first point above): I am setting these vars, but am still seeing logs in the signoz container telling me to use the new env var conventions (which I am), so the old ones must be getting used somewhere
v
Can you send your entire task definition? Especially the args.
l
yes one moment
just the signoz container, or all of it?
v
let's start with signoz container
l
👍
im creating the task/container definitions with a terraform module. It is mostly a 1:1 comparison and easily readable, is that ok? or do you want traditional json?
signoz cont task def
and I did try recreating it with SIGNOZ_PROMETHEUS_ACTIVE__QUERY__TRACKER_ENABLED set to false and no change; it's not picking up those env vars from the ecs task def for some reason, not sure what I'm doing wrong
v
You can get rid of this, it has been deprecated:
command = ["--config=/root/config/prometheus.yml"]
The rest of the warnings can be ignored, not that big an issue
l
okay I was wondering about that, are the ecs docs just too old?
v
{"function":"github.com/SigNoz/signoz/pkg/sqlstore/sqlitesqlstore.New","file":"/home/runner/work/signoz/signoz/pkg/sqlstore/sqlitesqlstore/provider.go","line":44},"msg":"connected to sqlite","logger":"github.com/SigNoz/signoz/pkg/sqlitesqlstore","path":"/var/lib/signoz/signoz.db"}
This indicates that SIGNOZ_SQLSTORE_SQLITE_PATH is being picked up correctly. But when signoz tries to do something, it realizes nothing exists at /var/lib/signoz/signoz.db. You need to make sure that /var/lib/signoz is present and the container has permissions to create a signoz.db file.
I see you are mounting it at /signoz-setup/var/lib/signoz. Doesn't it make sense to specify SIGNOZ_SQLSTORE_SQLITE_PATH=/signoz-setup/var/lib/signoz/signoz.db?
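Concretely, the mount point and the env variable have to agree on the path; something like the following sketch, where the volume name is a placeholder:

```json
"mountPoints": [
  { "sourceVolume": "signoz-efs", "containerPath": "/signoz-setup/var/lib/signoz" }
],
"environment": [
  { "name": "SIGNOZ_SQLSTORE_SQLITE_PATH", "value": "/signoz-setup/var/lib/signoz/signoz.db" }
]
```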
l
good call out, I've been trying a lot of things to get this working and must have forgotten about that. Will try and update shortly. I can confirm that the path is present on the EFS volume mount; if this doesn't work I will look into whether it's a permissions issue.
v
Let me know. Rooting for you 🔥
l
ty for your help im very excited to feel so close 🤞
more logs
definitely made a difference, see logs
v
Nicee now 8080 seems to be occupied
l
hmmm, i've been pretty careful to follow patterns of https://signoz.io/docs/install/ecs/
so the other containers in the task def I have right now aren't overriding it somehow...
i think my ALB target group health check is set to that currently, could that be the issue?
v
Zookeeper admin listens on 8080....
l
image.png
v
Ahhhhh all of them are running as sidecars?
l
i think i am using the zookeeper healthcheck path from the ALB right now to keep it alive for the moment...
Below is a single, all-in-one ECS task definition JSON that includes every SigNoz component in one task:
v
Yup and I'll have to apologize for that. It seems the docs might not be correct. Here is what needs to be changed in zookeeper:
{
  "name": "zookeeper-1",
  "image": "bitnami/zookeeper:3.7.1",
  "cpu": 512,
  "memory": 512,
  "memoryReservation": 512,
  "essential": true,
  "portMappings": [
    {
      "containerPort": 2181,
      "hostPort": 2181,
      "protocol": "tcp"
    },
    {
      "containerPort": 2888,
      "hostPort": 2888,
      "protocol": "tcp"
    },
    {
      "containerPort": 3888,
      "hostPort": 3888,
      "protocol": "tcp"
    },
    {
      "containerPort": 9141,
      "hostPort": 9141,
      "protocol": "tcp"
    }
  ],
  "environment": [
    {
      "name": "ALLOW_ANONYMOUS_LOGIN",
      "value": "yes"
    },
    {
      "name": "ZOO_SERVER_ID",
      "value": "1"
    },
    {
      "name": "ZOO_ENABLE_PROMETHEUS_METRICS",
      "value": "yes"
    },
    {
      "name": "ZOO_AUTOPURGE_INTERVAL",
      "value": "1"
    },
    {
      "name": "ZOO_PROMETHEUS_METRICS_PORT_NUMBER",
      "value": "9141"
    },
    {
      "name": "ZOO_ADMIN_SERVER_PORT_NUMBER",
      "value": "3181"
    }
  ],
  "healthCheck": {
    "command": [
      "CMD-SHELL",
      "curl -s -m 2 <http://localhost:3181/commands/ruok> | grep error | grep null"
    ],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 30
  },
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/aws/ecs/<LOG_GROUP>",
      "awslogs-region": "<AWS_REGION>",
      "awslogs-stream-prefix": "zookeeper"
    }
  }
}
2 changes:
1. One env variable added: { "name": "ZOO_ADMIN_SERVER_PORT_NUMBER", "value": "3181" }
2. Healthcheck modified: ["CMD-SHELL", "curl -s -m 2 http://localhost:3181/commands/ruok | grep error | grep null"]
l
okay will try that real quick
think we are getting much closer
v
Yes Yes everything should come up 💪
l
hooray!
oh heck yeah we are looking decent now!
v
Are you able to open the UI?
l
checking now
no luck yet, but it's likely due to my own infra network settings; I just have to make sure those are all correct. It will take me a few mins to remember all the things to check, but I'm confident I can do that and get the rest of the way there. The task is staying up and all the containers are staying up and healthy too, so overall feeling very good right now!
v
Let me know 🎉
n
Hey @Lucas Thompson, Thanks for trying to run Signoz on ECS, gained some valuable insights from the conversation and will ensure that we incorporate the changes.
l
yeah when I'm done I'm willing to connect and share more of what I learned, things i had to change
final question: what do I need to do to ensure I can connect to the frontend/UI via HTTPS? I have a domain name r53 record already, and I think the networking security groups should all be fine, but I think Signoz needs to know somehow?
not sure what the equivalent of values.yaml is in my situation, as opposed to the k8s docs: https://signoz.io/docs/tutorial/setting-up-tls-for-signoz/
v
You can do TLS termination at your ALB Lucas.
Just add a certificate at the ALB layer
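In Terraform this is just an HTTPS listener with an ACM certificate forwarding to the SigNoz target group, roughly like the sketch below (resource names are placeholders, not values from this thread):

```hcl
resource "aws_lb_listener" "signoz_https" {
  load_balancer_arn = aws_lb.signoz.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.wildcard.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.signoz.arn
  }
}
```

SigNoz itself can keep serving plain HTTP; the ALB terminates TLS.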
l
i do have a cert on the alb, it is a working wildcard cert pattern that works with other ecs --> target groups setups in our env
and the r53 record should fit under that, so I don't think its that
I think I'm good now, might have been DNS just having to figure it out, but I can reach the UI!
just a few more questions. can I get some guidance on where the clickhouse host is setting itself? I can't seem to track down where the 10.1.0.2:53 is coming from in this otel collector error log
{
  "level": "error",
  "timestamp": "2025-07-14T17:23:17.974Z",
  "caller": "opamp/server_client.go:143",
  "msg": "Failed to connect to the server: %v",
  "component": "opamp-server-client",
  "error": "dial tcp: lookup signoz on 10.1.0.2:53: no such host",
  "stacktrace": "github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func2\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:143\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).tryConnectOnce.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:232\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).tryConnectOnce\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:253\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:282\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:326\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/wsclient.go:412\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/clientcommon.go:208"
}
for context here is my current otel collector config
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  awscloudwatch:
    region: us-east-1
    imds_endpoint: http://169.254.169.254/latest/
    logs:
      poll_interval: 1m
      groups:
        named:
          /ecs/kh-ecs:
          /ecs/keycloak:
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector
  hostmetrics:
    collection_interval: 30s  # Frequency of metrics collection.
    scrapers:
      cpu: {}
      load:
        cpu_average: false
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
      paging: {}
      processes: {}
  syslog:
    tcp:
      listen_address: "0.0.0.0:54527"
    protocol: rfc3164
    location: UTC
    operators:
      - type: move
        from: attributes.message
        to: body
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system, ec2]
    system:
      hostname_sources: [os]
    timeout: 2s
  resource/env:
    attributes:
    - key: deployment.environment
      value: develop
      action: upsert
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite, signozclickhousemetrics
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id
      - name: service.version
      - name: browser.platform
      - name: browser.mobile
      - name: k8s.cluster.name
      - name: k8s.node.name
      - name: k8s.namespace.name
      - name: host.name
      - name: host.type
      - name: container.name
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/signoz_traces
    low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
    use_new_schema: true
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/signoz_metrics
    disable_v2: true
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/signoz_metrics
    disable_v2: true
  signozclickhousemetrics:
    dsn: tcp://clickhouse:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/signoz_logs
    timeout: 10s
    use_new_schema: true
  # debug: {}
  otlp:
    endpoint: "127.0.0.1:4317"   # Your SigNoz collector endpoint.
    tls:
      insecure: true
service:
  telemetry:
    logs:
      encoding: json
  extensions:
    - health_check
    - pprof
  pipelines:
    traces:
      receivers: [otlp]
      processors: [signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite, signozclickhousemetrics]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus, signozclickhousemetrics]
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, resource/env]
      exporters: [otlp]
    logs:
      receivers: [otlp,awscloudwatch, syslog]
      processors: [batch]
      exporters: [clickhouselogsexporter]
@Vibhu Pandey given that you stand to gain from some of the stuff I have figured out with signoz on ecs fargate, would one of your support team be willing to meet with me today to get this over the finish line? I'm still struggling to get the otel-collector talking to the clickhouse container
v
This is AWS VPC's DNS resolver address (10.1.0.2:53). It says the collector is unable to find the signoz container. There must be another config in the collector (something that has opamp in it)
l
also seeing logs like this too, is this because of the first error above? collector can't find signoz, so it can't create the clickhouse db's?
{
  "level": "error",
  "ts": "2025-07-15T14:02:45.887Z",
  "caller": "service@v0.128.0/service.go:189",
  "msg": "error found during service initialization",
  "resource": {
    "service.instance.id": "fb079bc5-0001-40e9-885f-f039f1cd3c73",
    "service.name": "/signoz-otel-collector",
    "service.version": "dev"
  },
  "error": "failed to build pipelines: failed to create \"clickhouselogsexporter\" exporter for data type \"logs\": cannot configure clickhouse logs exporter: code: 81, message: Database signoz_logs does not exist",
  "stacktrace": "<http://go.opentelemetry.io/collector/service.New.func1|go.opentelemetry.io/collector/service.New.func1>\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:189\ngo.opentelemetry.io/collector/service.New\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/service@v0.128.0/service.go:220\ngo.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:197\ngo.opentelemetry.io/collector/otelcol.(*Collector).Run\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/otelcol@v0.128.0/collector.go:312\ngithub.com/SigNoz/signoz-otel-collector/signozcol.(*WrappedCollector).Run.func1\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/signozcol/collector.go:103"
}
also here is a log from the signoz container saying it can't reach clickhouse
{
  "level": "error",
  "timestamp": "2025-07-15T14:02:16.171Z",
  "caller": "app/server.go:175",
  "msg": "failed to preload metrics metadata",
  "error": "dial tcp: lookup clickhouse on 10.1.0.2:53: no such host",
  "stacktrace": "<http://github.com/SigNoz/signoz/pkg/query-service/app.NewServer|github.com/SigNoz/signoz/pkg/query-service/app.NewServer>\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/server.go:175\nmain.main\n\t/home/runner/work/signoz/signoz/pkg/query-service/main.go:147\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"
}
v
Right, I think if you replace tcp://clickhouse:9000 with tcp://localhost:9000, things should work? Since they all run as sidecars?
l
I did actually try just that this morning
exporters:
  clickhousetraces:
    datasource: tcp://localhost:9000/signoz_traces
    low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
    use_new_schema: true
  clickhousemetricswrite:
    endpoint: tcp://localhost:9000/signoz_metrics
    disable_v2: true
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://localhost:9000/signoz_metrics
    disable_v2: true
  signozclickhousemetrics:
    dsn: tcp://localhost:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://localhost:9000/signoz_logs
    timeout: 10s
    use_new_schema: true
and those errors i just pasted above were from that attempt
v
Were you able to find the opamp file?
l
not sure I'm understanding. Are you referring to the manager-config.yaml? If so, I am grabbing it and setting it up like the flags on the otel-collector are looking for, and it has this as its content:
server_endpoint: ws://signoz:4320/v1/opamp
v
Yup, let's change this to localhost also, instead of signoz
l
okay i will try that
okay will try this?
server_endpoint: ws://localhost:4320/v1/opamp
v
Yess
l
okay, from the signoz-container now seeing a new error, but seems like progress
{"level":"error","timestamp":"2025-07-15T14:29:58.841Z","caller":"opamp/opamp_server.go:117","msg":"Failed to find or create agent","agentID":"01980e7e-0da0-79c0-bd92-95e71836c201","error":"cannot create agent without orgId","errorVerbose":"cannot create agent without orgId\ngithub.com/SigNoz/signoz/pkg/query-service/app/opamp/model.(*Agents).FindOrCreateAgent\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/opamp/model/agents.go:91\ngithub.com/SigNoz/signoz/pkg/query-service/app/opamp.(*Server).OnMessage\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/opamp/opamp_server.go:115\ngithub.com/open-telemetry/opamp-go/server.(*server).handleWSConnection\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/server/serverimpl.go:253\nruntime.goexit\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/asm_amd64.s:1700","stacktrace":"github.com/SigNoz/signoz/pkg/query-service/app/opamp.(*Server).OnMessage\n\t/home/runner/work/signoz/signoz/pkg/query-service/app/opamp/opamp_server.go:117\ngithub.com/open-telemetry/opamp-go/server.(*server).handleWSConnection\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/server/serverimpl.go:253"}
and on the otel-collector logs side seeing opamp log still
{
  "level": "error",
  "timestamp": "2025-07-15T14:29:58.841Z",
  "caller": "opamp/server_client.go:146",
  "msg": "Server returned an error response: %v",
  "component": "opamp-server-client",
  "": "",
  "stacktrace": "<http://github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func3|github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func3>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:146\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).processErrorResponse\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:247\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/receivedprocessor.go:170\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.19.0/client/internal/wsreceiver.go:94"
}
as well as more clickhouse pipelines/exporter issues, but assuming we will get to those afterwards
although I am checking one thing out; while I am doing that, can you clarify if all the xml files are needed from deploy/common/clickhouse?
ah interesting, now trying localhost the otel-collector logs are saying "connection refused"
{
  "level": "fatal",
  "timestamp": "2025-07-15T14:53:41.592Z",
  "caller": "signozotelcollector/main.go:79",
  "msg": "failed to run service:",
  "error": "failed to start collector service: failed to start : failed to start with noop config: collector failed to restart: failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": failed to create clickhouse client: dial tcp 127.0.0.1:9000: connect: connection refused",
  "stacktrace": "main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozotelcollector/main.go:79\nruntime.main\n\t/opt/hostedtoolcache/go/1.23.10/x64/src/runtime/proc.go:272"
}
the main one I still don't understand is the opamp log above
@Vibhu Pandey any thoughts on the opamp issues still?
@Vibhu Pandey you around for a few mins to troubleshoot?
schema logs
mainly, now I'm not understanding why the schema migrator isn't creating the db's
i think this is the key
code: 701, message: Requested cluster 'cluster' not found
for some reason the schema migrator is having trouble finding a cluster named 'cluster'?
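That error means the ClickHouse server itself has no cluster named "cluster" defined; in the Docker/VM deploys this normally comes from ClickHouse's own server config (the XML files mentioned above, e.g. a cluster definition file). A minimal single-node sketch of what the schema migrator expects, assuming clickhouse and zookeeper are reachable on localhost, would be:

```xml
<clickhouse>
    <remote_servers>
        <cluster>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster>
    </remote_servers>
    <zookeeper>
        <node index="1">
            <host>localhost</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>
```

If a file like this isn't mounted into the clickhouse container's config directory, anything passing --cluster=cluster will fail with code 701.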