# support
r
Hi there! New SigNoz user here (trying it out to replace some legacy apps)... I currently have it up and running and everything reports OK... I'm trying to deploy the docker-compose agent (self-hosted).. the troubleshoot docker image runs and sends data to the host... but the infra monitoring view remains empty... also, logs from the infra monitor on the host running SigNoz have halted.. any input or tips would be appreciated
Copy code
otel-agent-1  | {"level":"info","ts":1749355727.6639392,"caller":"healthcheck/handler.go:132","msg":"Health Check state change","kind":"extension","name":"health_check","status":"ready"}
    otel-agent-1  | {"level":"info","ts":1749355727.6646318,"caller":"service@v0.111.0/service.go:234","msg":"Everything is ready. Begin running and processing data."}
otel-agent looks good
logspout seems to be having DNS resolution errors, tho.. the other containers are fine
Copy code
logspout-1  | 2025/06/08 04:11:14 !! lookup otel-agent on 8.8.8.8:53: no such host
    logspout-1  | 2025/06/08 04:11:15 # logspout v3.2.14 by gliderlabs
    logspout-1  | 2025/06/08 04:11:15 # adapters: syslog tcp tls udp multiline raw
    logspout-1  | 2025/06/08 04:11:15 # options :
    logspout-1  | 2025/06/08 04:11:15 persist:/mnt/routes
    logspout-1  | 2025/06/08 04:11:15 !! lookup otel-agent on 8.8.8.8:53: no such host
Copy code
time="2025-06-08T05:13:34Z" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
    Trying to pull docker.io/signoz/troubleshoot:latest...
    Getting image source signatures
    Copying blob sha256:a79e9ac51b1a52e399df997df5ed56dab64218d0af3858d8cee8265fdc483413
    Copying blob sha256:f2f4f8df211a98899c26216a7b2c05a2ca8d93efda78a9026588c8877cfd47d7
    Copying config sha256:3f1d3ca8bc659ab1ec95541fba528bf37a8ab7e662ed6b820bdfb8ca3045ca87
    Writing manifest to image destination
    2025-06-08T05:13:39.384Z    INFO   troubleshoot/main.go:28 STARTING!
    2025-06-08T05:13:39.386Z    INFO   checkEndpoint/checkEndpoint.go:41       checking reachability of SigNoz endpoint
    2025-06-08T05:13:39.406Z    INFO   troubleshoot/main.go:46 Successfully sent sample data to signoz ...
and yep, I have read and configured https://signoz.io/docs/userguide/hostmetrics/
looks like I need to check my grok... need to enable hostmetrics ON SIGNOZ AND on the infra agent
nope that was not it
no data is showing 😞
Copy code
signoz-otel-collector  | {"level":"warn","ts":1749376327.0448449,"caller":"clickhousemetricsexporter/exporter.go:283","msg":"NaN detected in quantile value, skipping entire data point","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","metric_name":"sync_process_time"}
s
Hi @Ross, can you share some details on how you are configuring data collection?
r
Hi Srikanth! thanks for getting back to me
it's pretty much the same as the deploy repo on GitHub...
s
Can you share the collector config?
r
the infra collector is deployed on hosts on the same network.. sending data to SigNoz (I started it on 8090, not 8080 as in the docker-compose from GitHub, as I had a port conflict) but from what I gather 4317 is the only port that matters for infra hostmetrics?
sure
collector config deployed on remote hosts
Copy code
receivers:
  hostmetrics:
    collection_interval: 30s
    root_path: /hostfs
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector
        # For Docker daemon metrics to be scraped, it must be configured to expose
        # Prometheus metrics, as documented here: https://docs.docker.com/config/daemon/prometheus/
        # - job_name: docker-daemon
        #   static_configs:
        #   - targets:
        #       - host.docker.internal:9323
        #     labels:
        #       job_name: docker-daemon
        - job_name: docker-container
          docker_sd_configs:
            - host: unix:///var/run/docker.sock
          relabel_configs:
            - action: keep
              regex: true
              source_labels:
                - __meta_docker_container_label_signoz_io_scrape
            - regex: true
              source_labels:
                - __meta_docker_container_label_signoz_io_path
              target_label: __metrics_path__
            - regex: (.+)
              source_labels:
                - __meta_docker_container_label_signoz_io_path
              target_label: __metrics_path__
            - separator: ":"
              source_labels:
                - __meta_docker_network_ip
                - __meta_docker_container_label_signoz_io_port
              target_label: __address__
            - regex: '/(.*)'
              replacement: '$1'
              source_labels:
                - __meta_docker_container_name
              target_label: container_name
            - regex: __meta_docker_container_label_signoz_io_(.+)
              action: labelmap
              replacement: $1
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
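        # Illustrative example only (assumed format, not taken from this thread):
        # a logspout syslog line this regex would match could look like
        #   <14>1 2025-06-08T04:11:14.000Z 3f1d3ca8bc65 my-app 1234 - - some log line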
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
      # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        expr: 'attributes.container_name matches "^signoz|(signoz-(|otel-collector|clickhouse|zookeeper))|(infra-(logspout|otel-agent)-.*)"'
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors:
      # - ec2
      # - gcp
      # - azure
      - env
      - system
    system:
      hostname_sources: [os]
    timeout: 15s
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
exporters:
  otlp:
    endpoint: ${env:SIGNOZ_COLLECTOR_ENDPOINT}
    timeout: 30s
    tls:
      insecure: true
    # headers:
    #   signoz-access-token: ${env:SIGNOZ_ACCESS_TOKEN}
  # debug: {}
service:
  telemetry:
    logs:
      encoding: json
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - pprof
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp]
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [otlp]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [resourcedetection, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp, tcplog/docker]
      processors: [resourcedetection, batch]
      exporters: [otlp]
Copy code
SIGNOZ_COLLECTOR_ENDPOINT=http://192.168.110.25:4317    # In case of external SigNoz or cloud, update the endpoint and access token
OTEL_RESOURCE_ATTRIBUTES=host.name=vhqkube02,os.type=linux  # Replace signoz-host with the actual hostname
env file for the environment
s
Do you see any error logs from this collector?
r
that was one of the weird things... the logs are kinda sparse...
at the moment I've reset the port to 8080 and am waiting for it to come up again.. but something seems to have gone wrong with SigNoz, as ClickHouse sits migrating, which has taken 15 minutes so far..
previously, as per the logs up there, even the troubleshoot container connected fine.. but no hosts showed in hostmetrics
but nothing else.
s
Hmm, can you add the debug/logging exporter and see if the data is actually getting collected? The debug/logging exporter confirms whether the data is arriving at the exporter stage (and if so, the otlp exporter should send data to SigNoz as well; that would confirm whether or not there is any issue at the infra collector agent).
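(A minimal sketch of what that could look like in the agent config, assuming the debug exporter shipped with collector-contrib 0.111.0 and the hostmetrics pipeline already defined above; the otlp exporter stays as-is:)
Copy code
exporters:
  otlp:
    endpoint: ${env:SIGNOZ_COLLECTOR_ENDPOINT}
    tls:
      insecure: true
  # prints every exported data point to the agent's own logs
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      # fan out to both SigNoz and the local debug output
      exporters: [otlp, debug]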
r
one thing I notice is that you have to stop all otel-agents sending data in order for SigNoz to start correctly.. I have 5 agents sending data and if I leave them running while I recreate the docker containers it breaks things
s
Hmm, that doesn't seem right. Are they all running on the same network host?
r
nope remote hosts
s
Port 8080 - for accessing the UI; 4317 - for sending otel data over gRPC; 4318 - for sending otel data over HTTP. These are the ports exposed by default.
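(For context, in compose terms that maps to port mappings roughly like the sketch below; the 8090 remap is the one described in this thread, while the service names are assumptions about the default SigNoz docker-compose layout:)
Copy code
services:
  signoz:
    ports:
      - "8090:8080"   # UI only, remapped from 8080 because of a local port conflict
  otel-collector:
    ports:
      - "4317:4317"   # OTLP gRPC ingest (what the infra agents send to)
      - "4318:4318"   # OTLP HTTP ingest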
r
they're on the same network tho.. the firewall is open, connections are possible: nc 192.168.110.25 4317 -vv succeeds (or cloud)
so in the setup I exposed 8090 (external) -> 8080 (internal) on the docker container
does data get sent over 8080? or is it just the UI there?
s
It's just UI
r
ok good.. then that shouldn't matter, I'll switch back to what was working before
s
Can you check the following: 1. logs of signoz-otel-collector where the main SigNoz installation is running 2. logs of ClickHouse where the main SigNoz installation is running. If there are no issues there, then it means data is not being sent correctly.
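(A quick way to pull those on the SigNoz host, using the container names that show up later in this thread:)
Copy code
docker logs --tail 100 signoz-otel-collector
docker logs --tail 100 signoz-clickhouse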
r
ok, I'll get that now.. it's a fresh install on docker
Copy code
user@docker:/home/docker/signoz/docker# docker compose rm
>>>> Executing external compose provider "/usr/local/bin/docker-compose". Please refer to the documentation for details. <<<<

? Going to remove signoz-zookeeper-1, signoz-init-clickhouse, signoz-clickhouse, schema-migrator-sync, schema-migrator-async, signoz, signoz-otel-collector Yes
[+] Removing 7/7
 ✔ Container signoz-otel-collector   Removed                                                                                                                                              0.1s
 ✔ Container signoz-init-clickhouse  Removed                                                                                                                                              0.3s
 ✔ Container schema-migrator-sync    Removed                                                                                                                                              0.2s
 ✔ Container schema-migrator-async   Removed                                                                                                                                              0.4s
 ✔ Container signoz-zookeeper-1      Removed                                                                                                                                              0.4s
 ✔ Container signoz                  Removed                                                                                                                                              0.2s
 ✔ Container signoz-clickhouse       Removed                                                                                                                                              0.4s
user@docker:/home/docker/signoz/docker# docker network ls | grep signoz | awk '{print $2}' | xargs docker network rm -f
signoz-net
user@docker:/home/docker/signoz/docker# docker volume ls | grep signoz | awk '{print $2}' | xargs docker volume rm
cleared like so
and now we wait for zookeeper and clickhouse to get their things in order (this takes.. a very long time btw..) not sure why zookeeper was opted for, considering most places are dropping it?
s
There are two options: 1. ZooKeeper 2. ClickHouse Keeper. We opted for ZooKeeper for its maturity and the vast ecosystem where you can get things resolved. ClickHouse Keeper, on the other hand, is relatively new and there is not much out there on how to fix something if things go south.
r
schema-migrator-sync is running
yeah that is fair.. but it's also a beast and ofttimes... a little... javary (slow, and unreliable). but all good, now just waiting for schema-migrator-sync
Copy code
✔ Network signoz-net                Created                    0.0s
 ✔ Volume "signoz-clickhouse"        Created                    0.0s
 ✔ Volume "signoz-sqlite"            Created                    0.0s
 ✔ Volume "signoz-zookeeper-1"       Created                    0.0s
 ✔ Container signoz-init-clickhouse  Exited                     6.3s
 ✔ Container signoz-zookeeper-1      Healthy                   35.7s
 ✔ Container signoz-clickhouse       Healthy                   66.2s
 ⠙ Container schema-migrator-sync    Waiting                  288.2s
 ✔ Container schema-migrator-async   Created                    0.1s
 ✔ Container signoz                  Created                    0.2s
 ✔ Container signoz-otel-collector   Created                    0.1s
288sec migrations on an empty database ... seems .. weird
ok that finished
Copy code
369.6s
s
correct, the schema-migrator-sync doesn't know whether it's running on an empty database, so it runs the migrations the same way every time. there is some inefficiency in how it runs the bootstrap
r
yep.. that's normal.. most larger systems with lots of migrations do that.. might be worth consolidating them at some point (Django does this quite well)
ok we're up and running.. just set up a new admin user, 1 sec
ok we're in
so everything is empty, now to get infra monitoring going
all of my nodes connect to the service on 4317 (netcat says the connection is good)
Copy code
Successfully connected to 192.168.110.25 (192.168.110.25) on tcp port 4317
running
Copy code
docker run -it --rm docker.io/signoz/troubleshoot checkEndpoint --endpoint=192.168.110.25:4317
s
Just trying to follow your env. Does 192.168.110.25:4317 point to the main SigNoz installation collector?
r
Copy code
time="2025-06-09T05:51:38Z" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
    2025-06-09T05:51:39.426Z    INFO   troubleshoot/main.go:28 STARTING!
    2025-06-09T05:51:39.428Z    INFO   checkEndpoint/checkEndpoint.go:41       checking reachability of SigNoz endpoint
    2025-06-09T05:51:39.445Z    INFO   troubleshoot/main.go:46 Successfully sent sample data to signoz ...
all 5 nodes report success
so 110.25 IS the main signoz server
on 8080 with 4317 exposed
and the others are nodes on the network connecting to that SigNoz server, wanting to send hostmetrics
s
cool, however, I see you have the otlp receiver in the agent config as well. I wanted to understand that. how are those run, and what would be the host:port for them?
r
so atm the setup is off the shelf, just as it is.. right now the only thing I'm trying to gather is the hostmetrics from each node..
would it be better just to have node-exporter (Prometheus) and get SigNoz to scrape that?
s
No, just remove the otlp receiver from the agent config and keep the config limited to hostmetrics and tcplog/docker
r
ooo.. man I hope it's that easy.. ok, trying
s
Then export to main SigNoz using the otlp exporter like you are already doing
r
so I'll remove the otlp receiver from the agent config
ok, updated.. restarting agent
have to update the pipelines too
Copy code
otel-agent-1  | Error: invalid configuration: service::pipelines::metrics: references receiver "otlp" which is not configured
otel-agent-1  | 2025/06/09 05:58:26 collector server run finished with error: invalid configuration: service::pipelines::metrics: references receiver "otlp" which is not configured
I keep the otlp exporter and just remove all pipelines except the hostmetrics one?
s
right
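(So, as a sketch, the service section of the agent config ends up with only the hostmetrics pipeline, which is what the full config shared further down reflects:)
Copy code
service:
  extensions: [health_check, pprof]
  pipelines:
    # the otlp receiver and the traces/metrics/logs pipelines that referenced it are gone
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [otlp]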
r
Copy code
otel-agent-1  | {"level":"warn","ts":1749448868.961559,"caller":"internal@v0.111.0/warning.go:40","msg":"Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.","kind":"extension","name":"health_check","documentation":"<https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks>"}
otel-agent-1  | {"level":"info","ts":1749448868.96188,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"health_check"}
otel-agent-1  | {"level":"info","ts":1749448868.9619875,"caller":"extensions/extensions.go:42","msg":"Extension is starting...","kind":"extension","name":"pprof"}
otel-agent-1  | {"level":"info","ts":1749448868.9621875,"caller":"pprofextension@v0.111.0/pprofextension.go:61","msg":"Starting net/http/pprof server","kind":"extension","name":"pprof","config":{"TCPAddr":{"Endpoint":"0.0.0.0:1777","DialerConfig":{"Timeout":0}},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
otel-agent-1  | {"level":"info","ts":1749448868.962536,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"pprof"}
otel-agent-1  | {"level":"info","ts":1749448868.9636176,"caller":"internal/resourcedetection.go:125","msg":"began detecting resource information","kind":"processor","name":"resourcedetection","pipeline":"metrics/hostmetrics"}
otel-agent-1  | {"level":"info","ts":1749448868.9646227,"caller":"internal/resourcedetection.go:139","msg":"detected resource information","kind":"processor","name":"resourcedetection","pipeline":"metrics/hostmetrics","resource":{"host.name":"vhqkube05","os.type":"linux"}}
otel-agent-1  | {"level":"info","ts":1749448868.964854,"caller":"healthcheck/handler.go:132","msg":"Health Check state change","kind":"extension","name":"health_check","status":"ready"}
otel-agent-1  | {"level":"info","ts":1749448868.9649434,"caller":"service@v0.111.0/service.go:234","msg":"Everything is ready. Begin running and processing data."}
looking positive
Copy code
otel-agent-1  | {"level":"error","ts":1749448900.0632877,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"failed to read usage at /hostfs/tmp/crun.VbBLPr: no such file or directory","scraper":"hostmetrics","stacktrace":"<http://go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport|go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport>\n\tgo.opentelemetry.io/collector/receiver@v0.111.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.111.0/scraperhelper/scrapercontroller.go:181"}
s
Let it run for couple of minutes
r
will the SigNoz infra monitoring UI update automatically, or do I need to F5 refresh?
s
You need to refresh. That should appear after a couple of minutes if everything is working
r
logs have not updated since that error there ^^
s
Hmm, do you see the host vhqkube05 in the infra hosts list?
r
nope
s
Can you share how you mounted /hostfs?
r
image.png
Copy code
user: "0:0"
so that's one change I had to do
and volumes
Copy code
volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      - /:/hostfs:ro
      - /var/run/podman/podman.sock:/var/run/docker.sock
which is out of the box (except for the podman aspect).. yes, using podman as a general shift
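(Pulled together, the agent's compose service looks roughly like this sketch; the image tag, user, and volumes are the ones from this thread, while the service name, command, and env_file lines are assumptions about the default deploy layout:)
Copy code
services:
  otel-agent:
    image: otel/opentelemetry-collector-contrib:0.111.0
    user: "0:0"                        # run as root so /hostfs is readable
    command: ["--config=/etc/otel-collector-config.yaml"]   # assumed default command
    env_file: .env                     # provides SIGNOZ_COLLECTOR_ENDPOINT etc. (assumed)
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      - /:/hostfs:ro
      - /var/run/podman/podman.sock:/var/run/docker.sock   # podman socket mounted in place of docker's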
s
Now we have a better idea. The host metrics collection is not working. Not sure if podman has anything to do with it.
r
most likely it would.. I'll try with docker, 1 sec
we're trying to use podman over docker because docker is... well.. being docker
thing is, it has ro access to the fs.. and that process... should be the same (and podman is running as root, same as the docker daemon)
pulling images again with docker
ok, started the image with docker-compose
Copy code
,"pipeline":"metrics/hostmetrics"}
otel-agent-1  | {"level":"info","ts":1749450058.7551575,"caller":"internal/resourcedetection.go:139","msg":"detected resource information","kind":"processor","name":"resourcedetection","pipeline":"metrics/hostmetrics","resource":{"host.name":"vhqkube05","os.type":"linux"}}
otel-agent-1  | {"level":"info","ts":1749450058.7555208,"caller":"healthcheck/handler.go:132","msg":"Health Check state change","kind":"extension","name":"health_check","status":"ready"}
otel-agent-1  | {"level":"info","ts":1749450058.7556674,"caller":"service@v0.111.0/service.go:234","msg":"Everything is ready. Begin running and processing data."}
otel-agent-1  | {"level":"error","ts":1749450059.7634532,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"failed to read usage at /hostfs/tmp/crun.JhVuQz: no such file or directory","scraper":"hostmetrics","stacktrace":"<http://go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport|go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport>\n\tgo.opentelemetry.io/collector/receiver@v0.111.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.111.0/scraperhelper/scrapercontroller.go:177"}
same error
s
Which docker image are you using?
r
Copy code
image: otel/opentelemetry-collector-contrib:0.111.0
whatever is in the default deploy example
ok, I've also updated the docker.sock mount.. and restarted
not complaining this time...
nothing showing in host metrics yet
s
collection interval is 30s, let's wait for a couple of collections to trigger
r
Copy code
otel-agent-1  | {"level":"error","ts":1749450347.9352226,"caller":"scraperhelper/scrapercontroller.go:205","msg":"Error scraping metrics","kind":"receiver","name":"hostmetrics","data_type":"metrics","error":"failed to read usage at /hostfs/tmp/crun.dr8Cad: no such file or directory","scraper":"hostmetrics","stacktrace":"<http://go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport|go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport>\n\tgo.opentelemetry.io/collector/receiver@v0.111.0/scraperhelper/scrapercontroller.go:205\ngo.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1\n\tgo.opentelemetry.io/collector/receiver@v0.111.0/scraperhelper/scrapercontroller.go:181"}
exactly the same error
s
Can you change the volume to just /:/hostfs and see if it changes anything?
r
:ro just means read only
and is from the default example?
but sure
waiting 30s
s
Any change?
r
nope..
image.png
the otel collector config now looks like
Copy code
receivers:
  hostmetrics:
    collection_interval: 30s
    root_path: /hostfs
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector
        # For Docker daemon metrics to be scraped, it must be configured to expose
        # Prometheus metrics, as documented here: https://docs.docker.com/config/daemon/prometheus/
        # - job_name: docker-daemon
        #   static_configs:
        #   - targets:
        #       - host.docker.internal:9323
        #     labels:
        #       job_name: docker-daemon
        - job_name: docker-container
          docker_sd_configs:
            - host: unix:///var/run/docker.sock
          relabel_configs:
            - action: keep
              regex: true
              source_labels:
                - __meta_docker_container_label_signoz_io_scrape
            - regex: true
              source_labels:
                - __meta_docker_container_label_signoz_io_path
              target_label: __metrics_path__
            - regex: (.+)
              source_labels:
                - __meta_docker_container_label_signoz_io_path
              target_label: __metrics_path__
            - separator: ":"
              source_labels:
                - __meta_docker_network_ip
                - __meta_docker_container_label_signoz_io_port
              target_label: __address__
            - regex: '/(.*)'
              replacement: '$1'
              source_labels:
                - __meta_docker_container_name
              target_label: container_name
            - regex: __meta_docker_container_label_signoz_io_(.+)
              action: labelmap
              replacement: $1
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
      # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        expr: 'attributes.container_name matches "^signoz|(signoz-(|otel-collector|clickhouse|zookeeper))|(infra-(logspout|otel-agent)-.*)"'
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors:
      # - ec2
      # - gcp
      # - azure
      - env
      - system
    system:
      hostname_sources: [os]
    timeout: 15s
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
exporters:
  otlp:
    endpoint: ${env:SIGNOZ_COLLECTOR_ENDPOINT}
    timeout: 30s
    tls:
      insecure: true
    # headers:
    #   signoz-access-token: ${env:SIGNOZ_ACCESS_TOKEN}
  # debug: {}
service:
  telemetry:
    logs:
      encoding: json
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - pprof
  pipelines:
    metrics/hostmetrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [otlp]
s
Did it throw the same error again after the change? I couldn't see the error in the image above
r
no error, but no logs and nothing in the infra section on the SigNoz server
s
available for a short huddle?
r
sure
argh, my browser does not support huddles
I'm on a linux box
s
ok
if you are not seeing any errors in the agent logs, the hosts list should have the data.
r
are there logs on the SigNoz server I can check for incoming data?
s
If there are no error logs on signoz-otel-collector or ClickHouse then it's getting ingested
Which version of SigNoz are you on?
You can go to metrics-explorer and check there which metrics are available
Screenshot 2025-06-09 at 12.09.16 PM.png
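(If you'd rather check directly in ClickHouse, something like the query below can confirm whether host metrics are landing; the database, table, and column names are assumptions based on recent SigNoz schemas, so adjust if your version differs:)
Copy code
docker exec -it signoz-clickhouse clickhouse-client -q "
  SELECT metric_name, max(unix_milli) AS last_seen_ms
  FROM signoz_metrics.samples_v4
  WHERE metric_name LIKE 'system_%'
  GROUP BY metric_name
  ORDER BY last_seen_ms DESC
  LIMIT 10"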
r
sorry, not sure what I'm looking for? do I have to go to metrics to get the SigNoz version?
Copy code
image: signoz/signoz:${VERSION:-v0.86.2}
have not overridden the version.. so I guess v0.86.2
s
Ok, then check if you have system metrics in the Metrics list page
r
i do yes
image.png
s
Click on one metric and check on the detail page when it was last received, and see which hosts are sending
r
error fetching
image.png
s
Try again once?
r
no change
clicking on a metric, the slider slides out.. error fetching
s
Can you share the logs of signoz container?
r
1 sec
image.png
lots of errors
it is getting requests
Copy code
"client.address":"192.168.110.102:44354",
is one of them
Copy code
or getting metrics summary error","error":"code: 173, message: Couldn't allocate 153 bytes when parsing JSON: while executing 'FUNCTION JSONExtractKeysAndValuesRaw(labels :: 1) -> JSONExtractKeysAndValuesRaw(labels) Array(Tuple(String, String)) : 3'","stacktrace"
s
Ah yes, so host metrics are getting ingested, but the other problem is making things not work
This is an issue from ClickHouse. It doesn't work in certain cases. Let me recollect where I saw this
r
Copy code
message: Couldn't allocate 136 bytes when parsing JSON: while executing
thanks 🙇 I really appreciate your input on this
mem on the box
Copy code
total        used        free      shared  buff/cache   available
Mem:            30Gi       3.2Gi        19Gi        31Mi       8.7Gi        27Gi
Swap:          8.0Gi          0B       8.0Gi
QEMU maybe.. but afaik we're not using it
Copy code
apt show qemu-system-x86
Package: qemu-system-x86
Version: 1:8.2.2+ds-0ubuntu1.7
Priority: optional
Section: misc
Source: qemu
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian QEMU Team <pkg-qemu-devel@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 59.9 MB
I lie. it's Ubuntu, of course it is
sigh.. clickhouse of course.. lol
zookeeper 👎 clickhouse 👎
disable simdjson
s
right, disable that option and check again
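(For anyone landing here with the same symptom, a sketch of what that change might look like, assuming the option being discussed is ClickHouse's allow_simdjson setting; the file path and users.d-override placement are assumptions, so adapt to your own ClickHouse config layout, and restart ClickHouse afterwards:)
Copy code
<!-- e.g. clickhouse/users.d/allow_simdjson.xml (path is an assumption) -->
<clickhouse>
    <profiles>
        <default>
            <!-- fall back to rapidjson for JSON* functions; works around
                 "Couldn't allocate N bytes when parsing JSON" on CPUs/VMs
                 that lack the instructions simdjson expects -->
            <allow_simdjson>0</allow_simdjson>
        </default>
    </profiles>
</clickhouse>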
r
image.png
will have to restart clickhouse
who doesn't love editing java-style xml.. man ..
lol
woot metrics working
image.png
s
the infra hosts list should also work
r
woot, hosts coming in 🙂
legendary!
image.png
ok so I'll try again with podman, because docker.. meh..
but this looks like it's a clickhouse configuration issue
might be worth adding to troubleshooting?
if metrics don't show and you see "error fetching metrics"
then check the logs and look for the "Couldn't allocate ... bytes" error
and disable simdjson
s
right, it should be added to the troubleshooting guide. ok, let us know if you run into any other issues
r
thank you so much for your time, Srikanth! Shukria!
s
@Nagesh Bansal can you get our troubleshooting guide updated with the qemu and json parsing issue?