(*Signoz Docker Standalone deployment*) Hi All The...
# support
m
(Signoz Docker Standalone deployment) Hi all, the signoz-otel-collector keeps restarting. Does anyone know what "*Error creating clickhouse client: database should be set in ClickHouse DSN*" means? The ClickHouse database is up and running, and I can connect and run queries from the command line. Does anyone have insight into what might be causing this error or how to resolve it? Any help would be greatly appreciated.
2024-06-07T18:14:04.037Z        info    service@v0.88.0/telemetry.go:84 Setting up own telemetry...
2024-06-07T18:14:04.037Z        info    service@v0.88.0/telemetry.go:201        Serving Prometheus metrics      {"address": "0.0.0.0:8888", "level": "Basic"}
2024-06-07T18:14:04.038Z        info    exporter@v0.88.0/exporter.go:275        Stability level of component is undefined       {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite"}
2024/06/07 18:14:04 Error creating clickhouse client: database should be set in ClickHouse DSN
{"level":"info","timestamp":"2024-06-07T18:15:05.380Z","logger":"dynamic-config","caller":"opamp/config_manager.go:89","msg":"Added instance id to config file","component":"opamp-server-client","instance_id":"a2430171-01f1-4b94-b4b8-9ac406e7a368"}
{"level":"info","timestamp":"2024-06-07T18:15:05.380Z","caller":"service/service.go:69","msg":"Starting service"}
{"level":"info","timestamp":"2024-06-07T18:15:05.380Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T18:15:05.382Z","caller":"opamp/server_client.go:127","msg":"Connected to the server.","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T18:15:05.391Z","logger":"agent-config-manager","caller":"opamp/config_manager.go:172","msg":"Config has changed, reloading","path":"/var/tmp/collector-config.yaml"}
{"level":"info","timestamp":"2024-06-07T18:15:05.391Z","caller":"signozcol/collector.go:168","msg":"Restarting collector service"}
{"level":"info","timestamp":"2024-06-07T18:15:05.391Z","caller":"signozcol/collector.go:144","msg":"Shutting down collector service"}
{"level":"info","timestamp":"2024-06-07T18:15:05.391Z","caller":"signozcol/collector.go:154","msg":"Collector service is not running"}
{"level":"info","timestamp":"2024-06-07T18:15:05.392Z","caller":"signozcol/collector.go:103","msg":"Starting collector service"}
2024-06-07T18:15:05.403Z        info    service@v0.88.0/telemetry.go:84 Setting up own telemetry...
2024-06-07T18:15:05.403Z        info    service@v0.88.0/telemetry.go:201        Serving Prometheus metrics      {"address": "0.0.0.0:8888", "level": "Basic"}
2024-06-07T18:15:05.405Z        info    exporter@v0.88.0/exporter.go:275        Stability level of component is undefined       {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite"}
2024/06/07 18:15:05 Error creating clickhouse client: database should be set in ClickHouse DSN
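The error message says the exporter could not find a database name in its ClickHouse DSN. As a rough illustration only (this is not the collector's actual validation code, and `database_from_dsn` is a hypothetical helper), the sketch below shows two common ways a database can be carried in such a DSN: as the URL path or as a `database` query parameter. Which form a given collector version accepts is not stated in this thread.

```python
from urllib.parse import urlparse, parse_qs


def database_from_dsn(dsn: str):
    """Return the database named in a ClickHouse-style DSN, or None if absent.

    Hypothetical helper for illustration; not part of SigNoz or ClickHouse.
    """
    parsed = urlparse(dsn)

    # Form 1: database given as the URL path, e.g. tcp://host:9000/signoz_metrics
    path_db = parsed.path.strip("/")
    if path_db:
        return path_db

    # Form 2: database given as a query parameter,
    # e.g. tcp://host:9000/?database=signoz_metrics
    query_db = parse_qs(parsed.query).get("database", [])
    return query_db[0] if query_db else None


for dsn in (
    "tcp://clickhouse:9000/signoz_metrics",
    "tcp://clickhouse:9000/?database=signoz_metrics",
    "tcp://clickhouse:9000",
):
    print(dsn, "->", database_from_dsn(dsn))
```

A component that only looks in one of these two places would report a missing database when the DSN uses the other form, even though ClickHouse itself is reachable.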
s
Can you share the config you used for exporters?
m
Hi @Srikanth Chekuri Is that what you were asking to see?
receivers:
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
        # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        #expr: 'attributes.container_name matches "^signoz-(logspout|frontend|alertmanager|query-service|otel-collector|clickhouse|zookeeper)"'
        expr: 'attributes.container_name matches "^signoz-(frontend|alertmanager|query-service|zookeeper)"'
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector


processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/cumulative:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/signoz_logs
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 10s
  # logging: {}

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/cumulative, signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    logs:
      receivers: [otlp, tcplog/docker]
      processors: [batch]
      exporters: [clickhouselogsexporter]
s
What is the signoz-otel-collector version you are using?
m
Hi @Srikanth Chekuri Here we go.
gliderlabs/logspout:v3.2.14                  "/bin/logspout syslo…"   42 hours ago   Restarting (1) 25 seconds ago                                                                                      signoz-logspout
signoz/frontend:0.42.0                       "nginx -g 'daemon of…"   42 hours ago   Up 42 hours                     80/tcp, 0.0.0.0:3301->3301/tcp                                                     signoz-frontend
signoz/alertmanager:0.23.5                   "/bin/alertmanager -…"   42 hours ago   Up 42 hours                     9093/tcp                                                                           signoz-alertmanager
signoz/signoz-otel-collector:0.88.17         "/signoz-collector -…"   42 hours ago   Exited (0) 42 hours ago                                                                                            signoz-otel-collector
signoz/query-service:0.42.0                  "./query-service -co…"   42 hours ago   Up 42 hours (healthy)           8080/tcp                                                                           signoz-query-service
signoz/signoz-schema-migrator:0.88.12        "/signoz-schema-migr…"   42 hours ago   Exited (0) 42 hours ago                                                                                            otel-migrator
clickhouse/clickhouse-server:24.1.2-alpine   "/entrypoint.sh"         42 hours ago   Up 42 hours (healthy)           0.0.0.0:8123->8123/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9181->9181/tcp, 9009/tcp   signoz-clickhouse
bitnami/zookeeper:latest                     "/opt/bitnami/script…"   42 hours ago   Up 42 hours                     0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp, 8080/tcp   signoz-zookeeper-1
signoz/locust:1.2.3                          "/docker-entrypoint.…"   42 hours ago   Up 42 hours                     5557-5558/tcp, 8089/tcp                                                            load-hotrod
jaegertracing/example-hotrod:1.30            "/go/bin/hotrod-linu…"   42 hours ago   Up 42 hours                     8080-8083/tcp                                                                      hotrod
@Srikanth Chekuri Interesting, it seems that the signoz-otel-collector has Exited (0) after many restarts; its logs are further below. The signoz-logspout, however, keeps restarting. signoz-logspout logs:
2024/06/09 12:39:29 # logspout v3.2.14 by gliderlabs
2024/06/09 12:39:29 # adapters: udp multiline raw syslog tcp tls
2024/06/09 12:39:29 # options : 
2024/06/09 12:39:29 persist:/mnt/routes
2024/06/09 12:39:29 !! lookup otel-collector on 127.0.0.11:53: server misbehaving
signoz-otel-collector logs:
{"level":"info","timestamp":"2024-06-07T18:47:26.922Z","caller":"service/service.go:79","msg":"Shutting down service"}
{"level":"info","timestamp":"2024-06-07T18:47:26.922Z","caller":"opamp/server_client.go:185","msg":"Stopping OpAMP server client","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T18:47:26.922Z","caller":"signozcol/collector.go:144","msg":"Shutting down collector service"}
{"level":"info","timestamp":"2024-06-07T18:47:26.922Z","caller":"signozcol/collector.go:152","msg":"Collector service is shut down"}
{"level":"info","timestamp":"2024-06-07T18:47:26.922Z","caller":"opamp/client.go:53","msg":"Collector is stopped","component":"opamp-server-client"}
{"level":"error","timestamp":"2024-06-07T18:47:26.922Z","caller":"internal/wsreceiver.go:53","msg":"Unexpected error while receiving: read tcp 172.29.0.10:55292->172.29.0.6:4320: use of closed network connection","component":"opamp-server-client","stacktrace":"github.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/wsreceiver.go:53\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:243\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"}
{"level":"error","timestamp":"2024-06-07T18:47:26.922Z","caller":"client/wsclient.go:170","msg":"Connection failed (dial tcp: lookup query-service: operation was canceled), will retry.","component":"opamp-server-client","stacktrace":"github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:170\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:202\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"}
{"level":"info","timestamp":"2024-06-07T18:47:26.922Z","caller":"service/service.go:83","msg":"Client stopped successfully"}
@Srikanth Chekuri signoz-otel-collector:0.88.17
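The logspout line `lookup otel-collector on 127.0.0.11:53: server misbehaving` is a name-resolution failure against Docker's embedded DNS, which is likely a side effect of the collector container not running. A minimal sketch for checking whether a service name resolves is shown below; `resolves` is a hypothetical helper, and it would need to be run from a container on the same Docker network for the result to be meaningful.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the resolver cannot find (or cannot look up) the name.
        return False


# "localhost" is used here only so the example runs anywhere;
# inside the Docker network the name of interest would be "otel-collector".
print("localhost resolves:", resolves("localhost"))
```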
s
Please try 0.88.21.
m
Hi @Srikanth Chekuri That worked out for me. It's not restarting anymore. Thank you for your help. Was that a known issue?
s
It was not an issue. The config you used and the version of the collector were not compatible.
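Since the fix came down to the collector image tag, a quick tag comparison can flag this kind of mismatch early. The sketch below is only an illustration: `parse_version` and `image_tag` are hypothetical helpers, and treating 0.88.21 as a minimum is an assumption, since the thread only establishes that 0.88.21 worked with this config and 0.88.17 did not.

```python
def parse_version(tag: str) -> tuple:
    """Turn a dotted tag such as '0.88.17' into a tuple of ints: (0, 88, 17)."""
    return tuple(int(part) for part in tag.split("."))


def image_tag(image: str) -> str:
    """Return the tag part of an image reference such as 'repo/name:1.2.3'."""
    return image.rsplit(":", 1)[1]


# Image reference as reported by `docker ps` in this thread.
running = "signoz/signoz-otel-collector:0.88.17"
# Version that worked with this config (assumed here to be a lower bound).
known_good = "0.88.21"

needs_upgrade = parse_version(image_tag(running)) < parse_version(known_good)
print("needs upgrade:", needs_upgrade)
```

Comparing integer tuples avoids the string-comparison pitfall where, for example, "0.88.9" would sort after "0.88.21".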
m
@Srikanth Chekuri I see. My bad. Thank you for your help again!