# support
a
I'm deploying the SigNoz stack on the host network in a Docker VM via a bash script. I don't know why the otel collector is crashing. I'm using a custom nginx config in the SigNoz frontend. Here are the script and envs.
Bash Script
#!/bin/bash

# Define the Host IP
HOST_IP=10.160.0.41

# Create and run containers
docker run -d --name signoz-clickhouse \
  --hostname clickhouse \
  --network host \
  --restart on-failure \
  -v "$(pwd)/clickhouse-config.xml:/etc/clickhouse-server/config.xml" \
  -v "$(pwd)/clickhouse-users.xml:/etc/clickhouse-server/users.xml" \
  -v "$(pwd)/custom-function.xml:/etc/clickhouse-server/custom-function.xml" \
  -v "$(pwd)/clickhouse-cluster.xml:/etc/clickhouse-server/config.d/cluster.xml" \
  -v "$(pwd)/clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml" \
  -v "$(pwd)/data/clickhouse/:/var/lib/clickhouse/" \
  -v "$(pwd)/user_scripts:/var/lib/clickhouse/user_scripts/" \
  --health-cmd "wget --spider -q 0.0.0.0:8123/ping || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  clickhouse/clickhouse-server:24.1.2-alpine 

docker run -d --name signoz-alertmanager \
  --network host \
  --restart on-failure \
  -v "$(pwd)/data/alertmanager:/data" \
  --health-cmd "wget --spider -q <http://localhost:9093/api/v1/status> || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  signoz/alertmanager:0.23.5 --queryService.url=http://$HOST_IP:8085 --storage.path=/data


docker run -d --name signoz-query-service \
  --network host \
  --restart on-failure \
  -v "$(pwd)/prometheus.yml:/root/config/prometheus.yml" \
  -v "$(pwd)/dashboards:/root/config/dashboards" \
  -v "$(pwd)/data/signoz/:/var/lib/signoz/" \
  --env-file signoz-query-service.env \
  --health-cmd "wget --spider -q localhost:8080/api/v1/health || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  signoz/query-service:0.47.0 -config="/root/config/prometheus.yml"

docker run -d --name signoz-frontend \
  --network host \
  --restart on-failure \
  -v "$(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf" \
  -v "/opt/samespace/samespace-public/samespace.com.crt:/opt/samespace/samespace-public/samespace.com.crt" \
  -v "/opt/samespace/samespace-public/samespace.com.key:/opt/samespace/samespace-public/samespace.com.key" \
  signoz/frontend:0.47.0

docker run -d --name otel-migrator \
  --network host \
  --restart on-failure \
  signoz/signoz-schema-migrator:0.88.26 --dsn="tcp://$HOST_IP:9000"

docker run -d --name signoz-otel-collector \
  --network host \
  --restart on-failure \
  --user root \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
  -v "$(pwd)/otel-collector-opamp-config.yaml:/etc/manager-config.yaml" \
  -v "/var/lib/docker/containers:/var/lib/docker/containers:ro" \
  -v "/opt/samespace/Cert-mtls/ca.crt:/opt/samespace/Cert-mtls/ca.crt" \
  -v "/opt/samespace/Cert-mtls/gw.key:/opt/samespace/Cert-mtls/gw.key" \
  -v "/opt/samespace/Cert-mtls/mesh.crt:/opt/samespace/Cert-mtls/mesh.crt" \
  --env-file signoz-otel-collector.env \
  --health-cmd "wget --spider -q <http://localhost:13133/health> || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  signoz/signoz-otel-collector:0.88.26 --config="/etc/otel-collector-config.yaml" --manager-config="/etc/manager-config.yaml" --copy-path="/var/tmp/collector-config.yaml" --feature-gates="-pkg.translator.prometheus.NormalizeName"
OTEL Config:
receivers:
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
        # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        expr: 'attributes.container_name matches "^signoz-(logspout|frontend|alertmanager|query-service|otel-collector|clickhouse|zookeeper)"'
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key   
          ca_file: /opt/samespace/Cert-mtls/ca.crt
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key   
          ca_file: /opt/samespace/Cert-mtls/ca.crt
  otlp/mtls:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key   
          ca_file: /opt/samespace/Cert-mtls/ca.crt   
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key   
          ca_file: /opt/samespace/Cert-mtls/ca.crt
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
          - targets:
              - 10.160.0.41:8888
            labels:
              job_name: otel-collector


processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/cumulative:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

exporters:
  clickhousetraces:
    datasource: tcp://10.160.0.41:9000/signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://10.160.0.41:9000/signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://10.160.0.41:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://10.160.0.41:9000/signoz_logs
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 10s
  # logging: {}

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/cumulative, signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    logs:
      receivers: [otlp, tcplog/docker]
      processors: [batch]
      exporters: [clickhouselogsexporter]
ENV
OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
DOCKER_MULTI_NODE_CLUSTER=false
LOW_CARDINAL_EXCEPTION_GROUPING=false

ClickHouseUrl=tcp://10.160.0.41:9000
ALERTMANAGER_API_PREFIX=http://10.160.0.41:9093/api/
SIGNOZ_LOCAL_DB_PATH=/var/lib/signoz/signoz.db
DASHBOARDS_PATH=/root/config/dashboards
STORAGE=clickhouse
GODEBUG=netdns=go
TELEMETRY_ENABLED=true
DEPLOYMENT_TYPE=docker-standalone-amd

server_endpoint: ws://10.160.0.41:4320/v1/opamp
The error I'm getting is: OTEL Logs
{
  "level": "error",
  "timestamp": "2024-06-07T06:22:58.034Z",
  "caller": "opamp/server_client.go:216",
  "msg": "failed to apply config",
  "component": "opamp-server-client",
  "error": "failed to reload config: /var/tmp/collector-config.yaml: collector failed to restart: failed to build pipelines: failed to create \"clickhouselogsexporter\" exporter for data type \"logs\": cannot configure clickhouse logs exporter: code: 81, message: Database signoz_logs does not exist",
  "stacktrace": "<http://github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler|github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:216\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onMessageFuncHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:199\ngithub.com/open-telemetry/opamp-go/client/types.CallbacksStruct.OnMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/types/callbacks.go:162\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/receivedprocessor.go:131\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/wsreceiver.go:57\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:243\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"
}
Help please! @nitya-signoz
a
otel-migrator logs
If I change this config
server_endpoint: ws://10.160.0.41:4320/v1/opamp
to
server_endpoint: ws://query-service:4320/v1/opamp
then the collector starts and I get logs like this
rror":"dial tcp: lookup query-service on 127.0.0.53:53: server misbehaving","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func2|github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).Start.func2>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:130\ngithub.com/open-telemetry/opamp-go/client/types.CallbacksStruct.OnConnectFailed\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/types/callbacks.go:150\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).tryConnectOnce\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:127\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:165\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:202\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"}
{"level":"error","timestamp":"2024-06-07T06:05:13.985Z","caller":"client/wsclient.go:170","msg":"Connection failed (dial tcp: lookup query-service on 127.0.0.53:53: server misbehaving), will retry.","component":"opamp-server-client","stacktrace":"<http://github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected|github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected>\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:170\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:202\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"}
{"level":"info","timestamp":"2024-06-07T06:05:14.856Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:15.856Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:16.857Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:17.858Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:18.858Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:19.858Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:20.858Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:21.859Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:22.860Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:23.860Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:24.860Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:25.860Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:26.861Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:27.862Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:28.862Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:29.862Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:30.863Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:31.863Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:32.864Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:33.864Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:34.864Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:35.864Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:36.866Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:37.866Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:38.866Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:39.867Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:40.867Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:41.868Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:42.868Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:43.868Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:44.869Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:45.869Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:46.870Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:47.870Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:48.870Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:49.871Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:50.871Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:51.872Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:52.872Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:53.873Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:54.873Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:55.873Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:56.874Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:57.875Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
{"level":"info","timestamp":"2024-06-07T06:05:58.875Z","caller":"opamp/server_client.go:171","msg":"Waiting for initial remote config","component":"opamp-server-client"}
@nitya-signoz help!
n
it’s not able to connect to query-service in the above logs and it’s retrying continuously.
you can see from the migrator logs that the migration failed, as it is not able to connect to clickhouse.
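A quick way to verify both points, assuming the container names and host IP from the script above:

# Did the one-shot schema migration actually succeed?
docker logs otel-migrator
# Are ClickHouse's native port and query-service's OpAMP port reachable?
nc -zv 10.160.0.41 9000
nc -zv 10.160.0.41 4320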
a
Can you please help me understand this error in the log?
{"level":"error","timestamp":"2024-06-07T11:03:26.224Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 999, message: Cannot resolve any of provided ZooKeeper hosts due to DNS error","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate|github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.10/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-06-07T11:03:26.224Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 999, message: Cannot resolve any of provided ZooKeeper hosts due to DNS error","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.10/x64/src/runtime/proc.go:267"}
n
it says “Cannot resolve any of provided ZooKeeper hosts due to DNS error”: it’s not able to find zookeeper, so check if zookeeper is running.
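Two quick checks, assuming the Bitnami image and the default client port 2181:

docker ps --filter name=zookeeper    # is the container up at all?
echo srvr | nc localhost 2181        # a running ZooKeeper replies with its server stats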
a
Can you please explain why I need zookeeper? Currently I'm using these containers: signoz/frontend, signoz/query-service, signoz/signoz-otel-collector, signoz/signoz-schema-migrator, signoz/alertmanager, clickhouse/clickhouse-server
These 3 containers are crashing after I ran zookeeper:
a02d6ac53767   signoz/alertmanager:0.23.5                   "/bin/alertmanager -…"   6 minutes ago   Restarting (1) 11 seconds ago             signoz-alertmanager
b7ad94c535cf   signoz/query-service:0.47.0                  "./query-service -co…"   6 minutes ago   Restarting (1) 4 seconds ago              signoz-query-service
0ccd916bb854   signoz/signoz-schema-migrator:0.88.26        "/signoz-schema-migr…"   6 minutes ago   Exited (1) 6 minutes ago                  otel-migrator
n
it’s needed for clickhouse to run, as we use a distributed schema
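Concretely, the schema migrator issues ON CLUSTER DDL, and ClickHouse coordinates that DDL through ZooKeeper. A minimal illustration (the exact statements the migrator runs may differ):

-- run inside clickhouse client; fails with KEEPER_EXCEPTION (code 999)
-- when ClickHouse cannot resolve any ZooKeeper host:
CREATE DATABASE IF NOT EXISTS signoz_logs ON CLUSTER cluster;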
a
Now the other containers keep restarting
n
That is not helping; can you share the logs? First you should have zookeeper and clickhouse up and running, then the schema migrator, and then the other components (see the sketch below).
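A rough bash-only sketch of that ordering (since there is no compose depends_on here), reusing the health checks already defined in the script:

wait_healthy() {
  # Poll a container until Docker reports its health check as "healthy".
  local name=$1
  until [ "$(docker inspect -f '{{.State.Health.Status}}' "$name" 2>/dev/null)" = "healthy" ]; do
    echo "waiting for $name..."
    sleep 5
  done
}

docker run -d --name signoz-zookeeper-1 ...   # 1. ZooKeeper first
docker run -d --name signoz-clickhouse ...    # 2. then ClickHouse
wait_healthy signoz-clickhouse
docker run -d --name otel-migrator ...        # 3. then the schema migrator
docker wait otel-migrator                     # one-shot job: block until it exits
# 4. finally query-service, frontend, alertmanager, and the collector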
a
otel-migrator logs:
{"level":"info","timestamp":"2024-06-07T11:40:20.423Z","caller":"signozschemamigrator/migrate.go:106","msg":"Successfully set env var SIGNOZ_CLUSTER ","component":"migrate cli","cluster-name":"cluster"}
{"level":"info","timestamp":"2024-06-07T11:40:20.423Z","caller":"signozschemamigrator/migrate.go:111","msg":"Setting env var SIGNOZ_REPLICATED","component":"migrate cli","replication":false}
{"level":"info","timestamp":"2024-06-07T11:40:20.426Z","caller":"migrationmanager/manager.go:76","msg":"Running migrations for all migrators","component":"migrationmanager"}
{"level":"info","timestamp":"2024-06-07T11:40:20.427Z","caller":"migrationmanager/manager.go:78","msg":"Running migrations for logs","component":"migrationmanager","migrator":"logs"}
{"level":"error","timestamp":"2024-06-07T11:40:20.440Z","caller":"migrationmanager/manager.go:81","msg":"Failed to run migrations for migrator","component":"migrationmanager","migrator":"logs","error":"failed to create database, err: code: 999, message: Cannot resolve any of provided ZooKeeper hosts due to DNS error","stacktrace":"<http://github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate|github.com/SigNoz/signoz-otel-collector/migrationmanager.(*MigrationManager).Migrate>\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/migrationmanager/manager.go:81\nmain.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:126\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.10/x64/src/runtime/proc.go:267"}
{"level":"fatal","timestamp":"2024-06-07T11:40:20.440Z","caller":"signozschemamigrator/migrate.go:128","msg":"Failed to run migrations","component":"migrate cli","error":"failed to create database, err: code: 999, message: Cannot resolve any of provided ZooKeeper hosts due to DNS error","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.10/x64/src/runtime/proc.go:267"}
n
your clickhouse is still not up and running with zookeeper.
a
both are running. ClickHouse Logs:
ClickHouse Database directory appears to contain a database; Skipping initialization
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/cluster.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/storage.xml'.
Logging information to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Zookeeper Logs
*.jar:/opt/bitnami/zookeeper/bin/../zookeeper-server/src/main/resources/lib/*.jar:/opt/bitnami/zookeeper/bin/../conf:
2024-06-07 11:39:07,977 [myid:1] - INFO  [main:Environment@98] - Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2024-06-07 11:39:07,977 [myid:1] - INFO  [main:Environment@98] - Server environment:java.io.tmpdir=/tmp
2024-06-07 11:39:07,977 [myid:1] - INFO  [main:Environment@98] - Server environment:java.compiler=<NA>
2024-06-07 11:39:07,977 [myid:1] - INFO  [main:Environment@98] - Server environment:os.name=Linux
2024-06-07 11:39:07,977 [myid:1] - INFO  [main:Environment@98] - Server environment:os.arch=amd64
2024-06-07 11:39:07,977 [myid:1] - INFO  [main:Environment@98] - Server environment:os.version=6.5.0-1020-gcp
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:Environment@98] - Server environment:user.name=zookeeper
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:Environment@98] - Server environment:user.home=/home/zookeeper
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:Environment@98] - Server environment:user.dir=/
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:Environment@98] - Server environment:os.memory.free=1009MB
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:Environment@98] - Server environment:os.memory.max=1024MB
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:Environment@98] - Server environment:os.memory.total=1024MB
2024-06-07 11:39:07,978 [myid:1] - INFO  [main:ZooKeeperServer@138] - zookeeper.enableEagerACLCheck = false
2024-06-07 11:39:07,979 [myid:1] - INFO  [main:ZooKeeperServer@151] - zookeeper.digest.enabled = true
2024-06-07 11:39:07,979 [myid:1] - INFO  [main:ZooKeeperServer@155] - zookeeper.closeSessionTxn.enabled = true
2024-06-07 11:39:07,979 [myid:1] - INFO  [main:ZooKeeperServer@1505] - zookeeper.flushDelay=0
2024-06-07 11:39:07,979 [myid:1] - INFO  [main:ZooKeeperServer@1514] - zookeeper.maxWriteQueuePollTime=0
2024-06-07 11:39:07,979 [myid:1] - INFO  [main:ZooKeeperServer@1523] - zookeeper.maxBatchSize=1000
2024-06-07 11:39:07,980 [myid:1] - INFO  [main:ZooKeeperServer@260] - zookeeper.intBufferStartingSizeBytes = 1024
2024-06-07 11:39:07,981 [myid:1] - INFO  [main:BlueThrottle@141] - Weighed connection throttling is disabled
2024-06-07 11:39:07,983 [myid:1] - INFO  [main:ZooKeeperServer@1306] - minSessionTimeout set to 4000
2024-06-07 11:39:07,984 [myid:1] - INFO  [main:ZooKeeperServer@1315] - maxSessionTimeout set to 40000
2024-06-07 11:39:07,985 [myid:1] - INFO  [main:ResponseCache@45] - getData response cache size is initialized with value 400.
2024-06-07 11:39:07,986 [myid:1] - INFO  [main:ResponseCache@45] - getChildren response cache size is initialized with value 400.
2024-06-07 11:39:07,987 [myid:1] - INFO  [main:RequestPathMetricsCollector@109] - zookeeper.pathStats.slotCapacity = 60
2024-06-07 11:39:07,987 [myid:1] - INFO  [main:RequestPathMetricsCollector@110] - zookeeper.pathStats.slotDuration = 15
2024-06-07 11:39:07,988 [myid:1] - INFO  [main:RequestPathMetricsCollector@111] - zookeeper.pathStats.maxDepth = 6
2024-06-07 11:39:07,988 [myid:1] - INFO  [main:RequestPathMetricsCollector@112] - zookeeper.pathStats.initialDelay = 5
2024-06-07 11:39:07,988 [myid:1] - INFO  [main:RequestPathMetricsCollector@113] - zookeeper.pathStats.delay = 5
2024-06-07 11:39:07,988 [myid:1] - INFO  [main:RequestPathMetricsCollector@114] - zookeeper.pathStats.enabled = false
2024-06-07 11:39:07,991 [myid:1] - INFO  [main:ZooKeeperServer@1542] - The max bytes for all large requests are set to 104857600
2024-06-07 11:39:07,991 [myid:1] - INFO  [main:ZooKeeperServer@1556] - The large request threshold is set to -1
2024-06-07 11:39:07,992 [myid:1] - INFO  [main:AuthenticationHelper@66] - zookeeper.enforce.auth.enabled = false
2024-06-07 11:39:07,992 [myid:1] - INFO  [main:AuthenticationHelper@67] - zookeeper.enforce.auth.schemes = []
2024-06-07 11:39:07,992 [myid:1] - INFO  [main:ZooKeeperServer@361] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 clientPortListenBacklog -1 datadir /bitnami/zookeeper/data/version-2 snapdir /bitnami/zookeeper/data/version-2
2024-06-07 11:39:08,023 [myid:1] - INFO  [main:Log@170] - Logging initialized @819ms to org.eclipse.jetty.util.log.Slf4jLog
2024-06-07 11:39:08,144 [myid:1] - WARN  [main:ContextHandler@1656] - o.e.j.s.ServletContextHandler@7a5ceedd{/,null,STOPPED} contextPath ends with /*
2024-06-07 11:39:08,145 [myid:1] - WARN  [main:ContextHandler@1667] - Empty contextPath
2024-06-07 11:39:08,181 [myid:1] - INFO  [main:Server@375] - jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 11.0.20+8-LTS
2024-06-07 11:39:08,223 [myid:1] - INFO  [main:DefaultSessionIdManager@334] - DefaultSessionIdManager workerName=node0
2024-06-07 11:39:08,223 [myid:1] - INFO  [main:DefaultSessionIdManager@339] - No SessionScavenger set, using defaults
2024-06-07 11:39:08,225 [myid:1] - INFO  [main:HouseKeeper@132] - node0 Scavenging every 660000ms
2024-06-07 11:39:08,229 [myid:1] - WARN  [main:ConstraintSecurityHandler@759] - ServletContext@o.e.j.s.ServletContextHandler@7a5ceedd{/,null,STARTING} has uncovered http methods for path: /*
2024-06-07 11:39:08,240 [myid:1] - INFO  [main:ContextHandler@915] - Started o.e.j.s.ServletContextHandler@7a5ceedd{/,null,AVAILABLE}
2024-06-07 11:39:08,259 [myid:1] - INFO  [main:AbstractConnector@331] - Started ServerConnector@3e8c3cb{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2024-06-07 11:39:08,260 [myid:1] - INFO  [main:Server@415] - Started @1056ms
2024-06-07 11:39:08,260 [myid:1] - INFO  [main:JettyAdminServer@190] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
2024-06-07 11:39:08,268 [myid:1] - INFO  [main:ServerCnxnFactory@169] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2024-06-07 11:39:08,269 [myid:1] - WARN  [main:ServerCnxnFactory@309] - maxCnxns is not configured, using default value 0.
2024-06-07 11:39:08,271 [myid:1] - INFO  [main:NIOServerCnxnFactory@652] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 8 worker threads, and 64 kB direct buffers.
2024-06-07 11:39:08,273 [myid:1] - INFO  [main:NIOServerCnxnFactory@660] - binding to port 0.0.0.0/0.0.0.0:2181
2024-06-07 11:39:08,294 [myid:1] - INFO  [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2024-06-07 11:39:08,295 [myid:1] - INFO  [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2024-06-07 11:39:08,296 [myid:1] - INFO  [main:ZKDatabase@133] - zookeeper.snapshotSizeFactor = 0.33
2024-06-07 11:39:08,296 [myid:1] - INFO  [main:ZKDatabase@153] - zookeeper.commitLogCount=500
2024-06-07 11:39:08,297 [myid:1] - INFO  [main:FileSnap@85] - Reading snapshot /bitnami/zookeeper/data/version-2/snapshot.0
2024-06-07 11:39:08,300 [myid:1] - INFO  [main:DataTree@1716] - The digest value is empty in snapshot
2024-06-07 11:39:08,303 [myid:1] - INFO  [main:ZKDatabase@290] - Snapshot loaded in 6 ms, highest zxid is 0x0, digest is 1371985504
2024-06-07 11:39:08,304 [myid:1] - INFO  [main:FileTxnSnapLog@479] - Snapshotting: 0x0 to /bitnami/zookeeper/data/version-2/snapshot.0
2024-06-07 11:39:08,306 [myid:1] - INFO  [main:ZooKeeperServer@543] - Snapshot taken in 2 ms
2024-06-07 11:39:08,317 [myid:1] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@137] - PrepRequestProcessor (sid:0) started, reconfigEnabled=false
2024-06-07 11:39:08,317 [myid:1] - INFO  [main:RequestThrottler@75] - zookeeper.request_throttler.shutdownTimeout = 10000
2024-06-07 11:39:08,338 [myid:1] - INFO  [main:ContainerManager@84] - Using checkIntervalMs=60000 maxPerMinute=10000 maxNeverUsedIntervalMs=0
2024-06-07 11:39:08,339 [myid:1] - INFO  [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
n
try to exec into the clickhouse container and run the client
clickhouse client
try to create some dummy tables to check if your clickhouse is actually running. If yes, try running the schema migrator again
a
Look, it's working:
clickhouse:/# clickhouse client
ClickHouse client version 24.1.2.5 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.1.2.

Warnings:
 * Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check /proc/sys/kernel/task_delayacct

clickhouse :) SHOW DATABASES;

SHOW DATABASES

Query id: 56f95576-c346-407a-b46a-ebc98d89d125

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ system             │
└────────────────────┘

4 rows in set. Elapsed: 0.002 sec.

clickhouse :) CREATE DATABASE signoz_logs_test;

CREATE DATABASE signoz_logs_test

Query id: 17727f2d-e1fd-4f7a-bf62-adb531632eed

Ok.

0 rows in set. Elapsed: 0.007 sec.

clickhouse :) SHOW DATABASES;

SHOW DATABASES

Query id: 1aab535d-6a88-48cf-b36d-4a16c4a3c1ed

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ signoz_logs_test   │
│ system             │
└────────────────────┘

5 rows in set. Elapsed: 0.002 sec.

clickhouse :)
I restarted the otel-migrator and it is showing the same error:
{
  "level": "fatal",
  "timestamp": "2024-06-07T11:49:07.232Z",
  "caller": "signozschemamigrator/migrate.go:128",
  "msg": "Failed to run migrations",
  "component": "migrate cli",
  "error": "failed to create database, err: code: 999, message: Cannot resolve any of provided ZooKeeper hosts due to DNS error",
  "stacktrace": "main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozschemamigrator/migrate.go:128\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.10/x64/src/runtime/proc.go:267"
}
@nitya-signoz waiting for your response!
n
This still mentions that something is wrong with clickhouse. Try creating a dummy distributed table in clickhouse with the cluster name “cluster”: https://clickhouse.com/docs/en/engines/table-engines/special/distributed
a
This is what I'm getting:
clickhouse:/# clickhouse client
ClickHouse client version 24.1.2.5 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.1.2.

Warnings:
 * Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check /proc/sys/kernel/task_delayacct

clickhouse :) SHOW DATABASES;

SHOW DATABASES

Query id: 42132b65-a2cb-4307-9de8-877c9a362167

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ signoz_logs_test   │
│ system             │
└────────────────────┘

5 rows in set. Elapsed: 0.002 sec.

clickhouse :) CREATE TABLE IF NOT EXISTS my_distributed_table
ON CLUSTER cluster
(
    id UInt32,
    name String,
    value Float32
) ENGINE = Distributed(cluster, default, my_local_table, id)
SETTINGS
    fsync_after_insert = 0,
    fsync_directories = 0;


CREATE TABLE IF NOT EXISTS my_distributed_table ON CLUSTER cluster
(
    `id` UInt32,
    `name` String,
    `value` Float32
)
ENGINE = Distributed(cluster, default, my_local_table, id)
SETTINGS fsync_after_insert = 0, fsync_directories = 0

Query id: 14ceca8d-9c9d-42ba-b5e9-b7744ead6265


Elapsed: 0.019 sec.

Received exception from server (version 24.1.2):
Code: 999. DB::Exception: Received from localhost:9000. Coordination::Exception. Coordination::Exception: Cannot resolve any of provided ZooKeeper hosts due to DNS error. (KEEPER_EXCEPTION)

clickhouse :)
n
this confirms your clickhouse is still not able to reach zookeeper
a
OK, but both are on the host network. Here is my bash script:
docker run -d --name signoz-zookeeper-1 \
  --hostname zookeeper-1 \
  --user root \
  --net=host \
  --volume $(pwd)/data/zookeeper-1:/bitnami/zookeeper \
  --env ZOO_SERVER_ID=1 \
  --env ALLOW_ANONYMOUS_LOGIN=yes \
  --env ZOO_AUTOPURGE_INTERVAL=1 \
  bitnami/zookeeper:3.7.1

docker run -d --name signoz-clickhouse \
  --hostname clickhouse \
  --network host \
  --restart on-failure \
  -v "$(pwd)/clickhouse-config.xml:/etc/clickhouse-server/config.xml" \
  -v "$(pwd)/clickhouse-users.xml:/etc/clickhouse-server/users.xml" \
  -v "$(pwd)/custom-function.xml:/etc/clickhouse-server/custom-function.xml" \
  -v "$(pwd)/clickhouse-cluster.xml:/etc/clickhouse-server/config.d/cluster.xml" \
  -v "$(pwd)/clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml" \
  -v "$(pwd)/data/clickhouse/:/var/lib/clickhouse/" \
  -v "$(pwd)/user_scripts:/var/lib/clickhouse/user_scripts/" \
  --health-cmd "wget --spider -q 0.0.0.0:8123/ping || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  clickhouse/clickhouse-server:24.1.2-alpine
n
did you restart clickhouse after starting zookeeper?
a
no
n
please do
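That is, restart it so ClickHouse retries the ZooKeeper connection; assuming the container name from the script:

docker restart signoz-clickhouse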
a
Same response
Received exception from server (version 24.1.2):
Code: 999. DB::Exception: Received from localhost:9000. Coordination::Exception. Coordination::Exception: Cannot resolve any of provided ZooKeeper hosts due to DNS error. (KEEPER_EXCEPTION)
n
why are you not using docker compose?
a
My senior said to deploy it in host network mode instead of bridge.
n
I think the culprit is these configs; they might not work when the network is host: https://github.com/SigNoz/signoz/blob/cf54b5f9ec5e8a0d16950993487e2eb9375ecc20/deploy/docker/clickhouse-setup/clickhouse-cluster.xml#L10 Do confirm once by changing the host in the above xml and testing.
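In other words, with --network host the compose service hostnames baked into clickhouse-cluster.xml (zookeeper-1, clickhouse) have no DNS entries. A sketch of the adjustment, assuming the default SigNoz layout of that file:

<clickhouse>
    <zookeeper>
        <node index="1">
            <host>127.0.0.1</host> <!-- was zookeeper-1; unresolvable on the host network -->
            <port>2181</port>
        </node>
    </zookeeper>
    <remote_servers>
        <cluster>
            <shard>
                <replica>
                    <host>127.0.0.1</host> <!-- was clickhouse -->
                    <port>9000</port>
                </replica>
            </shard>
        </cluster>
    </remote_servers>
</clickhouse>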
a
Now I want to connect AWS S3 with ClickHouse and I'm getting this:
dependency failed to start: container signoz-clickhouse is unhealthy
(version 24.1.2.5 (official build))
2024.06.10 08:35:56.866346 [ 685 ] {} <Information> AWSClient: AWSXmlClient: HTTP response code: -1
Resolved remote host IP address: signoz-samespace-com.s3.ap-south-1.amazonaws.com:443
Request ID:
Exception name:
Error message: Poco::Exception. Code: 1000, e.code() = 0, Timeout: connect timed out: 52.219.160.74:443 (version 24.1.2.5 (official build))
0 response headers:
2024.06.10 08:35:56.866399 [ 685 ] {} <Information> AWSClient: If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2024.06.10 08:35:56.866422 [ 685 ] {} <Information> AWSClient: Request failed, now waiting 800 ms before attempting again.
2024.06.10 08:36:01.882235 [ 86 ] {} <Information> DNSCacheUpdater: IPs of some hosts have been changed. Will reload cluster config.
2024.06.10 08:36:16.894637 [ 85 ] {} <Information> DNSCacheUpdater: IPs of some hosts have been changed. Will reload cluster config.
2024.06.10 08:36:21.720402 [ 685 ] {} <Information> AWSClient: Failed to make request to: https://signoz-samespace-com.s3.ap-south-1.amazonaws.com/data/clickhouse_remove_objects_capability_9fb511a0-c929-4720-87e3-86d6dff742e5: Poco::Exception. Code: 1000, e.code() = 0, Timeout: connect timed out: 3.5.210.15:443, Stack trace (when copying this message, always include the lines below):
@nitya-signoz help!!
n
Does your machine have access to this s3 bucket? Also, proper creds are present, right?
a
Yes
I can access & upload files via the aws s3 cli in the same VM.
telnet is working too:
telnet 52.219.160.74 443

Trying 52.219.160.74...
Connected to 52.219.160.74.
Escape character is '^]'.
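Since the VM itself can reach S3, a hypothetical next check is whether the ClickHouse container (on the same host network) can too, using the wget already present in the image:

docker exec signoz-clickhouse wget --spider -S https://signoz-samespace-com.s3.ap-south-1.amazonaws.com/
# any HTTP response (even 403) proves TCP/TLS connectivity; a timeout points at the network path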
n
@Prashant Shahi any idea on what might be the issue with the s3 connection?
a
Waiting for your response!
OK, the problem is solved. @nitya-signoz can you tell me what this issue is? I'm trying to set up email: I configured SMTP in alertmanager & query-service, and I'm trying to set up invite & alert emails.
level=error ts=2024-06-11T07:49:03.419Z caller=api.go:808 component=api version=v1 msg="API error" err="server_error: 'require_tls' is true (default) but \"smtp.sendgrid.net:465\" does not advertise the STARTTLS extension"
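For later readers: the error means Alertmanager defaulted to STARTTLS (require_tls is true by default) while port 465 speaks implicit TLS. One common fix, assuming Alertmanager's standard SMTP settings, is to switch to SendGrid's STARTTLS port:

global:
  smtp_smarthost: smtp.sendgrid.net:587   # STARTTLS port; 465 expects implicit TLS
  smtp_require_tls: true                  # Alertmanager's default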
Fixed!
a
@Anurag Vishwakarma hi there, I'm having trouble while trying to set up email over SMTP and have the exact same error log. Can you tell me how you fixed this issue? Here is the link to my discussion and error details: https://signoz-community.slack.com/archives/C01HWUTP4HH/p1724814572169589