I'm deploying the SigNoz stack with host networking on a Docker VM via a bash script, and the otel-collector keeps crashing; I can't figure out why. I'm using a custom nginx config in the SigNoz frontend. Here are the script, the collector config, and the env files.
Bash script:
#!/bin/bash
# Define the Host IP
HOST_IP=10.160.0.41
# Create and run containers
docker run -d --name signoz-clickhouse \
--hostname clickhouse \
--network host \
--restart on-failure \
-v "$(pwd)/clickhouse-config.xml:/etc/clickhouse-server/config.xml" \
-v "$(pwd)/clickhouse-users.xml:/etc/clickhouse-server/users.xml" \
-v "$(pwd)/custom-function.xml:/etc/clickhouse-server/custom-function.xml" \
-v "$(pwd)/clickhouse-cluster.xml:/etc/clickhouse-server/config.d/cluster.xml" \
-v "$(pwd)/clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml" \
-v "$(pwd)/data/clickhouse/:/var/lib/clickhouse/" \
-v "$(pwd)/user_scripts:/var/lib/clickhouse/user_scripts/" \
--health-cmd "wget --spider -q 0.0.0.0:8123/ping || exit 1" \
--health-interval=30s \
--health-timeout=5s \
--health-retries=3 \
clickhouse/clickhouse-server:24.1.2-alpine
docker run -d --name signoz-alertmanager \
--network host \
--restart on-failure \
-v "$(pwd)/data/alertmanager:/data" \
--health-cmd "wget --spider -q http://localhost:9093/api/v1/status || exit 1" \
--health-interval=30s \
--health-timeout=5s \
--health-retries=3 \
signoz/alertmanager:0.23.5 --queryService.url=http://$HOST_IP:8085 --storage.path=/data
docker run -d --name signoz-query-service \
--network host \
--restart on-failure \
-v "$(pwd)/prometheus.yml:/root/config/prometheus.yml" \
-v "$(pwd)/dashboards:/root/config/dashboards" \
-v "$(pwd)/data/signoz/:/var/lib/signoz/" \
--env-file signoz-query-service.env \
--health-cmd "wget --spider -q localhost:8080/api/v1/health || exit 1" \
--health-interval=30s \
--health-timeout=5s \
--health-retries=3 \
signoz/query-service:0.47.0 -config="/root/config/prometheus.yml"
docker run -d --name signoz-frontend \
--network host \
--restart on-failure \
-v "$(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf" \
-v "/opt/samespace/samespace-public/samespace.com.crt:/opt/samespace/samespace-public/samespace.com.crt" \
-v "/opt/samespace/samespace-public/samespace.com.key:/opt/samespace/samespace-public/samespace.com.key" \
signoz/frontend:0.47.0
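# Schema migrator: runs the ClickHouse migrations that create/update the SigNoz
# databases (signoz_traces, signoz_metrics, signoz_logs) which the collector expects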
docker run -d --name otel-migrator \
--network host \
--restart on-failure \
signoz/signoz-schema-migrator:0.88.26 --dsn="tcp://$HOST_IP:9000"
docker run -d --name signoz-otel-collector \
--network host \
--restart on-failure \
--user root \
-v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
-v "$(pwd)/otel-collector-opamp-config.yaml:/etc/manager-config.yaml" \
-v "/var/lib/docker/containers:/var/lib/docker/containers:ro" \
-v "/opt/samespace/Cert-mtls/ca.crt:/opt/samespace/Cert-mtls/ca.crt" \
-v "/opt/samespace/Cert-mtls/gw.key:/opt/samespace/Cert-mtls/gw.key" \
-v "/opt/samespace/Cert-mtls/mesh.crt:/opt/samespace/Cert-mtls/mesh.crt" \
--env-file signoz-otel-collector.env \
--health-cmd "wget --spider -q http://localhost:13133/health || exit 1" \
--health-interval=30s \
--health-timeout=5s \
--health-retries=3 \
signoz/signoz-otel-collector:0.88.26 --config="/etc/otel-collector-config.yaml" --manager-config="/etc/manager-config.yaml" --copy-path="/var/tmp/collector-config.yaml" --feature-gates="-pkg.translator.prometheus.NormalizeName"
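One thing I'm unsure about is ordering: the migrator container runs detached, so the collector may start before the migrations that create signoz_logs have finished. Would gating the collector start on the migrator's exit code be the right fix? A minimal sketch of what I mean (assuming the migrator is run without a restart policy so that docker wait returns its final exit code; container names as above):
# (with --restart on-failure removed from the otel-migrator run above)
# Block until the schema migrator exits, then check its exit code before
# starting signoz-otel-collector
if [ "$(docker wait otel-migrator)" != "0" ]; then
echo "schema migration failed, not starting the collector; see: docker logs otel-migrator" >&2
exit 1
fi
# ...run the signoz-otel-collector container only after this point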
OTel collector config (otel-collector-config.yaml):
receivers:
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
      # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        expr: 'attributes.container_name matches "^signoz-(logspout|frontend|alertmanager|query-service|otel-collector|clickhouse|zookeeper)"'
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key
          ca_file: /opt/samespace/Cert-mtls/ca.crt
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key
          ca_file: /opt/samespace/Cert-mtls/ca.crt
  otlp/mtls:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key
          ca_file: /opt/samespace/Cert-mtls/ca.crt
      http:
        endpoint: 0.0.0.0:4318
        tls:
          cert_file: /opt/samespace/Cert-mtls/mesh.crt
          key_file: /opt/samespace/Cert-mtls/gw.key
          ca_file: /opt/samespace/Cert-mtls/ca.crt
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
            - targets:
                - 10.160.0.41:8888
              labels:
                job_name: otel-collector
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/cumulative:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777
exporters:
  clickhousetraces:
    datasource: tcp://10.160.0.41:9000/signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://10.160.0.41:9000/signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://10.160.0.41:9000/signoz_metrics
  clickhouselogsexporter:
    dsn: tcp://10.160.0.41:9000/signoz_logs
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 10s
  # logging: {}
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/cumulative, signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    logs:
      receivers: [otlp, tcplog/docker]
      processors: [batch]
      exporters: [clickhouselogsexporter]
Env files:
OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
DOCKER_MULTI_NODE_CLUSTER=false
LOW_CARDINAL_EXCEPTION_GROUPING=false
ClickHouseUrl=tcp://10.160.0.41:9000
ALERTMANAGER_API_PREFIX=http://10.160.0.41:9093/api/
SIGNOZ_LOCAL_DB_PATH=/var/lib/signoz/signoz.db
DASHBOARDS_PATH=/root/config/dashboards
STORAGE=clickhouse
GODEBUG=netdns=go
TELEMETRY_ENABLED=true
DEPLOYMENT_TYPE=docker-standalone-amd
OpAMP config (otel-collector-opamp-config.yaml):
server_endpoint: ws://10.160.0.41:4320/v1/opamp
The error I'm getting (otel-collector logs):
{"level":"error","timestamp":"2024-06-07T06:22:58.034Z","caller":"opamp/server_client.go:216","msg":"failed to apply config","component":"opamp-server-client","error":"failed to reload config: /var/tmp/collector-config.yaml: collector failed to restart: failed to build pipelines: failed to create \"clickhouselogsexporter\" exporter for data type \"logs\": cannot configure clickhouse logs exporter: code: 81, message: Database signoz_logs does not exist","stacktrace":"github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:216\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onMessageFuncHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:199\ngithub.com/open-telemetry/opamp-go/client/types.CallbacksStruct.OnMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/types/callbacks.go:162\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/receivedprocessor.go:131\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/wsreceiver.go:57\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:243\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"}
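To narrow it down, this is the check I plan to run to confirm whether the databases were actually created and whether the migrator finished cleanly (a sketch; it assumes clickhouse-client ships inside the clickhouse/clickhouse-server image and uses the container names from the script above):
# List the databases ClickHouse actually has; signoz_logs should be among them
docker exec signoz-clickhouse clickhouse-client --query "SHOW DATABASES"
# Check whether the schema migrator container exited cleanly
docker inspect -f '{{.State.Status}} {{.State.ExitCode}}' otel-migrator
docker logs --tail 50 otel-migrator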
Help please!
@nitya-signoz