
Herb He

10 months ago
Hi, we are self-hosting SigNoz with the ClickHouse replica count set to 2 for fault tolerance, but we've hit an issue: when searching logs we randomly lose about half of them, which suggests the two ClickHouse replicas are not syncing data with each other. I checked the logs table definition and it doesn't appear to use a Replicated engine, just plain MergeTree. Can someone suggest what we should do about it?
SHOW CREATE TABLE logs

Query id: 0affbaf0-0688-4d18-b7cf-e08c9052139c

┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE signoz_logs.logs
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `resources_string_key` Array(String) CODEC(ZSTD(1)),
    `resources_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_string_key` Array(String) CODEC(ZSTD(1)),
    `attributes_string_value` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_int64_value` Array(Int64) CODEC(ZSTD(1)),
    `attributes_float64_key` Array(String) CODEC(ZSTD(1)),
    `attributes_float64_value` Array(Float64) CODEC(ZSTD(1)),
    `attributes_bool_key` Array(String) CODEC(ZSTD(1)),
    `attributes_bool_value` Array(Bool) CODEC(ZSTD(1)),
    INDEX body_idx body TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(432000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1 │
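For context: a plain MergeTree table is local to the node it lives on, so inserts that land on one replica never reach the other, and queries return whichever node's data they happen to hit. Below is a minimal sketch (not the official SigNoz migration path) of what a replicated variant of this table could look like, assuming the standard {shard} and {replica} macros are defined in each node's config; the exact Keeper path and macro names depend on your deployment, and the column list is abbreviated to match the schema above:

CREATE TABLE signoz_logs.logs_replicated
(
    -- same columns, codecs, and skip indexes as signoz_logs.logs above
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `body` String CODEC(ZSTD(2))
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/signoz_logs/logs', '{replica}')
PARTITION BY toDate(timestamp / 1000000000)
ORDER BY (timestamp, id)
TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(432000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

To check whether anything on a node is replicating at all:

SELECT database, table, is_readonly, absolute_delay
FROM system.replicas
WHERE database = 'signoz_logs'

An empty result means no Replicated tables exist on that node, which would match the symptom of the two replicas drifting apart.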

Dhairya Patel

4 months ago
Hey team! 👋 Need some help with a SigNoz and local OTel collector setup. Current situation:
• I have a local OTel collector running on the server.
• On the same server, I have set up self-hosted SigNoz with Docker.
• Currently I'm getting errors and no data is flowing into SigNoz.
Here's my local OTel collector config:
extensions:
  health_check: {}

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu: {}
      disk: {}
      load: {}
      filesystem: {}
      memory: {}
      network: {}
      process:
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true

processors:
  batch:
    send_batch_size: 500
    send_batch_max_size: 600
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    limit_percentage: 75
    spike_limit_percentage: 15
  resourcedetection:
    detectors: [env, system]
    timeout: 2s

exporters:
  otlp:
    endpoint: 0.0.0.0:9000
    tls:
      insecure: true
  clickhouse:
    endpoint: tcp://localhost:9000
    database: signoz_traces
    username: default
    password: default123
    timeout: 10s

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
    logs:
      level: info

  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp, clickhouse]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [otlp, clickhouse]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp, clickhouse]
Main questions:
1. What's the correct data flow path? Should it be:
   Option A: App → Local OTel collector → SigNoz, OR
   Option B: App → Local OTel collector → SigNoz's Docker OTel collector → SigNoz?
2. If Option A is correct, what changes do I need in my local OTel collector config to send data directly to SigNoz's ClickHouse?
3. If Option B is recommended, how should I configure the collectors to avoid port conflicts and ensure proper data flow?
Current errors I'm seeing:
Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "metrics", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"", "interval": "3.79851812s"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x625b9613ca3c]
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: goroutine 185 [running]:
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: go.opentelemetry.io/collector/pdata/pcommon.Value.Type(...)
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         go.opentelemetry.io/collector/pdata@v1.22.0/pcommon/value.go:183
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: go.opentelemetry.io/collector/pdata/pcommon.Value.AsString({0x0?, 0xc002684380?})
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         go.opentelemetry.io/collector/pdata@v1.22.0/pcommon/value.go:370 +0x1c
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter/internal.(*sumMetrics).insert.func1(0x0?)
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter@v0.116.0/internal/sum_metrics.go:129 +0x365
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter/internal.doWithTx({0x625ba27ea808?, 0xc0006cb8f0?}, 0x0?, 0xc0017baf40)
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter@v0.116.0/internal/metrics_model.go:210 +0xcd
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter/internal.(*sumMetrics).insert(0xc0014a3b60, {0x625ba27ea808, 0xc0006cb8f0}, 0xc001655520)
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter@v0.116.0/internal/sum_metrics.go:105 +0xc9
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter/internal.InsertMetrics.func1({0x625ba2797e98?, 0xc0014a3b60?}, 0xc002684390)
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter@v0.116.0/internal/metrics_model.go:100 +0x43
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]: created by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter/internal.InsertMetrics in goroutine 193
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419808]:         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhouseexporter@v0.116.0/internal/metrics_model.go:99 +0xcc
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC systemd[1]: otelcol-contrib.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC systemd[1]: otelcol-contrib.service: Failed with result 'exit-code'.
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC systemd[1]: otelcol-contrib.service: Scheduled restart job, restart counter is at 2.
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC systemd[1]: Started otelcol-contrib.service - OpenTelemetry Collector.
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.290+0530        info        service@v0.116.0/service.go:164        Setting up own telemetry...
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.290+0530        warn        service@v0.116.0/service.go:213        service::telemetry::metrics::address is being deprecated in favor of service::telemetry::metrics::readers
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.290+0530        info        telemetry/metrics.go:70        Serving metrics        {"address": "0.0.0.0:8888", "metrics level": "Normal"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.291+0530        info        memorylimiter@v0.116.0/memorylimiter.go:151        Using percentage memory limiter        {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "total_memory_mib": 31459, "limit_percentage": 75, "spike_limit_percentage": 15}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.291+0530        info        memorylimiter@v0.116.0/memorylimiter.go:75        Memory limiter configured        {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 23594, "spike_limit_mib": 4718, "check_interval": 1}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.293+0530        info        service@v0.116.0/service.go:230        Starting otelcol-contrib...        {"Version": "0.116.0", "NumCPU": 16}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.293+0530        info        extensions/extensions.go:39        Starting extensions...
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.295+0530        warn        grpc@v1.68.1/clientconn.go:1384        [core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "0.0.0.0:9000", ServerName: "0.0.0.0:9000", }. Err: connection error: desc = "error reading server preface: http2: frame too large"        {"grpc_log": true}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.309+0530        info        internal/resourcedetection.go:126        began detecting resource information        {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.310+0530        info        internal/resourcedetection.go:140        detected resource information        {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "resource": {"host.name":"abhiyanta-HP-285-Pro-G6-Microtower-PC","os.type":"linux"}}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.310+0530        warn        internal@v0.116.0/warning.go:40        Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.        {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.310+0530        info        otlpreceiver@v0.116.0/otlp.go:112        Starting GRPC server        {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.310+0530        warn        internal@v0.116.0/warning.go:40        Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.        {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.310+0530        info        otlpreceiver@v0.116.0/otlp.go:169        Starting HTTP server        {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.316+0530        warn        grpc@v1.68.1/clientconn.go:1384        [core] [Channel #6 SubChannel #7]grpc: addrConn.createTransport failed to connect to {Addr: "0.0.0.0:9000", ServerName: "0.0.0.0:9000", }. Err: connection error: desc = "error reading server preface: http2: frame too large"        {"grpc_log": true}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.317+0530        warn        grpc@v1.68.1/clientconn.go:1384        [core] [Channel #8 SubChannel #10]grpc: addrConn.createTransport failed to connect to {Addr: "0.0.0.0:9000", ServerName: "0.0.0.0:9000", }. Err: connection error: desc = "error reading server preface: http2: frame too large"        {"grpc_log": true}
Jan 04 15:08:02 abhiyanta-HP-285-Pro-G6-Microtower-PC otelcol-contrib[419831]: 2025-01-04T15:08:02.323+0530        info        service@v0.116.0/service.go:253        Everything is ready. Begin running and processing data.
Any guidance on the correct approach would be really helpful! 🙏
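On the errors: "error reading server preface: http2: frame too large" usually means a gRPC (HTTP/2) client is connecting to something that is not a gRPC server. In the default SigNoz Docker setup, port 9000 is ClickHouse's native TCP port, so an OTLP exporter pointed at 0.0.0.0:9000 would fail exactly like this. The SIGSEGV stack trace is inside the contrib clickhouseexporter, which manages its own table schema; SigNoz reads from tables created by its own collector, so that exporter's output would not appear in SigNoz either way. A minimal sketch of an Option B-style exporter section, assuming the default SigNoz docker-compose, which publishes the bundled otel-collector's OTLP gRPC port on the host at 4317 (adjust host and port to your compose file):

exporters:
  otlp:
    # Forward everything to the SigNoz-bundled collector, which owns the
    # ClickHouse schema and does the actual writes.
    endpoint: localhost:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]   # drop the clickhouse exporter

With both collectors on one machine, the local collector cannot bind 4317/4318 while the SigNoz containers publish the same ports on the host; either move the local receiver endpoints (for example 0.0.0.0:14317) or stop publishing the SigNoz collector's ports and point applications only at the local collector.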