b
ok, need some help. 1. with windows server app logs written to a share, what's the preferred way to get those into signoz? these systems are vmware VMs in our DC, and our signoz systems are in our much larger AWS environment. i was going to simply mount the share (either NFS or SMB) to the signoz system for this POC. 2. following the instructions here (https://signoz.io/docs/userguide/collect_logs_from_file/), i just copied the logs over to the system and configured the docker-compose.yaml and otel-collector-config.yaml files to point to a single log file, but i get the following when i view the logs for the otel-collector container:
{
  "level": "fatal",
  "timestamp": "2024-03-04T17:29:06.159Z",
  "caller": "signozcollector/main.go:72",
  "msg": "failed to create collector service:",
  "error": "failed to create server client: failed to create collector config: failed to upsert instance id failed to parse config file /var/tmp/collector-config.yaml: yaml: line 166: did not find expected key",
  "stacktrace": "main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozcollector/main.go:72\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.7/x64/src/runtime/proc.go:267"
}
here is my corresponding config block:
logs:
      receivers: [otlp, tcplog/docker, filelog]
      processors: [batch]
      exporters: [clickhouselogsexporter]
        filelog:
          include: [/cloudadmins/logs/OTHER/D202306/M5WEB_PRESENTATION_COMMON_CSILOGON.ASPX_638224444396847090_0.txt]
          start_at: beginning
i've configured and used a number of the other major systems, shipped logs from systems up to hosted and SaaS log aggregation systems, but never had this kind of issue. i don't think i've missed anything in the docs, but maybe there's something i'm not seeing.
i reloaded w/ docker-compose, it rebuilt the signoz-otel-collector, but it's still flapping with the same:
{
  "level": "fatal",
  "timestamp": "2024-03-04T18:34:51.261Z",
  "caller": "signozcollector/main.go:72",
  "msg": "failed to create collector service:",
  "error": "failed to create server client: failed to create collector config: failed to upsert instance id failed to parse config file /var/tmp/collector-config.yaml: yaml: line 166: did not find expected key",
  "stacktrace": "main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozcollector/main.go:72\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.7/x64/src/runtime/proc.go:267"
}
couldn't say what's missing. gonna go see if i can find the code on github to dig into the missing key...
p
At a glance, isn't this a YAML formatting error?
exporters: [clickhouselogsexporter]
        filelog:
          include:
I believe you may be mixing two parts of the OpenTelemetry config. You should have top-level receivers and exporters keys where you define components such as filelog, and then a pipelines key under service where you reference which receivers and exporters to use for logs.
(I'm not associated with signoz)
It's possible the tutorial you linked was a bit confusing since there are two of the same key. Take a glance at the first code block under https://opentelemetry.io/docs/collector/configuration/#basics to get an idea of what a config file typically looks like.
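Roughly, the shape you want is something like this (untested sketch — the filelog path and exporter name are taken from your snippet, the dsn and endpoints are just placeholders):
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  filelog:
    # defined once at the top level, not inside the pipeline
    include: [/cloudadmins/logs/OTHER/D202306/M5WEB_PRESENTATION_COMMON_CSILOGON.ASPX_638224444396847090_0.txt]
    start_at: beginning

processors:
  batch: {}

exporters:
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/

service:
  pipelines:
    logs:
      # the pipeline only references components by name
      receivers: [otlp, filelog]
      processors: [batch]
      exporters: [clickhouselogsexporter]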
s
Please share your full config.
b
yeah, i think i was. i ended up revising the config but still nothing. i even tried some very basic logs from a linux process monitor script and it at least runs, but there's no evidence of the logs in signoz. i'll share the full config in a moment.
otel-collector-config.yaml
receivers:
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators:
      - type: regex_parser
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
        # please remove names from below if you want to collect logs from them
      - type: filter
        id: signoz_logs_filter
        expr: 'attributes.container_name matches "^signoz-(logspout|frontend|alertmanager|query-service|otel-collector|clickhouse|zookeeper)"'
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector
  filelog/app:
    include: ["/cloudadmins/logs/20240304_process_monitor.log"]
    start_at: beginning

processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/cumulative:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
  # logging: {}

  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 10s

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/cumulative, signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    logs:
      receivers: [otlp, filelog/app, tcplog/docker]
      processors: [batch]
      exporters: [clickhouselogsexporter]
      # exporters: [otlp]
docker-compose.yaml
version: "2.4"

x-clickhouse-defaults: &clickhouse-defaults
  restart: on-failure
  # adding non-LTS version due to this fix: https://github.com/ClickHouse/ClickHouse/commit/32caf8716352f45c1b617274c7508c86b7d1afab
  image: clickhouse/clickhouse-server:24.1.2-alpine
  tty: true
  depends_on:
    - zookeeper-1
    # - zookeeper-2
    # - zookeeper-3
  logging:
    options:
      max-size: 50m
      max-file: "3"
  healthcheck:
    # "clickhouse", "client", "-u ${CLICKHOUSE_USER}", "--password ${CLICKHOUSE_PASSWORD}", "-q 'SELECT 1'"
    test:
      [
        "CMD",
        "wget",
        "--spider",
        "-q",
        "localhost:8123/ping"
      ]
    interval: 30s
    timeout: 5s
    retries: 3
  ulimits:
    nproc: 65535
    nofile:
      soft: 262144
      hard: 262144

x-db-depend: &db-depend
  depends_on:
    clickhouse:
      condition: service_healthy
    otel-collector-migrator:
      condition: service_completed_successfully
    # clickhouse-2:
    #   condition: service_healthy
    # clickhouse-3:
    #   condition: service_healthy

services:

  zookeeper-1:
    image: bitnami/zookeeper:3.7.1
    container_name: signoz-zookeeper-1
    hostname: zookeeper-1
    user: root
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    volumes:
      - ./data/zookeeper-1:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=1
      # - ZOO_SERVERS=0.0.0.0:2888:3888,zookeeper-2:2888:3888,zookeeper-3:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1

  # zookeeper-2:
  #   image: bitnami/zookeeper:3.7.0
  #   container_name: signoz-zookeeper-2
  #   hostname: zookeeper-2
  #   user: root
  #   ports:
  #     - "2182:2181"
  #     - "2889:2888"
  #     - "3889:3888"
  #   volumes:
  #     - ./data/zookeeper-2:/bitnami/zookeeper
  #   environment:
  #     - ZOO_SERVER_ID=2
  #     - ZOO_SERVERS=zookeeper-1:2888:3888,0.0.0.0:2888:3888,zookeeper-3:2888:3888
  #     - ALLOW_ANONYMOUS_LOGIN=yes
  #     - ZOO_AUTOPURGE_INTERVAL=1

  # zookeeper-3:
  #   image: bitnami/zookeeper:3.7.0
  #   container_name: signoz-zookeeper-3
  #   hostname: zookeeper-3
  #   user: root
  #   ports:
  #     - "2183:2181"
  #     - "2890:2888"
  #     - "3890:3888"
  #   volumes:
  #     - ./data/zookeeper-3:/bitnami/zookeeper
  #   environment:
  #     - ZOO_SERVER_ID=3
  #     - ZOO_SERVERS=zookeeper-1:2888:3888,zookeeper-2:2888:3888,0.0.0.0:2888:3888
  #     - ALLOW_ANONYMOUS_LOGIN=yes
  #     - ZOO_AUTOPURGE_INTERVAL=1

  clickhouse:
    <<: *clickhouse-defaults
    container_name: signoz-clickhouse
    hostname: clickhouse
    ports:
      - "9000:9000"
      - "8123:8123"
      - "9181:9181"
    volumes:
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
      - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
      - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
      - ./clickhouse-cluster.xml:/etc/clickhouse-server/config.d/cluster.xml
      # - ./clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml
      - ./data/clickhouse/:/var/lib/clickhouse/
      - ./user_scripts:/var/lib/clickhouse/user_scripts/

  # clickhouse-2:
  #   <<: *clickhouse-defaults
  #   container_name: signoz-clickhouse-2
  #   hostname: clickhouse-2
  #   ports:
  #     - "9001:9000"
  #     - "8124:8123"
  #     - "9182:9181"
  #   volumes:
  #     - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
  #     - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
  #     - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
  #     - ./clickhouse-cluster.xml:/etc/clickhouse-server/config.d/cluster.xml
  #     # - ./clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml
  #     - ./data/clickhouse-2/:/var/lib/clickhouse/
  #     - ./user_scripts:/var/lib/clickhouse/user_scripts/


  # clickhouse-3:
  #   <<: *clickhouse-defaults
  #   container_name: signoz-clickhouse-3
  #   hostname: clickhouse-3
  #   ports:
  #     - "9002:9000"
  #     - "8125:8123"
  #     - "9183:9181"
  #   volumes:
  #     - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
  #     - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
  #     - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
  #     - ./clickhouse-cluster.xml:/etc/clickhouse-server/config.d/cluster.xml
  #     # - ./clickhouse-storage.xml:/etc/clickhouse-server/config.d/storage.xml
  #     - ./data/clickhouse-3/:/var/lib/clickhouse/
  #     - ./user_scripts:/var/lib/clickhouse/user_scripts/

  alertmanager:
    image: signoz/alertmanager:${ALERTMANAGER_TAG:-0.23.4}
    container_name: signoz-alertmanager
    volumes:
      - ./data/alertmanager:/data
    depends_on:
      query-service:
        condition: service_healthy
    restart: on-failure
    command:
      - --queryService.url=http://query-service:8085
      - --storage.path=/data

  # Notes for Maintainers/Contributors who will change Line Numbers of Frontend & Query-Section. Please Update Line Numbers in `./scripts/commentLinesForSetup.sh` & `./CONTRIBUTING.md`

  query-service:
    image: signoz/query-service:${DOCKER_TAG:-0.40.0}
    container_name: signoz-query-service
    command:
      [
        "-config=/root/config/prometheus.yml",
        # "--prefer-delta=true"
      ]
    # ports:
    #   - "6060:6060"     # pprof port
    #   - "8080:8080"     # query-service port
    volumes:
      - ./prometheus.yml:/root/config/prometheus.yml
      - ../dashboards:/root/config/dashboards
      - ./data/signoz/:/var/lib/signoz/
    environment:
      - ClickHouseUrl=tcp://clickhouse:9000
      - ALERTMANAGER_API_PREFIX=http://alertmanager:9093/api/
      - SIGNOZ_LOCAL_DB_PATH=/var/lib/signoz/signoz.db
      - DASHBOARDS_PATH=/root/config/dashboards
      - STORAGE=clickhouse
      - GODEBUG=netdns=go
      - TELEMETRY_ENABLED=true
      - DEPLOYMENT_TYPE=docker-standalone-amd
    restart: on-failure
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "--spider",
          "-q",
          "localhost:8080/api/v1/health"
        ]
      interval: 30s
      timeout: 5s
      retries: 3
    <<: *db-depend

  frontend:
    image: signoz/frontend:${DOCKER_TAG:-0.40.0}
    container_name: signoz-frontend
    restart: on-failure
    depends_on:
      - alertmanager
      - query-service
    ports:
      - "3301:3301"
    volumes:
      - ../common/nginx-config.conf:/etc/nginx/conf.d/default.conf

  otel-collector-migrator:
    image: signoz/signoz-schema-migrator:${OTELCOL_TAG:-0.88.14}
    container_name: otel-migrator
    command:
      - "--dsn=<tcp://clickhouse:9000>"
    depends_on:
      clickhouse:
        condition: service_healthy
      # clickhouse-2:
      #   condition: service_healthy
      # clickhouse-3:
      #   condition: service_healthy


  otel-collector:
    image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.88.14}
    container_name: signoz-otel-collector
    command:
      [
        "--config=/etc/otel-collector-config.yaml",
        "--manager-config=/etc/manager-config.yaml",
        "--copy-path=/var/tmp/collector-config.yaml",
        "--feature-gates=-pkg.translator.prometheus.NormalizeName"
      ]
    user: root # required for reading docker container logs
    volumes:
      # added /cloudadmins/logs/**/*.txt
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      - ./otel-collector-opamp-config.yaml:/etc/manager-config.yaml
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - "/cloudadmins/logs/20240304_process_monitor.log"
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
      - DOCKER_MULTI_NODE_CLUSTER=false
      - LOW_CARDINAL_EXCEPTION_GROUPING=false
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension
    restart: on-failure
    depends_on:
      clickhouse:
        condition: service_healthy
      otel-collector-migrator:
        condition: service_completed_successfully
      query-service:
        condition: service_healthy

  logspout:
    image: "gliderlabs/logspout:v3.2.14"
    container_name: signoz-logspout
    volumes:
      - /etc/hostname:/etc/host_hostname:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: syslog+tcp://otel-collector:2255
    depends_on:
      - otel-collector
    restart: on-failure

  hotrod:
    image: jaegertracing/example-hotrod:1.30
    container_name: hotrod
    logging:
      options:
        max-size: 50m
        max-file: "3"
    command: [ "all" ]
    environment:
      - JAEGER_ENDPOINT=http://otel-collector:14268/api/traces

  load-hotrod:
    image: "signoz/locust:1.2.3"
    container_name: load-hotrod
    hostname: load-hotrod
    environment:
      ATTACKED_HOST: http://hotrod:8080
      LOCUST_MODE: standalone
      NO_PROXY: standalone
      TASK_DELAY_FROM: 5
      TASK_DELAY_TO: 30
      QUIET_MODE: "${QUIET_MODE:-false}"
      LOCUST_OPTS: "--headless -u 10 -r 1"
    volumes:
      - ../common/locust-scripts:/locust
here's an example of the log it's supposed to be ingesting, but i can't find evidence of it in signoz. (the actual app logs i care about never got ingested)
{"timestamp":"2024-03-04T17:10:14-0600","source":"process_monitor","log_level":"INFO","action":"startup","message":"tomlq version tomlq 3.2.3 found.","hostname":"WHWH1WD9F2","host_ip":""}
{"timestamp":"2024-03-04T17:10:14-0600","source":"process_monitor","log_level":"INFO","action":"startup","message":"jq version jq-1.7.1 found.","hostname":"WHWH1WD9F2","host_ip":""}
{"timestamp":"2024-03-04T17:10:15-0600","source":"process_monitor","log_level":"INFO","action":"startup","message":"Created new log file /Users/brandon/bin/bash_process_monitor/log/20240304_process_monitor.log","hostname":"WHWH1WD9F2","host_ip":""}
{"timestamp":"2024-03-04T17:10:15-0600","source":"process_monitor","log_level":"WARN","action":"process_check","message":"Process found (6 actual of 1 expected) with pattern kitty with PIDs 21069,22746,78381,78889,78953,81667.","pid":"21069,22746,78381,78889,78953,81667","process_count":"6","expected_count":"1","pattern":"kitty","status":"WARN","n_status":"1","hostname":"WHWH1WD9F2","host_ip":""}
{"timestamp":"2024-03-04T17:11:13-0600","source":"process_monitor","log_level":"INFO","action":"startup","message":"tomlq version tomlq 3.2.3 found.","hostname":"WHWH1WD9F2","host_ip":"127.0.0.1"}
{"timestamp":"2024-03-04T17:11:13-0600","source":"process_monitor","log_level":"INFO","action":"startup","message":"jq version jq-1.7.1 found.","hostname":"WHWH1WD9F2","host_ip":"127.0.0.1"}
{"timestamp":"2024-03-04T17:11:13-0600","source":"process_monitor","log_level":"INFO","action":"startup","message":"Using existing log file /Users/brandon/bin/bash_process_monitor/log/20240304_process_monitor.log","hostname":"WHWH1WD9F2","host_ip":"127.0.0.1"}
{"timestamp":"2024-03-04T17:11:14-0600","source":"process_monitor","log_level":"WARN","action":"process_check","message":"Process found (6 actual of 1 expected) with pattern kitty with PIDs 21069,22746,78381,78889,78953,81667.","pid":"21069,22746,78381,78889,78953,81667","process_count":"6","expected_count":"1","pattern":"kitty","status":"WARN","n_status":"1","hostname":"WHWH1WD9F2","host_ip":"127.0.0.1"}
s
Do you see any error logs in signoz-otel-collector container?
b
looking
these logs don't end up in signoz?
after my last reconfig i only see info and warn entries when doing a docker logs, but pulling into an IDE so i can more easily look thru the entries.
hm. i do see a warn about no matching files:
2024-03-05T17:05:31.895Z  warn  fileconsumer/file.go:61  finding files: no files match the configured criteria
  {
    "kind": "receiver",
    "name": "filelog/app",
    "data_type": "logs",
    "component": "fileconsumer"
  }
the file path is definitely right tho. maybe permissions, lemme check
s
Right, this was the warning/error I was expecting. If you are not seeing the logs in SigNoz then most likely they are not getting ingested in the first place. Please make sure the path is correct.
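One thing to double-check: the include path is evaluated inside the otel-collector container, so the host directory needs to be bind-mounted at that same path in docker-compose. Roughly something like this (sketch, using the paths from your config):
  otel-collector:
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      # bind-mount the host log directory at the path the filelog receiver expects
      - /cloudadmins/logs:/cloudadmins/logs:ro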
b
yeah - got it working. the lines about mounting the log files as volumes read kinda ambiguously i guess. i just mounted the folders in the compose yaml and pointed at the file in the collector config. will play with options there, cuz we'll have a lot of wildcard-type reading we'll need to do...
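for the actual windows app logs on the share, the plan is a wildcard include along these lines (sketch — same pattern as the comment in my compose file, not tested at scale yet):
  filelog/app:
    # match every .txt anywhere under the mounted share
    include: ["/cloudadmins/logs/**/*.txt"]
    start_at: beginning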
thank you