# support
h
I'm having a hard time figuring out how to debug an issue with logs. I'm trying to convert our logging output to JSON and then use a json_parser in the otel-collector to convert it into attributes. An example from the log file is:
{
  "severity_text": "INFO",
  "time": "2023-07-21T12:46:34.541+02:00",
  "message": "Completed 200 OK in 49ms (Views: 40.5ms | ActiveRecord: 5.0ms | Allocations: 124929)\n\n",
  "operation": "Admin::DepartmentsController#index",
  "span_id": "628ed9fd4fe1ecb0",
  "trace_id": "67b2a6b62c7f983c3801d14923c017ac"
}
In SigNoz it shows the body as
{
  "severity_text": "INFO",
  "time": "2023-07-21T12:46:34.541+02:00",
  "message": "Completed 200 OK in 49ms (Views: 40.5ms | ActiveRecord: 5.0ms | Allocations: 124929)",
  "operation": "Admin::DepartmentsController#index",
  "span_id": "628ed9fd4fe1ecb0",
  "trace_id": "67b2a6b62c7f983c3801d14923c017ac"
}
but none of the attributes are parsed out.
I have the following processors defined:
processors:
  logstransform/internal:
    operators:
      - type: json_parser
      - type: trace_parser
        if: '"trace_id" in attributes or "span_id" in attributes'
        trace_id:
          parse_from: attributes.trace_id
        span_id:
          parse_from: attributes.span_id
        output: remove_trace_id
      - type: trace_parser
        if: '"traceId" in attributes or "spanId" in attributes'
        trace_id:
          parse_from: attributes.traceId
        span_id:
          parse_from: attributes.spanId
        output: remove_traceId
      - id: remove_traceId
        type: remove
        if: '"traceId" in attributes'
        field: attributes.traceId
        output: remove_spanId
      - id: remove_spanId
        type: remove
        if: '"spanId" in attributes'
        field: attributes.spanId
      - id: remove_trace_id
        type: remove
        if: '"trace_id" in attributes'
        field: attributes.trace_id
        output: remove_span_id
      - id: remove_span_id
        type: remove
        if: '"span_id" in attributes'
        field: attributes.span_id
I'd love to know how I can debug this to understand if something is failing with the parsing
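(A minimal sketch of one way to surface parsing errors, assuming the stock otel-collector logging exporter and telemetry settings available around v0.79; these names are not part of the config in this thread:)
# Sketch only: route the logs pipeline to the `logging` exporter alongside the
# real one, and raise the collector's own log level, so operator errors show up.
exporters:
  logging:
    verbosity: detailed           # print processed log records to the collector's stdout

service:
  telemetry:
    logs:
      level: debug                # surface operator/parser errors in the collector's own logs
  pipelines:
    logs:
      receivers: [otlp, syslog]
      processors: [logstransform/internal, batch]
      exporters: [clickhouselogsexporter, logging]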
a
@nitya-signoz please check this out
n
Are you using the processor in your logs pipeline? Can you share the entire otel-collector config?
h
sure
receivers:
  syslog:
    tcp:
      listen_address: "0.0.0.0:54527"
    protocol: rfc3164
    location: UTC
  filelog/dockercontainers:
    include: [  "/var/lib/docker/containers/*/*.log" ]
    start_at: end
    include_file_path: true
    include_file_name: false
    operators:
    - type: json_parser
      id: parser-docker
      output: extract_metadata_from_filepath
      timestamp:
        parse_from: attributes.time
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
    - type: regex_parser
      id: extract_metadata_from_filepath
      regex: '^.*containers/(?P<container_id>[^_]+)/.*log$'
      parse_from: attributes["log.file.path"]
      output: parse_body
    - type: move
      id: parse_body
      from: attributes.log
      to: body
      output: time
    - type: remove
      id: time
      field: attributes.time
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: localhost:12345
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector


processors:
  logstransform/internal:
    operators:
      - type: json_parser
      - type: trace_parser
        if: '"trace_id" in attributes or "span_id" in attributes'
        trace_id:
          parse_from: attributes.trace_id
        span_id:
          parse_from: attributes.span_id
        output: remove_trace_id
      - type: trace_parser
        if: '"traceId" in attributes or "spanId" in attributes'
        trace_id:
          parse_from: attributes.traceId
        span_id:
          parse_from: attributes.spanId
        output: remove_traceId
      - id: remove_traceId
        type: remove
        if: '"traceId" in attributes'
        field: attributes.traceId
        output: remove_spanId
      - id: remove_spanId
        type: remove
        if: '"spanId" in attributes'
        field: attributes.spanId
      - id: remove_trace_id
        type: remove
        if: '"trace_id" in attributes'
        field: attributes.trace_id
        output: remove_span_id
      - id: remove_span_id
        type: remove
        if: '"span_id" in attributes'
        field: attributes.span_id
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/prometheus:
    metrics_exporter: prometheus
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
  prometheus:
    endpoint: 0.0.0.0:8889
  # logging: {}

  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 5s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/prometheus, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]
    logs:
      receivers: [otlp, syslog]
      processors: [logstransform/internal, batch]
      exporters: [clickhouselogsexporter]
I don't think that's the whole file
it is, my bad 🙂
n
If I am not wrong, you are ingesting these logs through the syslog receiver, right?
h
yeah, I'm happy to change that though. I was just trying to find the simplest way to make it work. I have a Rails app, which is easy to log to the system log folders, and rsyslog is already available to use
it just felt like the easiest way 🙂 but I'm happy to do it differently
n
Hi, please add this processor:
logstransform/json_parser:
    operators:
      - type: json_parser
        parse_from: body
        parse_to: attributes
        if: "body matches '^{.*}$'"
and add it to the pipelines:
processors: [logstransform/json_parser, logstransform/internal, batch]
The execution of the internal parser is chained, so the json_parser was not executed.
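(For context, an illustrative sketch of how stanza operators chain, which is presumably what the remark above refers to: each operator hands entries to the next operator in the list unless an explicit output redirects them, so an operator that nothing routes to is never executed. The processor and operators below are made up for illustration.)
logstransform/example:
    operators:
      - type: json_parser          # entry point; default output is the next operator in the list
      - type: add                  # sets a marker attribute, then jumps ahead
        field: attributes.parsed
        value: "true"
        output: remove_span_id     # explicit output skips remove_trace_id below
      - id: remove_trace_id        # never receives entries in this chain
        type: remove
        field: attributes.trace_id
      - id: remove_span_id         # receives entries directly from the add operator
        type: remove
        field: attributes.span_id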
h
I get a different result now, still not quite what I would've hoped for, but it might just be progress
(screenshot attached)
Why are things like message/operation now going into the attributes_string key?
n
Trace_id should be parsed.
Is logstransform/internal still present in the pipeline?
h
processors:
+  logstransform/json_parser:
+    operators:
+      - type: json_parser
+        parse_from: body
+        parse_to: attributes
+        if: "body matches '^{.*}$'"
   logstransform/internal:
     operators:
+      - type: json_parser
       - type: trace_parser
         if: '"trace_id" in attributes or "span_id" in attributes'
         trace_id:
@@ -205,6 +217,6 @@ service:
       receivers: [otlp/spanmetrics]
       exporters: [prometheus]
     logs:
-      receivers: [otlp, filelog/dockercontainers]
-      processors: [logstransform/internal, batch]
-      exporters: [clickhouselogsexporter]
\ No newline at end of file
+      receivers: [otlp, syslog]
+      processors: [logstransform/json_parser, logstransform/internal, batch]
+      exporters: [clickhouselogsexporter]
that's the diff atm
n
"Why are things like message/operation now going into the attributes_string key?"
It just mentions that it’s an attribute. You can query it directly by the key.
h
kewl kewl
n
With the sample log you gave me, everything is getting parsed as expected. Can you check if you see any kind of error in the otel-collector logs?
Able to reproduce it, give me a few mins, will get back.
Please update both of your processors to this:
  logstransform/json_parser:
    operators:
      - type: json_parser
        parse_from: body
        parse_to: attributes
        if: "body matches '^{.*}$'"
      - type: severity_parser
        if: "'severity_text' in attributes"
        parse_from: attributes.severity_text
        output: remove_severity_text
      - type: remove
        id: remove_severity_text
        if: "'severity_text' in attributes"
        field: attributes.severity_text
  logstransform/internal:
    operators:
      - type: trace_parser
        if: '"trace_id" in attributes or "span_id" in attributes'
        trace_id:
          parse_from: attributes.trace_id
        span_id:
          parse_from: attributes.span_id
        output: remove_trace_id
      - id: remove_trace_id
        type: remove
        if: '"trace_id" in attributes'
        field: attributes.trace_id
        output: remove_span_id
      - id: remove_span_id
        type: remove
        if: '"span_id" in attributes'
        field: attributes.span_id
And let me know if it works
h
{"log":"2023-07-21T123612.142Z\u0009error\u0009logstransformprocessor@v0.79.0/processor.go:144\u0009processor encountered an issue with the pipeline\u0009{\"kind\": \"processor\", \"name\": \"logstransform/json_parser\", \"pipeline\": \"logs\", \"error\": \"Unmarshal: there are bytes left after unmarshal, error found in #10 byte of ...|a532fc0\\\"}{\\\"severity_|..., bigger context ...|5\\\",\\\"trace_id\\\":\\\"c55dbf84015b06af9aae5d2faa532fc0\\\"}{\\\"severity_text\\\":\\\"DEBUG\\\",\\\"time\\\":\\\"2023-07-21T12014|...\"}\n","stream":"stderr","time":"2023-07-21T123612.143530708Z"}
found that
will update them now
now I get trace ids and span ids 🙂
n
This error is different; there are some logs which the json_parser is failing to parse. They will be ingested directly without parsing. We need to find them and then check what's wrong with the format.
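(Side note, not from the thread: stanza parser operators also accept an on_error setting, and the default send behaviour is what lets entries that fail to parse continue through the pipeline unparsed, as described above. A sketch, assuming that field:)
      - type: json_parser
        parse_from: body
        parse_to: attributes
        if: "body matches '^{.*}$'"
        on_error: send             # default: record the error and forward the entry unparsed
        # on_error: drop           # alternative: discard entries the parser cannot handle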
Great.
h
kewl kewl
(screenshot attached)
Thank you very much for your help @nitya-signoz
n
Awesome
@Hannes Benson are you using our new explorer page or the old one?
h
it's still the old one 🙂 I'll try the new one out now that it's working
n
Sure. Since some duplicate attributes might have been created, please run these SQL statements on ClickHouse so that you don't face any issues:
docker exec -it clickhouse /bin/bash

clickhouse client

truncate table logs_attribute_keys;

truncate table tag_attributes;
We will be working on making this possible through the UI, but for now please run the above commands.
h
will do, thanks @nitya-signoz