# support
s
Hi team, is there an OTel expert here? I want to parse the log entry below and extract its fields, such as method, status code, etc. I have been trying for a long time but I am not getting anything.
2023-04-23T12:09:52.345193051Z stdout F 10.110.245.37 - - [23/Apr/2023:12:09:52 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
Below is my otel-agent ConfigMap:
receivers:
      filelog:
        include: [/var/log/pods/*_php-nginx*_*/*/*.log] 
        start_at: beginning
        operators:
          - type: regex_parser
            regex: '^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(?P<stream>stdout)\s+(?P<severity>[A-Z])\s+(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+"(?P<user_agent>[^"]+)"\s+"(?P<extra>[^"]+)"$'
            output: extract_metadata_filepath
            timestamp:
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
              parse_from: attributes.time

          - type: regex_parser
            id: extract_metadata_filepath
            regex: '(?P<host>[^ ]+)'
          - from: attributes.host
            to: resource["my.hosts"]
            type: move
a
@nitya-signoz should be able to help here
n
The regex you have written is correct. Are your logs getting ingested when you remove the operators?
s
@nitya-signoz I have tried multiple regexes. My logs are ingested fine without the operators. I am building the regex with this website: https://regex101.com/?flavor=golang (see the image below), and the line is parsed successfully there. But in the SigNoz frontend I am not getting any fields; for example, the ip column in the image is empty:
@nitya-signoz Do you have anything in mind that I can test? I have been stuck for more than a week and am struggling badly. How can I display attributes/fields in the SigNoz frontend?
n
Can you send me some more example logs? Around 10-15 of them should be fine.
s
@nitya-signoz Please find them below (output of tail -f /var/log/pods/*_php-apache*_*/*/*.log):
2023-04-25T05:24:06.222112278Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:06 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:07.226824031Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:07 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:08.231429132Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:08 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:09.236194447Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:09 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:10.241420179Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:10 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:11.246190982Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:11 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:12.250794012Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:12 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:13.2557379Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:13 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:14.260639057Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:14 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:15.265771256Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:15 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
2023-04-25T05:24:16.270501506Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:16 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
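A quick offline check, offered as a sketch rather than anything from the thread: the regex from the original ConfigMap can be run against these sample lines with Python's re module, used here as a stand-in for the collector's Go regexp engine (both accept the (?P<name>...) named-group syntax and behave the same for the constructs in this pattern). The three sample lines are copied from above; the parse helper is hypothetical.

```python
import re

# Regex copied verbatim from the original otel-agent ConfigMap, split for readability.
NGINX_CRI = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(?P<stream>stdout)'
    r'\s+(?P<severity>[A-Z])\s+(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+'
    r'\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+'
    r'"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+'
    r'(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+'
    r'"(?P<user_agent>[^"]+)"\s+"(?P<extra>[^"]+)"$'
)

# Three of the sample lines above (the second has a 7-digit fractional second).
SAMPLES = [
    '2023-04-25T05:24:06.222112278Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:06 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"',
    '2023-04-25T05:24:13.2557379Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:13 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"',
    '2023-04-25T05:24:16.270501506Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:16 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"',
]


def parse(line: str) -> dict:
    """Hypothetical helper: return the named groups, or {} if the line does not match."""
    m = NGINX_CRI.match(line)
    return m.groupdict() if m else {}


for line in SAMPLES:
    fields = parse(line)
    print(fields["method"], fields["status"], fields["ip"], fields["user_agent"])
```

Every sample matches, which supports the point made in the thread that the regex itself was not the problem.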
@nitya-signoz I deliberately triggered an error so that I could see the otel-agent logs, using the config below:
filelog:
  include: [/var/log/pods/*_php-apache*_*/*/*.log]
  start_at: beginning
  operators:
    - type: regex_parser
      regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<host>[^ ]*)'
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.times
I changed parse_from from attributes.time to attributes.times so that the agent would log the entries in detail. With this change I can see that the logs are parsed and the attributes are generated:
"details": {"parse_from": "attributes.times"}}, "action": "send", "entry": {"observed_timestamp":"2023-04-25T05:27:53.480511285Z","timestamp":"0001-01-01T00:00:00Z","body":"2023-04-25T05:27:53.373058091Z stdout F 10.110.178.97 - - [25/Apr/2023:05:27:53 +0000] \"GET / HTTP/1.1\" 200 615 \"-\" \"Wget\" \"-\"","attributes":{"host":"10.110.178.97","log.file.name":"0.log","logtag":"F","stream":"stdout","time":"2023-04-25T05:27:53.373058091Z"},"severity":0,"scope_name":""}}
But how do I get these attributes to show up in the SigNoz frontend?
@nitya-signoz Do I need to add anything else to the processors or elsewhere in the configuration?
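The attributes in the debug entry quoted above can be reproduced offline. A minimal sketch, again using Python's re as a stand-in for the collector's Go regexp (same named-group syntax for this pattern), applies the shorter test regex to the body of that entry:

```python
import re

# Shorter regex from the test config: it captures the CRI prefix
# (time, stream, logtag) and the first token after it, which is the client IP.
CRI_PREFIX = re.compile(
    r'^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<host>[^ ]*)'
)

# Body copied from the debug entry (JSON escaping removed).
body = ('2023-04-25T05:27:53.373058091Z stdout F 10.110.178.97 - - '
        '[25/Apr/2023:05:27:53 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"')

attributes = CRI_PREFIX.match(body).groupdict()
print(attributes)
```

The result lines up with the time, stream, logtag and host attributes in the debug entry; log.file.name in that entry is added by the filelog receiver itself, not by the regex.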
n
Hey, thanks for sharing the logs. By default, the timestamp is set by the OTel collector based on when the log was received; the timestamp parser replaces it with the time at which the log was actually generated. So you are saying that it works once you change or remove the timestamp parser?
s
Thanks @nitya-signoz. No, it does not work: the logs are ingested, but no attributes show up in the frontend. Everything goes into the body. I want to extract the method, user agent, etc.
n
So you are saying that the interesting fields on the left side of the SigNoz UI are empty?
s
Yes, exactly, these are empty. After adding the configuration below I can see the key, but the values are empty:
receivers:
  filelog:
    #include: [ /var/log/pods/*_php-apache*_*/*/*.log, /var/log/containers/php-apache-..log, /var/log/containers/php-apache-.log, /var/log/containers/php-apache-858b65965b-7rlm2_nginx-test_php-apache-00d308bf011297a1877e1aa35b7c5e0f33e802cd25ba5756741f79befcc01f62.log ]
    include: [/var/log/pods/*_php-apache*_*/*/*.log]
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<host>[^ ]*)'
        output: extract_metadata_filepath
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
      - type: regex_parser
        id: extract_metadata_filepath
        regex: '^(?P<host>[^ ]*)'
        parse_from: body
      - type: move
        from: attributes.host
        to: resource["custom.my_field"]
@nitya-signoz See this image
n
Okay, give me some time.
s
OK, thanks @nitya-signoz. Really appreciate your help.
n
This config seems to work; I have just corrected the value of parse_from:
operators:
      - type: regex_parser
        regex: '^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(?P<stream>stdout)\s+(?P<severity>[A-Z])\s+(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+"(?P<user_agent>[^"]+)"\s+"(?P<extra>[^"]+)"$'
        output: extract_metadata_filepath
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.timestamp
      - type: regex_parser
        id: extract_metadata_filepath
        regex: '(?P<host>[^ ]+)'
      - from: attributes.host
        to: resource["my.hosts"]
        type: move
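The fix hinges on parse_from naming a field the regex actually creates. Assuming, as in the collector, that regex_parser writes each named group to attributes.<name> by default, a small sketch (Python's re standing in for Go's regexp; attribute_keys is a hypothetical helper) lists the keys this regex produces and checks both parse_from values:

```python
import re

# Regex from the corrected config; identical to the one in the original question.
PATTERN = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(?P<stream>stdout)'
    r'\s+(?P<severity>[A-Z])\s+(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+'
    r'\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+'
    r'"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+'
    r'(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+'
    r'"(?P<user_agent>[^"]+)"\s+"(?P<extra>[^"]+)"$'
)


def attribute_keys(pattern):
    """Hypothetical helper: one attributes.<name> key per named capture group."""
    return {f"attributes.{name}" for name in pattern.groupindex}


keys = attribute_keys(PATTERN)
for parse_from in ("attributes.time", "attributes.timestamp"):
    print(parse_from, "exists" if parse_from in keys else "MISSING")
```

attributes.time is never created by this regex, so the original parse_from pointed at nothing; attributes.timestamp matches the group name.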
Can you try this out? Otherwise, we can schedule a call.
s
OK, let me check, @nitya-signoz.
@nitya-signoz It is still empty. Can you please check the image below? Did you change anything else in the processors or elsewhere in the configuration? When are you available for a call? I have literally been scratching my head over this for more than a week now.
I really, really appreciate your help, @nitya-signoz. I would be grateful if you could spare some time for a short call.
n
Ahh, the value of my.hosts is empty because the regex for attributes.host, (?P<host>[^ ]+), is not correct. For example, if the log line is
2023-04-25T05:24:06.222112278Z stdout F 10.110.178.97 - - [25/Apr/2023:05:24:06 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"
do you want my.hosts to be 10.110.178.97?
For the other extracted fields: on the left-hand side there are Selected and Interesting fields. If you move a field from Interesting to Selected, it will appear on the main page. Otherwise, click the expand button at the beginning of a log line to see the parsed values.
If that is the case, then this is the correct config:
operators:
      - type: regex_parser
        regex: '^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(?P<stream>stdout)\s+(?P<severity>[A-Z])\s+(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+"(?P<user_agent>[^"]+)"\s+"(?P<extra>[^"]+)"$'
        output: move_host
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.timestamp
      - from: attributes.ip
        id: move_host
        to: resource["my.hosts"]
        type: move
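Conceptually, the two operators above form a small pipeline: regex_parser writes the named groups into attributes, then, via output: move_host, the move operator takes attributes.ip out of attributes and stores it under resource["my.hosts"]. The sketch below models a log entry as plain dicts; it is a simplified illustration, not the collector's implementation, the regex is a trimmed version of the one in the config, and regex_parse and move are hypothetical names.

```python
import re

# Simplified stand-in for a log entry: body plus attributes and resource maps.
entry = {
    "body": ('2023-04-25T05:24:06.222112278Z stdout F 10.110.178.97 - - '
             '[25/Apr/2023:05:24:06 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"'),
    "attributes": {},
    "resource": {},
}

# Trimmed (hypothetical) regex: only the leading fields needed for this demo.
PARSER = re.compile(
    r'^(?P<timestamp>\S+)\s+(?P<stream>stdout|stderr)\s+(?P<severity>[A-Z])\s+'
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s'
)


def regex_parse(e: dict) -> None:
    """Hypothetical model of regex_parser: named groups are written to attributes."""
    m = PARSER.search(e["body"])
    if m:
        e["attributes"].update(m.groupdict())


def move(e: dict, attr_key: str, resource_key: str) -> None:
    """Hypothetical model of move: remove from attributes, store in resource.
    This model silently skips a missing key."""
    if attr_key in e["attributes"]:
        e["resource"][resource_key] = e["attributes"].pop(attr_key)


regex_parse(entry)
move(entry, "ip", "my.hosts")
print(entry["resource"])
print(sorted(entry["attributes"]))
```

After the move, the IP lives only on the resource, while the remaining parsed fields stay in attributes.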
Regarding "if you can spare some time for a short call?": sure, once we are clear on the above comments we can get on a call if required.
s
OK, I am checking your last comment. I did not understand what you are asking/suggesting here: https://signoz-community.slack.com/archives/C01HWQ1R0BC/p1682414684130369?thread_ts=1682353463.657419&cid=C01HWQ1R0BC
n
In this operator:
- type: regex_parser
  id: extract_metadata_filepath
  regex: '(?P<host>[^ ]+)'
What are you trying to extract? That is what I was asking.
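To make the question concrete: the operator above has no parse_from, so it reads the raw body, and assuming the collector's regex search is unanchored (as Go's FindStringSubmatch is), this pattern captures the first run of non-space characters. A minimal sketch with Python's re.search standing in for Go's regexp; the second, prefix-skipping pattern is a hypothetical alternative, not something from the thread:

```python
import re

# Raw line as it reaches the operator: the CRI prefix is still part of the body.
body = ('2023-04-25T05:24:06.222112278Z stdout F 10.110.178.97 - - '
        '[25/Apr/2023:05:24:06 +0000] "GET / HTTP/1.1" 200 615 "-" "Wget" "-"')

# Pattern from the operator: the leftmost match is the first non-space run,
# which is the container-runtime timestamp, not the client IP.
host = re.search(r'(?P<host>[^ ]+)', body).group("host")
print(host)

# Hypothetical alternative: skip the three CRI fields (time, stream, logtag) first.
ip = re.search(r'^\S+ \S+ \S+ (?P<host>[^ ]+)', body).group("host")
print(ip)
```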
s
It is still empty. Are you available for the call?
n
Sure.
processors:
      logstransform/parse_log:
         operators:
            - default: noop
              id: router_signoz
              routes:
                - expr: 'body matches ".*user-id=.*trace-id=.*span-id.*line"'
                  output: parse_regex
              type: router
            - id: parse_regex
              type: regex_parser
              parse_from: body
              regex: '.*INFO[ ]+(?P<filename>\S+)[ ]+user-id= (?P<user_id>\S+)[ ]+trace-id=.*span-id.*line:(?P<line>\d+)[ ]'
              parse_to: attributes
            - id: noop
              type: noop
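The router/noop pattern above sends only matching bodies to the regex parser and lets everything else pass through untouched. A rough Python model of that flow follows; re.search approximates the expression's matches operator (an unanchored regex match), route_and_parse is a hypothetical name, and both sample log lines are invented for illustration.

```python
import re

# Route condition and parser regex copied from the processor config above.
ROUTE = re.compile(r'.*user-id=.*trace-id=.*span-id.*line')
PARSE = re.compile(
    r'.*INFO[ ]+(?P<filename>\S+)[ ]+user-id= (?P<user_id>\S+)[ ]+'
    r'trace-id=.*span-id.*line:(?P<line>\d+)[ ]'
)


def route_and_parse(body):
    """Return the attributes parse_regex would add, or {} for the noop route."""
    if not ROUTE.search(body):
        return {}  # default route: noop, the entry passes through unparsed
    match = PARSE.search(body)
    # In this simplified model a routed body that fails to parse also yields {}.
    return match.groupdict() if match else {}


# Hypothetical log lines invented for this sketch.
matching = "2023-04-25 10:00:00 INFO app.py user-id= 42 trace-id=abc span-id=def line:17 done"
other = "2023-04-25 10:00:01 DEBUG cache warmed"

print(route_and_parse(matching))
print(route_and_parse(other))
```

Routing first keeps the parser from erroring on lines in other formats; only bodies that look like the target format reach parse_regex.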
s
@Ankit Nayan @nitya-signoz I really appreciate your time and effort. Much obliged. Below is the configuration that works for me:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-infra-otel-agent
  namespace: platform
  labels:
    app.kubernetes.io/component: otel-agent
data:
  otel-agent-config.yaml: |-
    exporters:
      otlp:
        endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
        headers:
          signoz-access-token: Bearer ${SIGNOZ_API_KEY}
        tls:
          insecure: ${OTEL_EXPORTER_OTLP_INSECURE}
          insecure_skip_verify: ${OTEL_EXPORTER_OTLP_INSECURE_SKIP_VERIFY}
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      pprof:
        endpoint: localhost:1777
      zpages:
        endpoint: localhost:55679
    processors:
      attributes:
        actions:
          - key: host
            from_attribute: host
            action: insert
        # actions:
        #   - key: new_hosts
        #     pattern: '(?P<new_hosts>[^ ]+)'
        #     action: extract

      batch:
        send_batch_size: 10000
        timeout: 200ms
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
          - k8s.deployment.name
          - k8s.node.name

        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      resourcedetection:
        detectors:
        - env
        - system
        override: true
        system:
          hostname_sources:
          - dns
          - os
        timeout: 2s
    receivers:

      filelog:
        include: [/var/log/pods/*_php-apache*_*/*/*.log] 
        exclude: [/var/log/pods/platform_*/*/*.log]
        start_at: end
        include_file_name: false
        include_file_path: true
        operators:
          - type: regex_parser
            #regex: '^(?P<time>[^ ]+) (?P<stream>stdout)\s+(?P<severity>[A-Z])\s+(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+"(?P<user_agent>[^"]+)"\s+"(?P<extra>[^"]+)"$'
            regex: '^(?P<time>[^ ]+) (?P<stream>stdout)\s+(?P<severity>[A-Z])\s+(?P<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+-\s+-\s+\[(?P<datetime>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+\+\d{4})\]\s+"(?P<method>[A-Z]+)\s+(?P<path>[^ ]+)\s+(?P<protocol>HTTP\/\d\.\d)"\s+(?P<status>\d+)\s+(?P<size>\d+)\s+"(?P<referer>[^"]+)"\s+"(?P<user_agent>[^"]+)"'
            output: extract_metadata_from_filepath_2   #move_host
            timestamp:
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
              parse_from: attributes.time
          - from: attributes.client_ip
            id: move_host
            to: resource["my.hosts"]
            type: move

          - id: extract_metadata_from_filepath_2
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser

          - from: attributes.container_name
            to: resource["k8s.container.name"]
            type: move
          - from: attributes.namespace
            to: resource["k8s.namespace.name"]
            type: move
          - from: attributes.pod_name
            to: resource["k8s.pod.name"]
            type: move
          - from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
            type: move
          - from: attributes.uid
            to: resource["k8s.pod.uid"]
            type: move

      filelog/k8s:
        exclude:
        - /var/log/pods/kube-system_*/*/*.log
        - /var/log/pods/platform_*/*/*.log
        - /var/log/pods/kubecost_*/*/*.log
        - /var/log/pods/*_hotrod*_*/*/*.log
        - /var/log/pods/*_locust*_*/*/*.log
        - /var/log/pods/*_nginx-test*_*/*/*.log
        - /var/log/pods/*_php-apache*_*/*/*.log
       

        
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: get-format
          routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
          type: router
        - id: parser-crio
          output: extract_metadata_from_filepath
          regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: "2006-01-02T15:04:05.000000000-07:00"
            layout_type: gotime
            parse_from: attributes.time
          type: regex_parser
        - id: parser-containerd
          output: extract_metadata_from_filepath
          regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: regex_parser
        - id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: json_parser
        - id: extract_metadata_from_filepath
          parse_from: attributes["log.file.path"]
          regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          type: regex_parser
        - from: attributes.stream
          to: attributes["log.iostream"]
          type: move
        - from: attributes.container_name
          to: resource["k8s.container.name"]
          type: move
        - from: attributes.namespace
          to: resource["k8s.namespace.name"]
          type: move
        - from: attributes.pod_name
          to: resource["k8s.pod.name"]
          type: move
        - from: attributes.restart_count
          to: resource["k8s.container.restart_count"]
          type: move
        - from: attributes.uid
          to: resource["k8s.pod.uid"]
          type: move
        - from: attributes.log
          to: body
          type: move
        start_at: beginning

      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem: {}
          load: {}
          memory: {}
          network: {}
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 30s
        endpoint: ${K8S_NODE_NAME}:10250
        extra_metadata_labels:
        - container.id
        - k8s.volume.type
        insecure_skip_verify: true
        metric_groups:
        - container
        - pod
        - node
        - volume
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 4
          http:
            endpoint: 0.0.0.0:4318
    service:
      extensions:
      - health_check
      - zpages
      pipelines:
        logs:
          exporters:
          - otlp
          processors:
          - resourcedetection
          - k8sattributes
          - batch
          receivers:
          - otlp
          - filelog
          - filelog/k8s
        metrics:
          exporters:
          - otlp
          processors:
          - resourcedetection
          - k8sattributes
          - batch
          receivers:
          - otlp
        metrics/generic:
          exporters:
          - otlp
          processors:
          - resourcedetection
          - k8sattributes
          - batch
          receivers:
          - hostmetrics
          - kubeletstats
        traces:
          exporters:
          - otlp
          processors:
          - resourcedetection
          - k8sattributes
          - batch
          receivers:
          - otlp
      telemetry:
        metrics:
          address: 0.0.0.0:8888
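The extract_metadata_from_filepath_2 operator in this config derives Kubernetes metadata from the log file path. Pod log directories under /var/log/pods follow a <namespace>_<pod-name>_<pod-uid>/<container>/<restart>.log layout, so the regex can be checked offline. A sketch, again using Python's re in place of Go's regexp; the pod UID in the sample path is made up, while the namespace, pod and container names are taken from the container log file name quoted earlier in the thread.

```python
import re

# Regex from the extract_metadata_from_filepath_2 operator, split for readability.
FILEPATH = re.compile(
    r'^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)'
    r'\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
)

# Hypothetical path: the UID segment is invented for this example.
path = ("/var/log/pods/nginx-test_php-apache-858b65965b-7rlm2_"
        "0a1b2c3d-1111-2222-3333-444455556666/php-apache/0.log")

fields = FILEPATH.match(path).groupdict()

# The move operators that follow store these under the resource keys below.
resource = {
    "k8s.namespace.name": fields["namespace"],
    "k8s.pod.name": fields["pod_name"],
    "k8s.pod.uid": fields["uid"],
    "k8s.container.name": fields["container_name"],
    "k8s.container.restart_count": fields["restart_count"],
}
for key, value in resource.items():
    print(f"{key} = {value}")
```

One thing worth double-checking in the config as written: the first regex_parser sets output: extract_metadata_from_filepath_2, which appears to bypass the move_host operator listed between them, so client_ip would stay in attributes rather than being moved to resource["my.hosts"].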