# support
l
Is there a well-defined spec for what format logs should be in to be collected by SigNoz? It seems SigNoz expects the JSON to contain specific keys.
l
We are using structlog in Python. Do you find people typically use operators to convert, or just modify their loggers directly to match the format?
n
People mostly use operators to convert, but we are seeing new users who use the SDK (mostly Java) to send logs directly. Since you are using Python, you can try the OTel SDK for Python, though support for logs is experimental as of now. https://github.com/open-telemetry/opentelemetry-python/tree/main/docs/examples/logs
Also, parsing becomes easier if you log in JSON or key-value format.
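A minimal sketch of what "logging in JSON" can look like on the application side. It uses only the Python standard library rather than structlog, which is an assumption made to keep the example self-contained. The `JsonFormatter` class, the `fields` extra key, and the key names (`severity_text`, `logger`) are illustrative choices, not a format SigNoz requires.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (hypothetical key names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "severity_text": record.levelname.lower(),
            "logger": record.name,
        }
        # Extra fields passed via extra={"fields": {...}} are merged in as-is.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("api.access")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.info("GET /api/v1/healthz 200", extra={"fields": {"duration": 427268}})
line = stream.getvalue().strip()
print(line)
```

In a real deployment the stream would be stdout or stderr, so that each line ends up in the container log file as a single JSON object.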
l
Yes, we use JSON
Sorry, one more silly question. Should the keys of the JSON be things like `span_id` or `SpanId`? The OpenTelemetry docs suggest the latter, but some SigNoz docs (like this one https://signoz.io/docs/userguide/fluentd_to_signoz/#steps-to-recieve-logs-from-fluentd) seem to use `span_id`.
https://opentelemetry.io/docs/reference/specification/protocol/file-exporter/#examples These examples seem to use `severityText` rather than `SeverityText` as well. That's 3 potential variants…
n
It doesn’t matter; you will have to use the `trace_parser` operator regardless. Here is how you do it: https://github.com/SigNoz/logs-benchmark/blob/0b2451e6108d8fa5fdd5808c4e174bd52b9d55d3/signoz/signoz-client/otel-collector-config.yaml#L22
t
Hey @nitya-signoz, I'm a coworker of Luke's. I wanted to additionally mention that we have deployed SigNoz via Kubernetes and we're automatically seeing all the pod logs. Which receiver ingests these logs? OTLP? I noticed the OTLP receiver doesn't support operators. https://signoz.io/docs/userguide/logs/#operators-for-parsing-and-manipulating-logs
The docs say: "The receivers FluentForward and OTLP doesn’t have operators. But for parsing them we can use logprocessor." i would have expected this to work:
processors:
  logstransform:
    operators:
      - type: json_parser
        id: my_new_body
        parse_from: attributes.body
however, after restarting the collector, I'm still not seeing "my_new_body" as a field. any ideas? I confirmed by checking the logs that the processor is enabled:
signoz-otel-collector 2023-03-16T21:26:55.811Z    info    pipelines/pipelines.go:90    Processor is starting...    {"kind": "processor", "name": "logstransform", "pipeline": "logs"}
signoz-otel-collector 2023-03-16T21:26:55.811Z    info    pipelines/pipelines.go:94    Processor started.    {"kind": "processor", "name": "logstransform", "pipeline": "logs"}
but i do see a failure, since not all logs contain a `body` or are valid json (lots of the pod logs are not).
signoz-otel-collector 2023-03-16T21:29:03.909Z    error    helper/transformer.go:110    Failed to process entry    {"kind": "processor", "name": "logstransform", "pipeline": "logs", "operator_id": "my_new_body", "operator_type": "json_parser", "error": {"description": "Entry is missing the expected parse_from field.", "suggestion": "Ensure that all incoming entries contain the parse_from field." ...
a couple of questions:
1. is this the right way to go about this? should i be using operators on a receiver instead of using a processor?
2. if this error is preventing me from running logstransform on any logs, is there a way to filter which logs this runs on?
n
The k8s logs are collected by the filelog/k8s receiver.
If you look at the json_parser configuration, you are parsing from `attributes.body`, and by default the parsed attributes are written to the `attributes` key only: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/json_parser.md . You can change that by changing the value of `parse_to`. You can also use the `if` key to parse only when the `body` key is present in attributes. If you can share examples of what you are sending and what you are trying to extract, I can help.
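To make the `parse_from` / `parse_to` / `if` behaviour described here concrete, the following is a small Python model of it. It only illustrates the semantics as stated in this thread and is not the collector's implementation. The entry layout, the helper names (`get_path`, `ensure_dict`, `json_parse`) and the `condition` callback standing in for `if` are all assumptions, and whether the source field is kept or removed after parsing is not modelled.

```python
import copy
import json
from typing import Any, Callable, Dict, Optional

Entry = Dict[str, Any]


def get_path(entry: Entry, path: str) -> Any:
    """Return the value at a dotted path such as 'attributes.body', or None."""
    node: Any = entry
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def ensure_dict(entry: Entry, path: str) -> Dict[str, Any]:
    """Walk a dotted path, creating empty dicts as needed; return the last one."""
    node = entry
    for key in path.split("."):
        node = node.setdefault(key, {})
    return node


def json_parse(
    entry: Entry,
    parse_from: str = "body",
    parse_to: str = "attributes",
    condition: Optional[Callable[[Entry], bool]] = None,
) -> Entry:
    """Parse the JSON string at parse_from and merge its keys under parse_to."""
    if condition is not None and not condition(entry):
        return entry  # guard failed: the entry passes through untouched
    raw = get_path(entry, parse_from)
    if raw is None:
        # Mirrors the "missing the expected parse_from field" failure in the thread.
        raise KeyError(f"entry is missing the expected parse_from field: {parse_from}")
    result = copy.deepcopy(entry)
    ensure_dict(result, parse_to).update(json.loads(raw))
    return result


def looks_like_json(entry: Entry) -> bool:
    """Hypothetical guard: only parse bodies that start like a JSON object."""
    body = entry.get("body")
    return isinstance(body, str) and body.startswith("{")


parsed = json_parse(
    {"body": '{"duration": 427268, "logger": "api.access"}', "attributes": {}},
    condition=looks_like_json,
)
plain = json_parse({"body": "not json", "attributes": {}}, condition=looks_like_json)
nested = json_parse({"body": '{"duration": 1}'}, parse_to="attributes.parsed_body")
print(parsed["attributes"], plain["attributes"], nested["attributes"])
```

Without the guard, the plain-text entry would fail inside `json.loads`, which is the kind of per-entry failure a condition is meant to avoid.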
t
ooh i see. i didn't have the `filelog/k8s` receiver configured in any way -- it just works by default i suppose? so here's an example log that i'm currently seeing in signoz. i don't see an `attributes` key.
{
  "timestamp": 1679085202378150700,
  "id": "2N9hfxnx4K6pMEslQ4UBGZL0EWB",
  "trace_id": "",
  "span_id": "",
  "trace_flags": 0,
  "severity_text": "",
  "severity_number": 0,
  "body": "{\"body\": {\"http\": {\"method\": \"GET\", \"request_id\": \"5514ff9e43d94cbca171a6751ccae7ca\", \"version\": \"1.1\", \"user_agent\": \"kube-probe/1.24+\"}, \"network\": {\"client\": {\"ip\": \"10.0.3.226\", \"port\": 33064}}, \"duration\": 427268, \"request_id\": \"5514ff9e43d94cbca171a6751ccae7ca\", \"logger\": \"api.access\", \"filename\": \"main.py\", \"func_name\": \"logging_middleware\", \"lineno\": 74, \"message\": \"10.0.3.226:33064 - \\\"GET /api/v1/healthz HTTP/1.1\\\" 200\"}, \"severityText\": \"info\", \"timestamp\": \"2023-03-17T20:33:22.377798Z\", \"traceId\": \"5514ff9e43d94cbca171a6751ccae7ca\"}",
  "resources_string": {
    "host_name": "<hostname>",
    "k8s_cluster_name": "",
    "k8s_container_name": "mlcore-web",
    "k8s_container_restart_count": "0",
    "k8s_namespace_name": "mlcore",
    "k8s_node_name": "<nodename>",
    "k8s_pod_ip": "<k8s_pod_ip>",
    "k8s_pod_name": "mlcore-web-6876b7c7b9-2cxxx",
    "k8s_pod_start_time": "2023-03-17 13:55:03 +0000 UTC",
    "k8s_pod_uid": "caad5d5e-7a16-471d-8a5f-0459b5aa90c4",
    "os_type": "linux",
    "signoz_component": "otel-agent"
  },
  "attributes_string": {
    "log_file_path": "/var/log/pods/mlcore_mlcore-web-6876b7c7b9-2cxxx_7144c554-5d97-4774-ae17-6c39ef19a518/mlcore-web/0.log",
    "log_iostream": "stderr",
    "logtag": "F",
    "time": "2023-03-17T20:33:22.378150623Z"
  },
  "attributes_int": {},
  "attributes_float": {}
}
and here's my relevant otel-collector-config:
receivers:
  filelog/k8s:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      - /var/log/pods/kube-system_*/*/*.log
    operators:
      - type: json_parser
        id: body_parser
        parse_from: attributes.body
        parse_to: attributes.parsed_body
i also have the filelog/k8s set in the pipelines.logs.receivers:
pipelines:
  logs:
    receivers: [otlp, filelog/k8s]
it seems my json_parser is not working at all. i've tried every combination of `attributes.body`, just `body`, or `body.body`, with and without `parse_to`, but i can't seem to see any difference.
i even tried something as simple as:
receivers:
  filelog/k8s:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      - /var/log/pods/kube-system_*/*/*.log
    operators:
      - type: add
        field: travis_key
        value: travis_val
but that causes otel-collector to fail starting up with an error:
Error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:* error decoding 'receivers': error reading receivers configuration for "filelog/k8s": 1 error(s) decoding:* error decoding 'operators[0]': unmarshal to add: 1 error(s) decoding:* error decoding 'field': unrecognized prefix
2023/03/17 21:17:03 application run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:* error decoding 'receivers': error reading receivers configuration for "filelog/k8s": 1 error(s) decoding:* error decoding 'operators[0]': unmarshal to add: 1 error(s) decoding:* error decoding 'field': unrecognized prefix
as an update, i realized this is in the `otel-agent-config.yaml`, not the `otel-collector-config.yaml`. using operators there does seem to be working!
however, i'm still seeing some weirdness when using the json_parser. i want to parse whatever arbitrary json my log might contain, and i want to assume that we don't know all the keys ahead of time in signoz. is that possible? otherwise, every time we add a field to our logs, we need to come back and configure the json parser to explicitly extract that field. this feels wrong.
i guess, maybe to be more clear... i expected the json_parser to leave me with a json field. it does seem like it's parsing the field, but i can't actually use those nested values unless i `move` them? here's my `body` after it's hit by the `json_parser`:
"body": "{\"filename\":\"main.py\",\"func_name\":\"logging_middleware\",\"http\":{\"method\":\"GET\",\"request_id\":\"TfHgVf2bYLlyDRSQT6YD8\",\"status_code\":200,\"url\":\"<http://api.dev.nsinfra.dev/api/v1/accounts/iyvnjbnodqsfcfiwegflr/projects/3720/tasks/e2b98ed9-ba95-41bf-be6a-216df7ab57c9>\",\"user_agent\":\"node-fetch\",\"version\":\"1.1\"},\"lineno\":74,\"logger\":\"api.access\",\"message\":\"10.0.2.174:40388 - \\\"GET /api/v1/accounts/iyvnjbnodqsfcfiwegflr/projects/3720/tasks/e2b98ed9-ba95-41bf-be6a-216df7ab57c9 HTTP/1.1\\\" 200\",\"network\":{\"client\":{\"ip\":\"10.0.2.174\",\"port\":40388}},\"request_id\":\"TfHgVf2bYLlyDRSQT6YD8\"}",
i can successfully do something like:
- from: attributes.body.duration
  to: attributes.duration
  type: move
but i don't know all the keys that the body might contain. i just really want to be able to build ad-hoc queries that reference `body.duration GTE <some_value>`
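One possible workaround for not knowing the keys ahead of time, sketched here as an assumption rather than a SigNoz feature, is to flatten nested fields in the application before the log line is written, so every leaf value becomes a top-level key that a single JSON parse can expose. The `flatten` function and the `.` separator are hypothetical choices, and the sample event reuses field names from the log body shown earlier in the thread.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining key names with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that {"http": {"method": "GET"}} becomes {"http.method": "GET"}.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


event = {
    "http": {"method": "GET", "status_code": 200},
    "network": {"client": {"ip": "10.0.2.174", "port": 40388}},
    "duration": 427268,
}
flat_event = flatten(event)
print(flat_event)
```

With structlog, a function like this could be applied to the event dict before rendering, though how the resulting keys show up in the query builder would still need to be checked against the running SigNoz version.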