is there a way to use the log processor to replace...
# support
b
is there a way to use the log processor to replace sensitive data? for example connection strings that include password in the clear? i was going to try to use a regex parser to give it a shot, but not sure if it'll work as i want. parsing from and to body?
i guess with logs such as the following, i'm having a hard time figuring out how best to process them. the "Remove" pipeline option doesn't seem like a good fit.
11:13:39 Unable to establish a connection to the database. It may be down. 

Connection string=Provider=OraOLEDB.Oracle;Data Source=DB01;USER ID=USER01;Password=Password01;

Error=ORA-12560: TNS:protocol adapter error

Stack trace=   at System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)
   at System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)
   at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.OleDb.OleDbConnection.Open()
   at V02Net.CSIBasicWeb.getSessMgrConn() in C:\M5\M5-AppCode\App_Code\CSIBasicWeb.cs:line 1856
what type of regex is supported - can we do a regex replacement of the Password value in the connection string? i don't think i want to parse out the connection string to fields, as i'm guessing i'd have to parse out all the other chunks of the log entry as well, and on top of that, can we even parse out the attributes twice? ie, parse out the connection string, error, stack trace, and then from connection string, parse out the key/value pairs, then remove the Password?
n
you can use the transform processor for that - it supports regex-based replacement on the log body
b
thank you, i'll look into using the transform processor
i assume i need to add a block for the transform processor in the otel-collector-config.yaml file?
n
correct, and then you'll need to add it in the logs pipeline before the batch processor
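for example, a transform processor block could look something like this (just a sketch - the regex and masking value are illustrative, check the transform processor README for the exact syntax of the collector version you are running):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # illustrative regex: masks whatever follows Password= up to the next semicolon
          - replace_pattern(body, "Password=[^;]*", "Password=***")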
b
you're referring to under the pipelines block under services in the collector config? so like, under logs:, processors: [transform, batch]?
n
correct
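so the logs pipeline would end up looking roughly like this (sketch - the receiver and exporter names are placeholders for whatever is already in your config):

service:
  pipelines:
    logs:
      receivers: [filelog]
      # transform runs before batch, so the password is masked before export
      processors: [transform, batch]
      exporters: [clickhouselogsexporter]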
b
cool, thank you
quick question - when it comes to this sort of preprocessing, manipulation etc with the cloud offering, where is similar configuration done?
and will there be the ability to add/edit/config such things thru the pipelines tab in the web console in the future?
n
it's similar to this, but the manipulation feature is available from the UI for both the cloud and free license. as of now only these are supported: https://signoz.io/docs/logs-pipelines/introduction/ We do have it on the roadmap to bring more things from the config into the UI, but those are not prioritised yet.
b
ok thank you
n
let me know if you have any more questions or need help 🙂
b
will do, thank you!
sorry one last question - if you add such processors, parsers, etc. whether adding multiline support for previously ingested logs, or adding this processor to remove secrets, is there a way to reprocess old logs, or do you have to purge the old data and re-ingest the log data?
working on the multiline stuff, i only found old slack threads archived that indicated i had to remove the signoz data altogether and rebuild. which is fine for this stage of the POC, but not ok later on as we grow it
n
since this is done before data is written to DB, the changes will be applied to the new data only and the old data will remain the same.
b
ok. and there's no way to reprocess the existing data? reapply new pipelines to the data?
n
You can do it, but it will require some work - you will have to re-ingest the data or run some update SQL queries.
b
gotcha. do you know if there is a feature request to add this functionality to the GUI or even a script somewhere that can execute the sql queries? sounds like the original data is stored in the DB, as well as the transformed data, so i'd imagine this would be a desirable capability for users
n
As of now I don't think there is a feature request like this, but you can go ahead and create a GitHub issue.
b
ok cool
i'm guessing that in order to not crater realtime data processing, we'd want this feature to be able to specify a defined log set? rather than reprocessing all logs from all sources. and i'm guessing there's no issue around identifying defined logs by configured name (filelog/appname) huh?
n
so in the filelog receiver you can add some attributes/resource attributes, and then in the processor add a where clause https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl#check-if-an-attribute-exists eg:

where attributes["receiver_name"] == "app1"
filelog:
    include: [ /var/log/myservice/*.json ]
    attributes:
      receiver_name: app1
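and combining that with the transform processor from before, you could scope the masking to just that receiver - roughly like this (sketch only):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # only rewrites logs that arrived with the receiver_name attribute set to app1
          - replace_pattern(body, "Password=[^;]*", "Password=***") where attributes["receiver_name"] == "app1"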
b
interesting. ok good to know. so in the above snippet, you wouldn't specify the receiver as filelog/app1, but instead, as:
filelog:
  ...
  attributes:
    receiver_name: app1
n
sorry, didn’t mean that. this is what I meant
filelog/app1:
    include: [ /var/log/myservice/*.json ]
    attributes:
      receiver_name: app1
b
ok thank you
re re-processing ingested logs - is there any documentation out there describing: 1. how to manually execute sql queries to do so, and 2. best practices for re-ingesting the data?
does anyone have any thoughts on this? how to reprocess ingested data?
n
1. for manually executing sql queries, it will depend on the structure of your data. 2. as of now there are no best practices, but let's say you have the file the logs were read from - then you can just read the data from the file again. otherwise you will have to get the data from clickhouse and send it to the otel collector, but again this will depend on how you have previously processed your logs.
b
how does one read the data from file again?
i mean, how does one instruct that previously-read log files be re-read?
any thoughts on this? how to instruct to re-ingest the logs?
n
One way is to restart the otel collector and in the filelog receiver use start_at: beginning https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/filelogreceiver/README.md but here the logs need to be present in the source file.
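for reference, with the earlier example it would look roughly like this (sketch):

filelog/app1:
  include: [ /var/log/myservice/*.json ]
  # read matched files from the top instead of only tailing new lines
  start_at: beginning
  attributes:
    receiver_name: app1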
b
ok thank you, since some of these logs were from past days, i had already configured with start_at: beginning, so that should be easy.
n
yeah, but start_at: beginning can backfire at times, because if the collector restarts it will send the entire data again
b
what does signoz do with that data that's sent again? is the old data replaced w/ updated preprocessing, etc, or, does it just exist as duplicate log entries?
we could really use a way to reprocess logs
n
I want to understand your use case - is it that you are going to change pipelines very often and you want older data to be processed as well? because in most cases most of the setup is done at the beginning and then people gradually add processing, but it doesn't change that drastically, and there is rarely a hard requirement to process old data.
b
right now it's the POC we're doing, ensuring we have adequate log data to use. if we were to implement, it'd be so we could catch up on whatever logs were already written to disk when we "switched on"
as for changing how the logs are processed, in the case of 3rd party apps we may not always have a full view of how every exception or error will be written to logs. so masking secrets, etc, might come after the fact.