is there a way to use the log processor to replace...
# support
b
is there a way to use the log processor to replace sensitive data? for example connection strings that include password in the clear? i was going to try to use a regex parser to give it a shot, but not sure if it'll work as i want. parsing from and to body?
i guess with logs such as the following, i'm having a hard time figuring out how best to process them. the "Remove" pipeline option doesn't seem like a good fit.
11:13:39 Unable to establish a connection to the database. It may be down. 

Connection string=Provider=OraOLEDB.Oracle;Data Source=DB01;USER ID=USER01;Password=Password01;

Error=ORA-12560: TNS:protocol adapter error

Stack trace=   at System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)
   at System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)
   at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.OleDb.OleDbConnection.Open()
   at V02Net.CSIBasicWeb.getSessMgrConn() in C:\M5\M5-AppCode\App_Code\CSIBasicWeb.cs:line 1856
what type of regex is supported - can we do a regex replacement of the Password value in the connection string? i don't think i want to parse out the connection string to fields, as i'm guessing i'd have to parse out all the other chunks of the log entry as well, and on top of that, can we even parse out the attributes twice? ie, parse out the connection string, error, stack trace, and then from connection string, parse out the key/value pairs, then remove the Password?
n
you can use the transform processor for that - it supports regex-based replacement on the log body
b
thank you, i'll look into using the transform processor
i assume i need to add a block for the transform processor in the otel-collector-config.yaml file?
n
correct, and then you'll need to add it in the logs pipeline before the batch processor
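for example, a transform processor block could look something like this (just a sketch - the regex and masking value are illustrative, check the transform processor README for the exact syntax of the collector version you are running):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # illustrative regex: masks whatever follows Password= up to the next semicolon
          - replace_pattern(body, "Password=[^;]*", "Password=***")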
b
you're referring to under the pipelines block under services in the collector config? so like, under logs:, processors: [transform, batch]?
n
correct
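so the logs pipeline would end up looking roughly like this (sketch - the receiver and exporter names are placeholders for whatever is already in your config):

service:
  pipelines:
    logs:
      receivers: [filelog]
      # transform runs before batch, so the password is masked before export
      processors: [transform, batch]
      exporters: [clickhouselogsexporter]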
b
cool, thank you
quick question - when it comes to this sort of preprocessing, manipulation etc with the cloud offering, where is similar configuration done?
and will there be the ability to add/edit/config such things thru the pipelines tab in the web console in the future?
n
it's similar to this, but the manipulation feature is available from the UI for both the cloud and free license. as of now only these are supported: https://signoz.io/docs/logs-pipelines/introduction/ We do have it on the roadmap to bring more things from the config into the UI, but those are not prioritised yet.
b
ok thank you
n
let me know if you have any more questions or need help 🙂
b
will do, thank you!
sorry one last question - if you add such processors, parsers, etc. whether adding multiline support for previously ingested logs, or adding this processor to remove secrets, is there a way to reprocess old logs, or do you have to purge the old data and re-ingest the log data?
working on the multiline stuff, i only found old slack threads archived that indicated i had to remove the signoz data altogether and rebuild. which is fine for this stage of the POC, but not ok later on as we grow it
n
since this is done before data is written to DB, the changes will be applied to the new data only and the old data will remain the same.
b
ok. and there's no way to reprocess the existing data? reapply new pipelines to the data?
n
You can do it, but it will require some work - you will have to re-ingest the data or run some update SQL queries.
b
gotcha. do you know if there is a feature request to add this functionality to the GUI or even a script somewhere that can execute the sql queries? sounds like the original data is stored in the DB, as well as the transformed data, so i'd imagine this would be a desirable capability for users
n
As of now I don't think there is a feature request like this, but you can go ahead and create a GitHub issue.
b
ok cool
i'm guessing that in order to not crater realtime data processing, we'd want this feature to be able to specify a defined log set? rather than reprocessing all logs from all sources. and i'm guessing there's no issue around identifying defined logs by configured name (filelog/appname) huh?
n
so in the filelog receiver you can add some attributes/resource attributes, and then in the processor add a where clause https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl#check-if-an-attribute-exists eg:

where attributes["receiver_name"] == "app1"
filelog:
    include: [ /var/log/myservice/*.json ]
    attributes:
      receiver_name: app1
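and combining that with the transform processor from before, you could scope the masking to just that receiver - roughly like this (sketch only):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # only rewrites logs that arrived with the receiver_name attribute set to app1
          - replace_pattern(body, "Password=[^;]*", "Password=***") where attributes["receiver_name"] == "app1"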
b
interesting. ok good to know. so in the above snippet, you wouldn't specify the receiver as filelog/app1, but instead, as:
filelog:
  ...
  attributes:
    receiver_name: app1
n
sorry, didn’t mean that. this is what I meant
filelog/app1:
    include: [ /var/log/myservice/*.json ]
    attributes:
      receiver_name: app1
b
ok thank you
re re-processing ingested logs - is there any documentation out there describing: 1. how to manually execute sql queries to do so, and 2. best practices for re-ingesting the data?
does anyone have any thoughts on this? how to reprocess ingested data?
n
1. for manually executing sql queries, it will depend on the structure of your data. 2. as of now there are no best practices, but let's say you have the file the logs were read from - then you can just read the data from the file again. otherwise you will have to get the data from clickhouse and send it to the otel collector, but again this will depend on how you have previously processed your logs.
b
how does one read the data from file again?
i mean, how does one instruct that previously-read log files be re-read?
any thoughts on this? how to instruct to re-ingest the logs?
n
One way is to restart the otel collector and in the filelog receiver use start_at: beginning https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/filelogreceiver/README.md but here the logs need to be present in the source file.
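for reference, with the earlier example it would look roughly like this (sketch):

filelog/app1:
  include: [ /var/log/myservice/*.json ]
  # read matched files from the top instead of only tailing new lines
  start_at: beginning
  attributes:
    receiver_name: app1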
b
ok thank you, since some of these logs were from past days, i had already configured with start_at: beginning, so that should be easy.
n
yeah, but start_at: beginning can backfire at times, because if the collector restarts it will send the entire data again
b
what does signoz do with that data that's sent again? is the old data replaced w/ updated preprocessing, etc, or, does it just exist as duplicate log entries?
we could really use a way to reprocess logs
n
I want to understand your use case - is it that you are going to change pipelines very often and you want older data to be processed as well? because in most cases most of the setup is done at the beginning and then people gradually add processing, but it doesn't change that drastically, and there is rarely a hard requirement to process old data.
b
right now it's the POC we're doing, ensuring we have adequate log data to use. if we were to implement, it'd be so we could catch up on whatever logs were already written to disk when we "switched on"
as for changing how the logs are processed, in the case of 3rd party apps we may not always have a full view of how every exception or error will be written to logs. so masking secrets, etc, might come after the fact.