In our environment, we have noticed that the pod's memory usage gradually increases, and once it reaches the resource limit the pod is OOMKilled.
We have tried several batch size and timeout combinations:
• 50k batch size, 30s timeout
• 20k batch size, 15s timeout
• 50k batch size, 10s timeout
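For reference, the first combination would look roughly like this, assuming the values were set on the OpenTelemetry Collector `batch` processor through its `send_batch_size` and `timeout` settings (this is a sketch of the shape, not the exact config from our deployment):

```yaml
processors:
  batch:
    # Assumed mapping: "50k batch size" -> send_batch_size, "30s timeout" -> timeout
    send_batch_size: 50000
    timeout: 30s
```

The other two combinations would only change these two values.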
But nothing seems to be working. One workaround was to reduce the number of scrape job configurations in the prometheus receiver, but that only helped temporarily and memory utilisation still kept increasing.