Hi Jakub. If done right, there should be no reason you couldn't log and send metrics to Datadog. We do.
OpenTelemetry is built into Growthbook and is compatible with Datadog. When you say you are letting Growthbook log everything - what exactly are you doing?
I can think of two reasons the logging might be an issue and could cause stats like you are seeing:
1. You are writing the logs to the same machine Growthbook is running on instead of forwarding them to another service. The machine is running out of disk space, so writes to disk slow down. Processes pile up waiting to write, and the OS keeps switching between them to decide which can write next, until the machine dies.
2. There is some bug in Growthbook that is unique to your situation, or something special you are doing that breaks logging. The logging itself fails and throws a log message, which in turn fails and throws another log message, and so on. Every new request that comes in sets off this chain, and the end result is that the server is doing nothing but writing logs.
Given that both scenarios are plausible, it makes sense to check whether disk space is an issue, and then try running Growthbook without logging to see if things perform better.
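For the disk space check, something like the following would do as a quick first look. This is only a rough sketch: the function names, the "/" path, and the 90% threshold are my own assumptions, so point it at whatever volume your logs actually land on.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of space used on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_disk_nearly_full(path="/", threshold_percent=90.0):
    """True if usage is at or above the (assumed) warning threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # "/" is a placeholder; use the directory your logs are written to.
    pct = disk_usage_percent("/")
    print(f"{pct:.1f}% used, nearly full: {is_disk_nearly_full('/')}")
```

If this reports the disk as nearly full around the time the server falls over, scenario 1 is the more likely culprit.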
It is clear to me from the graph you sent, though, that this is not a performance issue with Growthbook in the sense that a specific route can't handle the number of experiments/metrics you have. Rather, there is some bug in Growthbook or in your setup that creates a feedback loop and causes the server to go off the rails.