# ask-questions
w
Hi. We are running version 2.4.0 in our infrastructure. We tried to upgrade to 2.9.0 and 3.0.0 in the past, but rolled back both times due to performance issues. We were told version 3.1.0 should be free of these issues, so we tried to upgrade yesterday. Growthbook shut down twice, and both times followed the exact same scenario. Version 3.1.0 uses much more CPU than 2.4.0 (which might be expected). After a short time, the CPU utilization drops and, at the same moment, Growthbook stops responding. Memory utilization then builds up until it eventually shuts down. So we are back (again) at 2.4.0. We are on the enterprise plan. Is it possible to get some support to find the cause of the issue? The only (I guess) non-standard thing is that we let Growthbook log everything and then parse the logs into metrics in Datadog. Might that be the cause? Thank you for your help.
f
Hi Jakub - can you tell me about your infrastructure?
are you on a cloud?
w
I can send you our helm chart. We have our own cluster.
f
can you DM me the name of your company too?
w
Hi Jakub. If done right, there should be no reason you couldn't log and send metrics to Datadog. We do. OpenTelemetry is built into Growthbook and is compatible with Datadog. When you say you are letting Growthbook log everything, what exactly are you doing? I can think of two reasons the logging might be an issue and could cause stats like the ones you are seeing:
1. You are writing the logs to the same machine Growthbook is running on instead of forwarding them to another service. The machine is running out of disk space, and writing to the disk slows down. Processes build up waiting to write to the disk. The OS continually switches between the processes to see which one can write to the disk next, until the machine dies.
2. There is some bug in Growthbook that is unique to your situation, or something specific that you are doing that breaks logging. The logging itself fails and throws a log message, which in turn fails and throws another log message, and so on. Every new request that comes in sets off this chain, and the end result is that the server is doing nothing but writing logs.
Given that both of these are plausible scenarios, it makes sense to check whether disk space is an issue, and then try running Growthbook without logging to see if things perform better. It is clear to me from the graph you sent, though, that this is not a performance issue with Growthbook in the sense that a specific route can't handle the amount of experiments/metrics you have. Rather, there is some bug in Growthbook or in your setup that creates a feedback loop and causes the server to go off the rails.
Let me know how it goes and if you are still seeing the servers crash I can help to dig deeper.
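For the disk-space check, something like the following could be run on the node or inside the container. It is only a minimal sketch: the function name `disk_usage_report`, the 10% threshold, and the path you point it at are assumptions for illustration, not anything Growthbook ships. Point it at whatever volume your logs are actually written to.

```python
import shutil


def disk_usage_report(path="/", warn_free_fraction=0.10):
    """Report free space for the filesystem containing `path`.

    `warn_free_fraction` is an arbitrary illustrative threshold (10% free),
    not a Growthbook recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Guard against pseudo-filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low_on_space": free_fraction < warn_free_fraction,
    }


if __name__ == "__main__":
    # "/" is a placeholder; use the directory where the logs land.
    report = disk_usage_report("/")
    print(f"{report['path']}: {report['free_fraction']:.1%} free "
          f"({report['free_bytes']} of {report['total_bytes']} bytes)")
    if report["low_on_space"]:
        print("Warning: disk is nearly full; log writes may be stalling.")
```

If free space looks healthy while the CPU drop and memory growth still occur, the disk-full explanation becomes less likely and the logging feedback loop is the next thing to rule out.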