# ask-questions
a
I'm doing some disaster tests, as mentioned in a previous thread. If I kill the Redis instances, the proxy healthcheck still reports that everything is fine. Is there a way to make the proxy healthcheck fail if the Redis instances are down? Or is there perhaps another endpoint I should check? Screenshot provided for reference. ping @happy-autumn-40938
h
Good idea. I've introduced dependency statuses on the healthcheck endpoint, available in the latest proxy release. (PR) I don't make the check fail, since technically the proxy is still up and can be configured to continue serving an SDK payload via its second-layer in-memory cache. But you can look at `checks.cache:redis`, which will return `"ready"` if Redis is up.
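For example, a readiness probe script could key off that field. This is a minimal sketch; the exact JSON shape (a `checks` map with a `cache:redis` entry) is an assumption based on this thread, not a documented schema:

```python
import json

# Hypothetical healthcheck response; the field names are assumptions
# drawn from the discussion above.
payload = json.loads("""
{
  "status": "healthy",
  "checks": {
    "cache:redis": "ready"
  }
}
""")

def redis_ready(doc: dict) -> bool:
    """Return True if the proxy reports its Redis dependency as ready."""
    return doc.get("checks", {}).get("cache:redis") == "ready"

print(redis_ready(payload))  # True while Redis is reachable
```

A wrapper like this lets an external monitor alert on Redis going down even though the healthcheck endpoint itself keeps returning success.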
a
Nice, I will look into testing this as soon as possible.
Seems to work just fine, thanks for this addition.
h
No problem, glad it’s working for you
a
Hello again @happy-autumn-40938 When you released this version, I admit I only did a functional test and didn't review the changelog in detail. I have some ideas I would like to share with you that I think can be useful from a hosting perspective.

Right now you are making async calls in the healthcheck endpoint. I would rather see a background worker (or similar) that retrieves the data periodically and stores it; when the healthcheck is called, it would return the stored data directly, non-blocking. The issue arises when you lose connection to the API and the proxy is geolocated in a really remote part of the world. The pod can then be restarted multiple times when hitting the timeout value, and it can be hard to set a reasonable timeout depending on the location. I had pods restarting several hundred times due to this, and I really don't want to increase the timeout value any more 😀

I would also like to see a liveness endpoint (/healthcheck/live) that is non-blocking and non-async, and just directly returns whether the application is alive or not.

Regards Jonas
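The pattern suggested above can be sketched roughly like this. All names are illustrative, not the proxy's actual implementation: a background worker refreshes dependency status on an interval, the healthcheck handler only reads the cached value (so it never blocks on a slow dependency), and the liveness handler does no I/O at all:

```python
import threading
import time

class StatusCache:
    """Periodically runs a dependency probe off the request path and
    caches the result for a non-blocking /healthcheck handler."""

    def __init__(self, probe, interval_seconds=10.0):
        self._probe = probe              # callable that checks dependencies
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._status = {"status": "unknown"}

    def _refresh_loop(self):
        while True:
            try:
                result = self._probe()   # may be slow; never blocks a request
            except Exception:
                result = {"status": "degraded"}
            with self._lock:
                self._status = result
            time.sleep(self._interval)

    def start(self):
        threading.Thread(target=self._refresh_loop, daemon=True).start()

    def read(self):
        """Non-blocking read for the /healthcheck handler."""
        with self._lock:
            return dict(self._status)

def liveness():
    """A /healthcheck/live handler: no I/O, just proof the process responds."""
    return {"alive": True}
```

With this split, Kubernetes can point its liveness probe at the cheap endpoint (so pods aren't restarted when a remote dependency times out) and its readiness probe at the cached dependency status.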