# announcements
r
Hi! I would like to use GrowthBook to analyze experiments that I'm currently setting up. Just a few questions:
• The platform is very intuitive and easy to use, so I would like to give the tool to business people so they can monitor live experiments themselves. What do you think about the peeking problem in this situation?
• I read in your white paper that there is no fixed horizon with the Bayesian approach; is that also true with uninformative priors? What about Robinson's post?
• What are the differences between the Bayesian approach with uninformative priors and the Frequentist approach?
• Is setting a risk threshold the only criterion you suggest for ending an experiment?
• Is issue https://github.com/growthbook/growthbook/issues/1233 on your roadmap?
Thank you!!
b
Hi @rhythmic-napkin-63099 thanks for these great questions! I will connect with our data scientist once he comes online today and we'll get you some answers soon. 😄
r
Ok, @brief-honey-45610! Thank you!
b
The platform is very intuitive and easy to use, so I would like to give the tool to business people so they can monitor live experiments themselves. What do you think about the peeking problem in this situation?
I read in your white paper that there is no fixed horizon with the Bayesian approach; is that also true with uninformative priors? What about Robinson's post?
Peeking is a concern with both the frequentist and Bayesian engines. We are working on a blog article about peeking, our position on it, and solutions to it. While peeking is normally discussed in the context of frequentist testing (since frequentist testing has devices like p-values that claim to control false positives), using a Bayesian engine to make shipping decisions as soon as the results show some threshold has been surpassed can also cause more bad decisions than you might otherwise realize. "More bad decisions than you might otherwise realize" is just a softer way of saying you will have a higher false positive rate, and that applies to the Bayesian engine even though Bayesian stats don't claim to control the false positive rate (the simulation sketch after this message makes that concrete). To this end, we strongly agree with the position taken in David Robinson's post referenced above. So what are the solutions?
• Build a culture that respects the weight of evidence before making decisions (but this can be hard to do at scale, and at some point someone needs to make a decision).
• Use GrowthBook's "minimum sample size" setting to at least ensure X conversions are reached before we return statistics.
• Use Sequential Testing in the frequentist engine, which costs some power but guarantees that no matter how often you peek, you will only get a false positive on 5% of tests (if your p-value threshold is 0.05).
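For intuition, here is a minimal simulation sketch (plain Python/NumPy, not GrowthBook's engine) of an A/A test where both variations share the same true conversion rate, so any "significant" result is a false positive. Stopping the moment a two-sided p-value dips below 0.05 at any peek yields a false positive rate well above 5%, while checking only once at the fixed horizon stays near 5%. The batch size, number of peeks, and conversion rate are arbitrary illustration values.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def one_aa_test(n_peeks=20, batch=500, conv_rate=0.05, alpha=0.05):
    """Simulate one A/A test, peeking after every batch of traffic."""
    a_conv = b_conv = a_n = b_n = 0
    stopped_early = False
    pval = 1.0
    for _ in range(n_peeks):
        a_conv += rng.binomial(batch, conv_rate)
        b_conv += rng.binomial(batch, conv_rate)
        a_n += batch
        b_n += batch
        # Two-proportion z-test at this peek (normal approximation)
        pa, pb = a_conv / a_n, b_conv / b_n
        pooled = (a_conv + b_conv) / (a_n + b_n)
        se = math.sqrt(pooled * (1 - pooled) * (1 / a_n + 1 / b_n))
        if se > 0:
            z = (pb - pa) / se
            pval = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
            if pval < alpha:
                stopped_early = True  # a peeker would have shipped here
    # Fixed horizon: only the final p-value counts
    return stopped_early, pval < alpha

results = np.array([one_aa_test() for _ in range(2000)])
print(f"false positive rate with peeking:     {results[:, 0].mean():.3f}")
print(f"false positive rate at fixed horizon: {results[:, 1].mean():.3f}")
```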
What are the differences between the Bayesian approach with uninformative priors and the Frequentist approach?
There are lots of differences, so this is hard to answer fully, but the asker may be (correctly) hinting at the fact that in practice the Bayesian engine with uninformative priors tends to produce results very similar to the frequentist engine (the sketch below shows why). That said, we have some additional tools in the frequentist engine (CUPED, sequential testing) that we don't have in the Bayesian engine, while the Bayesian engine returns more intuitive quantities like "Chance to win" rather than more opaque values like p-values.
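To make the similarity concrete, here is a small numerical sketch (made-up numbers, assuming a flat prior and a normal approximation for the difference in conversion rates; not GrowthBook's exact implementation). Under these assumptions, "chance to win" works out to one minus the one-sided p-value, which is why the two engines usually point to the same conclusion.

```python
import math

# Hypothetical results (made-up numbers)
a_conv, a_n = 480, 10_000   # control
b_conv, b_n = 540, 10_000   # variation

pa, pb = a_conv / a_n, b_conv / b_n
diff = pb - pa
se = math.sqrt(pa * (1 - pa) / a_n + pb * (1 - pb) / b_n)
z = diff / se

# Frequentist: one-sided p-value for "the variation is no better than control"
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

# Bayesian with a flat prior: the posterior for the difference is roughly
# Normal(diff, se^2), so "chance to win" = P(lift > 0 | data)
chance_to_win = 1 - 0.5 * math.erfc(z / math.sqrt(2))

print(f"one-sided p-value: {p_one_sided:.4f}")
print(f"chance to win:     {chance_to_win:.4f}")  # exactly 1 - p_one_sided here
```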
Is setting a risk threshold the only criterion you suggest for ending an experiment?
Risk is only used in the Bayesian engine. It isn't a metric we insist you use, but it works well in conjunction with your "Chance to win" data. For example, suppose one metric has a 99% chance to win (and low risk), while another metric has only a 50% chance to win, so you aren't sure which variation is better for it. If that second metric also has low risk, then maybe you don't need to wait for more data: even if the variation really is a little worse on that metric, the expected loss is small and not worth extra weeks of experimentation time to pin down. (The sketch below illustrates the idea.)
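As a rough illustration of that trade-off (flat prior, normal approximation for the relative lift, made-up numbers; not GrowthBook's exact risk computation), here risk is treated as the expected loss if you ship the variation and it turns out to be worse:

```python
import numpy as np

rng = np.random.default_rng(1)

def summarize(a_conv, a_n, b_conv, b_n, draws=200_000):
    """Return (chance to win, risk) for variation B vs control A."""
    pa, pb = a_conv / a_n, b_conv / b_n
    se = np.sqrt(pa * (1 - pa) / a_n + pb * (1 - pb) / b_n)
    # Posterior draws for the relative lift of B over A (normal approximation)
    lift = rng.normal((pb - pa) / pa, se / pa, size=draws)
    chance_to_win = (lift > 0).mean()
    risk = np.maximum(-lift, 0).mean()  # expected relative loss if we ship B
    return chance_to_win, risk

# Metric 1: clear winner -- high chance to win and tiny risk
print("metric 1:", summarize(500, 10_000, 600, 10_000))
# Metric 2: ~50% chance to win, but the lift is tightly centered near zero,
# so the expected loss from shipping is well under 1% of the metric
print("metric 2:", summarize(5_000, 100_000, 5_000, 100_000))
```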
Is issue https://github.com/growthbook/growthbook/issues/1233 on your roadmap?
We have taken steps towards it, but it is not currently on our near-term roadmap.
@rhythmic-napkin-63099 I was able to connect with our Data Scientist and these are the notes he asked me to pass along. Please let us know if you have any other questions!
r
@brief-honey-45610 thank you very much for the prompt reply to all my points!! Just one more question regarding the first point:
• I would like to use the Bayesian engine because its metrics are more intuitive for decision-making. But a completely uninformative prior does not prevent or reduce the peeking problem, and it does not reduce the amount of data needed for reliable results. So my question is: how can I decide when it's time to stop an experiment and trust the results, without peeking? Thank you!
b
Hi Nicola, absolutely! We're always here to help. I'll ask our Data Scientist to consider your most recent question and I'll get back to you in a few hours 🙂
Hi again, Nicola, I just chatted with our Data Scientist about this. There isn't an easy answer to your question 🙃 In a frequentist engine, you generally set a "minimum detectable effect," which is the smallest effect you want to be able to detect before making a decision. In the Bayesian engine there are some analogous notions, but it isn't as clearly defined. That said, you can still roughly estimate how long you should run an experiment to detect an effect of size X with an amount of traffic Y. Eventually, we will build this functionality directly into GrowthBook. For now, we recommend you do the computation yourself or use an off-the-shelf calculator, which should be approximately correct. Here is a frequentist calculator that should give a good enough approximation to use with the Bayesian engine as well: https://www.evanmiller.org/ab-testing/sample-size.html (the sketch below shows the same kind of calculation in code).
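If you want to sanity-check the calculator or script it, here is a back-of-the-envelope version of the standard fixed-horizon sample-size formula for a conversion-rate metric (two-sided test, equal traffic split; the 5% baseline and 10% relative lift are just example inputs, and the result is per variation):

```python
import math
from statistics import NormalDist

def sample_size(base_rate, relative_mde, alpha=0.05, power=0.8):
    """Per-variation sample size for a two-proportion z-test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_mde)   # smallest lift worth detecting
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Example: 5% baseline conversion rate, detect a 10% relative lift
n = sample_size(0.05, 0.10)
print(f"~{n:,} users per variation; divide by your daily traffic per "
      "variation to estimate how many days the experiment needs to run")
```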
r
Ok! Thank you very much! I really appreciate your efforts!
b
You're welcome 😊