Beyond that, what heuristics are people using to determine when to call experiments? The measures of confidence and risk are useful and intuitive, but I’m foreseeing a lot of experiments that end up in the “no effect” bucket, e.g. with 40-60% Chance to Beat Control a few weeks out. In these cases, is the right move to wait for more data, or should I call the experiment and trust that there’s probably no effect?
I know it’s probably hard to enumerate rules here, but I’m curious about people’s general approach!
06/11/2022, 7:57 PM
If a test is flat, chance to beat control will hover around 50% forever, but risk will approach zero the more data you collect. So for inconclusive tests, you can wait until the risk is low enough and then just make a decision one way or the other and move on. There are usually lots of external factors in making a decision, so there's no hard-and-fast rule for this.
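Here's a rough sketch of why that happens. It assumes a Beta-Binomial model for conversion rates with uniform Beta(1, 1) priors, and it takes "risk" to mean expected loss: the average conversion rate you'd give up if the arm you pick turns out to be the worse one. The function names `bayes_ab` and `call_experiment` and the 0.001 risk threshold are hypothetical and for illustration only. This isn't any particular tool's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class BayesResult:
    chance_to_beat_control: float  # P(rate_B > rate_A) under the posteriors
    risk_choose_variant: float     # expected loss if we ship B but A is better
    risk_choose_control: float     # expected loss if we keep A but B is better


def bayes_ab(conv_a: int, n_a: int, conv_b: int, n_b: int,
             draws: int = 20_000, seed: int = 0) -> BayesResult:
    """Monte Carlo estimate of chance to beat control and risk (expected loss).

    Assumption: Beta(1, 1) prior on each conversion rate, so each posterior
    is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    loss_if_ship_b = 0.0
    loss_if_keep_a = 0.0
    for _ in range(draws):
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if p_b > p_a:
            wins += 1
        # Loss is only incurred on draws where the chosen arm is the worse one.
        loss_if_ship_b += max(p_a - p_b, 0.0)
        loss_if_keep_a += max(p_b - p_a, 0.0)
    return BayesResult(wins / draws, loss_if_ship_b / draws, loss_if_keep_a / draws)


def call_experiment(result: BayesResult, risk_threshold: float = 0.001) -> str:
    """Hypothetical stopping rule: stop once the cheaper choice is below the threshold."""
    if min(result.risk_choose_variant, result.risk_choose_control) >= risk_threshold:
        return "keep collecting data"
    if result.risk_choose_variant <= result.risk_choose_control:
        return "ship variant"
    return "keep control"


if __name__ == "__main__":
    # A flat test: both arms convert at exactly 10%, at two sample sizes.
    small = bayes_ab(100, 1_000, 100, 1_000)
    large = bayes_ab(10_000, 100_000, 10_000, 100_000)
    for label, r in (("n=1,000", small), ("n=100,000", large)):
        print(label, round(r.chance_to_beat_control, 3),
              round(r.risk_choose_variant, 5), call_experiment(r))
```

With 100x more data per arm, chance to beat control stays near 50%, while risk shrinks roughly tenfold. For a flat difference, the posterior spread narrows like 1/sqrt(n). That's why a threshold on risk, rather than on chance to beat control, eventually lets you call a flat test.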
06/11/2022, 8:03 PM
I see, that’s useful to know that risk will decrease over time in that case. I’m sure we’ll build a better intuition for these metrics as we go.
One thing that would be incredibly useful to see, and that we would be happy to contribute to, is a set of post-mortems / examples of experiments people have run, how they’ve designed them, how they chose conversion delay / window, how they analyzed the results, etc. I felt like I had built up that intuition deeply in the frequentist world, but since I’m new to the Bayesian metrics, I’m having trouble reasoning through all the nuances. Are there resources you’d recommend for “real world Bayesian A/B testing” with stories like this from other companies?