# announcements


11/23/2022, 3:09 PM
Not a strictly GrowthBook-related question, but is the required sample size calculation for a Bayesian experiment the same as for a frequentist-approach experiment? We use a third-party tool to calculate the required sample size since the native GrowthBook calculation is somewhat lacking, but we need to make sure that we're doing it right.


11/23/2022, 3:19 PM
Frequentist sample size calculators are usually a good estimate for planning purposes. Bayesian tests don't have fixed sample sizes, so you may reach significance earlier or later than the calculator says, depending on how the test performs.
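For reference, a minimal sketch of the kind of frequentist calculation those calculators perform, for a two-proportion test. The baseline rate, minimum detectable effect, and defaults below are illustrative assumptions, not GrowthBook or any specific calculator's defaults.

```python
from statistics import NormalDist

def sample_size_per_variant(p_base, mde_rel, alpha=0.05, power=0.8):
    """Users needed per variant to detect a relative lift of mde_rel."""
    p_alt = p_base * (1 + mde_rel)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_alt) / 2
    # pooled variance under H0, unpooled under H1
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p_base * (1 - p_base)
                          + p_alt * (1 - p_alt)) ** 0.5) ** 2
    return numerator / (p_alt - p_base) ** 2

# e.g. 5% baseline conversion, 10% relative lift -> roughly 31k users/variant
n = sample_size_per_variant(0.05, 0.10)
```

As the thread notes, this number is a planning estimate only; a Bayesian test may stop earlier or later.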


11/23/2022, 5:32 PM
The approach we use is to estimate the test duration by looking at how much time is needed before the risk of choosing the winning group (at a given uplift) falls below a certain threshold. We normally set this to 0.25% risk to align with what GrowthBook is using. Generally the approach has given a good approximation for our use cases! We use an internal tool for Bayesian testing though - not aware of any online resources to do so.
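A hedged sketch of such a risk check: a Monte Carlo estimate of the expected relative loss of shipping variant B, compared against a 0.25% threshold. The uniform Beta(1, 1) priors, the relative-loss definition, and the example counts are assumptions for illustration; they may differ from how GrowthBook or the internal tool computes risk.

```python
import random

def relative_risk_of_b(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of the expected relative loss of shipping B."""
    rng = random.Random(seed)
    loss = 0.0
    for _ in range(draws):
        # posterior draws under uniform Beta(1, 1) priors (an assumption)
        pa = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        pb = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        loss += max(pa - pb, 0) / pa  # loss only when A is actually better
    return loss / draws

# stop when the risk of shipping B drops below the 0.25% threshold
done = relative_risk_of_b(500, 10000, 560, 10000) < 0.0025
```

Estimating duration then amounts to asking how much data it takes before this quantity reliably drops below the threshold.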


11/24/2022, 2:43 PM
We use standard sample size calculators to estimate experiment duration, but most of them use a frequentist approach, so we aren't sure this estimate agrees with Bayesian A/B testing. This is the one we usually use


11/26/2022, 6:53 AM
Those required sample sizes serve different purposes. In the frequentist approach you start with a presumed effect size, then choose statistical significance and statistical power levels, then compute the sample size. The interpretation is roughly: "for an effect size equal to or greater than your chosen value, you will pick the 'best' group with a probability determined by your chosen significance and power levels". (I might be wrong here, but I think it's something like that.) So, if you want these guarantees on your probability of picking the best group, you should run your experiment at least until you reach the calculated sample size.

The Bayesian approach typically operates differently. It works with the data from a single experiment. Given your current data, it estimates probability distributions of your metrics and related values, for example the probability that the metric in one group is greater than in the other: P(conversion_b > conversion_a | data). There can be different conditions for stopping an experiment, but let's assume the experiment stops when this probability reaches a certain level, say P(conv_b > conv_a | data) >= 0.95. If your current probability is lower, say P(conv_b > conv_a | data) = 0.7, then you can simulate how much additional data you need to reach the required level. If you have no data at all, you can run these simulations using the prior distributions. So this is not a required sample size, but rather an estimate of how much data you need to reach your stopping criterion.
An example plot of such simulations is attached. The blue line is the actual probability given the current data; the red lines are simulations. To estimate the additional amount of data needed to reach the required level of certainty, you can find this amount for each simulation (for each red line, find the N where it crosses the horizontal dashed line). The distribution of these values provides an estimate of the additional experiment duration (see the histogram).
In principle, for a Bayesian approach it should be possible to produce a sample size estimate similar to the frequentist one. Its interpretation could be: "how much data do we need to correctly guess the best group with a chosen probability, if the effect size is greater than or equal to some chosen value and a specific stopping condition is employed?" But I have not done it, and can't point to any reference where it is done.
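The simulation described above can be sketched roughly as follows. Each simulated "red line" draws plausible true rates from the current posteriors, extends the experiment in batches, and records how much extra traffic per arm it takes to reach P(conv_b > conv_a | data) >= 0.95. All priors, counts, and batch sizes are illustrative assumptions.

```python
import random

def prob_b_beats_a(a_succ, a_n, b_succ, b_n, rng, draws=1000):
    """Monte Carlo estimate of P(conv_b > conv_a | data), Beta(1,1) priors."""
    wins = 0
    for _ in range(draws):
        pa = rng.betavariate(1 + a_succ, 1 + a_n - a_succ)
        pb = rng.betavariate(1 + b_succ, 1 + b_n - b_succ)
        wins += pb > pa
    return wins / draws

def simulate_extra_n(a_succ, a_n, b_succ, b_n, target=0.95,
                     step=1000, max_extra=10000, sims=10, seed=1):
    """One value per simulated 'red line': extra users/arm to hit target."""
    results = []
    for s in range(sims):
        rng = random.Random(seed + s)
        # plausible "true" rates drawn from the current posteriors
        ta = rng.betavariate(1 + a_succ, 1 + a_n - a_succ)
        tb = rng.betavariate(1 + b_succ, 1 + b_n - b_succ)
        sa, na, sb, nb = a_succ, a_n, b_succ, b_n
        extra = 0
        while extra < max_extra:  # censored at max_extra if never reached
            if prob_b_beats_a(sa, na, sb, nb, rng) >= target:
                break
            # simulate the next batch of traffic for both arms
            sa += sum(rng.random() < ta for _ in range(step)); na += step
            sb += sum(rng.random() < tb for _ in range(step)); nb += step
            extra += step
        results.append(extra)
    return results

# e.g. current data with P(conv_b > conv_a) around 0.75
extras = simulate_extra_n(50, 1000, 57, 1000)
```

The spread of `extras` plays the role of the histogram in the attached plot: it is a distribution over additional duration, not a single fixed sample size.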