# announcements
b
Hello, is there a way or a best practice for deciding how long a Bayesian test should run? We had several cases where the results completely turned around within a week, after the test had already been running for 4 weeks. I attached 2 screenshots of an example: first we had a 36% chance to beat control (metric: Suggested Jobs Page - Qualified Application) with a -3.7% decrease, and a week later, with only 500 more users per group (2,600 to 3,100), the chance to beat control went up to 82% with a 10% increase in the same metric. It was pure chance that we left the test running for another week and got completely different results, and I don't understand how to tell this beforehand, i.e. how to be sure that the results won't change anymore after a specific runtime.
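For reference, here's roughly how I understand the number is computed: "chance to beat control" is the posterior probability that the variant's rate exceeds control's. A minimal sketch, assuming a simple Beta-Binomial model and hypothetical conversion counts (I only have the screenshots, so the exact counts are made up to match the percentages):

```python
import numpy as np

rng = np.random.default_rng(42)

def chance_to_beat_control(conv_a, n_a, conv_b, n_b, draws=200_000):
    """Monte Carlo estimate of P(variant rate > control rate), Beta(1,1) priors."""
    control = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
    variant = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
    return (variant > control).mean()

# hypothetical counts at ~2,600 users/group: variant slightly behind
print(chance_to_beat_control(260, 2600, 250, 2600))   # comes out around 0.3
# ~500 more users with a strong week and the picture flips
print(chance_to_beat_control(300, 3100, 320, 3100))   # comes out around 0.8
```

With effects this small relative to the posterior spread, a few hundred extra users can legitimately swing the number that much.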
w
I'm not deep into Bayesian stats, but here are a few things to consider. Keep in mind the novelty effect if the feature you're testing introduces a new experience for users. Second, applications/pages looks like a ratio metric, and with ratio metrics there is almost always an additional layer of complexity. I'd probably just try to re-calculate the same stats with a frequentist approach for comparison.
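Something like this for the quick cross-check, treating the metric as a plain proportion (a real ratio metric needs more care, e.g. the delta method); the counts are the same hypothetical ones as above:

```python
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns z and the two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return z, 2 * norm.sf(abs(z))

z, p = two_proportion_ztest(260, 2600, 250, 2600)  # hypothetical counts
print(f"z={z:.2f}, p={p:.2f}")  # p is nowhere near 0.05 at this sample size
```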
f
this is fairly normal, when your chance to beat control is not significant, you can't draw a meaningful conclusion. If your metric has a high amount of variability, you may never get to a significant result, and the credible interval will stay wide.
b
I understand, but how do I know if the chance to beat control is significant? Bayesian statistics don't seem to give me anything to judge this by. If we use many of these ratio metrics, would you suggest switching to frequentist statistics?
f
we use 95% or 5% as significance thresholds by default
b
so I should wait for the chance to beat control to reach 95% or 5% before I stop it?
f
yes, or until you’re happy with the risk scores
b
I see, thanks
l
@billowy-horse-43368 the other thing that can be helpful in understanding how much movement your metrics have is to regularly run an “A/A” test, i.e. you randomly assign users but put both groups into the control experience. I find this really helpful for two things: 1. confirming that experiment assignment systems are working, and 2. helping the team see how metrics naturally vary. You could even define a data source that randomly assigns users to fake variants via SQL, without running an experiment at all, if you just want to look at the stats component.
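Roughly like this, sketched in Python (the SQL version would be a deterministic hash expression over user_id in your warehouse's dialect; the helper name and salt here are made up):

```python
import hashlib

def fake_variant(user_id: str, salt: str = "aa-test") -> str:
    """Deterministic 50/50 split into fake variants for an A/A readout."""
    h = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return "A" if int(h, 16) % 2 == 0 else "B"

# sanity-check the balance of the split
ids = [f"user_{i}" for i in range(10_000)]
share_a = sum(fake_variant(u) == "A" for u in ids) / len(ids)
print(f"share in A: {share_a:.3f}")  # should hover near 0.500
```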
👍 1
w
Yeah, A/A testing is indeed a must, it helped me find a few hidden issues. A prominent example would be an empty string for user_id. Oops, turns out it still gets assigned a variation, and an A/A test helped me find that out.
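To make that concrete with the hash-split sketch above: every empty user_id hashes to the same value, so all the anonymous/empty users silently pile into one variant and skew both the split and the metrics. A hypothetical guard:

```python
import hashlib

def fake_variant_safe(user_id, salt: str = "aa-test"):
    # "" always hashes to the same bucket, so a flood of empty IDs would
    # silently skew the split; exclude such users instead of assigning them.
    if not user_id:
        return None
    h = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return "A" if int(h, 16) % 2 == 0 else "B"
```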
b
Cool that you mention A/A tests, we also considered those recently. Thanks a lot @wooden-country-60054 and @late-dentist-52023!
@fresh-football-47124 sorry for getting into this again, but we have now had some cases where the chance to beat control was at 95% after a short time with a low number of users per variation, i.e. around 150, and then dropped back to around 80%. As I understood it, this should usually not happen, right? Is it worth calculating a needed sample size with an additional tool, or would you suggest still relying on the chance to beat control?
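Here's a small simulation I put together to see how often a 95% reading happens by luck alone in an A/A setup (assumptions: 10% true rate in both arms, peeking every 50 users per arm, normal approximation to the Beta posterior; all numbers made up):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def prob_b_beats_a(ca, na, cb, nb):
    """Normal approximation to P(rate_B > rate_A) under Beta(1,1) priors."""
    ma, mb = (1 + ca) / (2 + na), (1 + cb) / (2 + nb)
    va, vb = ma * (1 - ma) / (3 + na), mb * (1 - mb) / (3 + nb)
    return norm.sf(0, loc=mb - ma, scale=(va + vb) ** 0.5)

sims, hits = 500, 0
for _ in range(sims):
    a = rng.random(1000) < 0.10   # both arms identical: a true A/A test
    b = rng.random(1000) < 0.10
    # peek every 50 users per arm, as if checking the dashboard daily
    if any(prob_b_beats_a(a[:n].sum(), n, b[:n].sum(), n) >= 0.95
           for n in range(50, 1001, 50)):
        hits += 1
print(f"{hits / sims:.0%} of A/A runs hit 95% at some peek")  # well above 5%
```

So a 95% reading at ~150 users can easily be one of those lucky peeks.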
f
for metrics with high variability, you might want to increase the min sample size so you’re not staring at random noise
you can also have policies around not making decisions on results within your min duration period
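if it helps, a standard frequentist power calculation can serve as a floor for that min sample size, even with a Bayesian readout. A sketch, assuming a two-proportion test and a baseline rate / minimum detectable effect you'd pick yourself:

```python
from scipy.stats import norm

def min_n_per_group(p_base, rel_mde, alpha=0.05, power=0.8):
    """Users needed per variant for a two-sided two-proportion test."""
    p_var = p_base * (1 + rel_mde)
    z_alpha, z_power = norm.ppf(1 - alpha / 2), norm.ppf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return int((z_alpha + z_power) ** 2 * variance / (p_var - p_base) ** 2) + 1

# e.g. 10% baseline conversion, aiming to detect a 10% relative lift
print(min_n_per_group(0.10, 0.10))  # on the order of 14,700 users per group
```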
b
@fresh-football-47124 is there any way to automatically calculate variability within GrowthBook? I feel like this would be a very helpful feature for understanding when tests become conclusive
f
@helpful-application-7107 any thoughts on this? ^
h
Yeah, so a measure of variability would be the cornerstone of our power calculator / general sample size estimator, which is a top priority for Q2.
There's not currently a great way to measure this with respect to the experiment of interest, but we want to make it really easy to get this information, both in the experiment results and as part of a general-purpose tool.
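In the meantime, a sketch of how you could approximate it yourself, assuming you can pull per-user numerator/denominator pairs for the ratio metric from your warehouse (the delta method is a standard way to get the variance of a ratio; the data below is simulated):

```python
import numpy as np

def ratio_metric_stats(num, den):
    """Ratio metric value and its delta-method variance, from per-user
    numerator (e.g. qualified applications) and denominator (e.g. pageviews)."""
    num, den = np.asarray(num, float), np.asarray(den, float)
    n, r = len(num), num.sum() / den.sum()
    cov = np.cov(num, den)  # 2x2 sample covariance matrix
    var_r = (cov[0, 0] - 2 * r * cov[0, 1] + r ** 2 * cov[1, 1]) / (n * den.mean() ** 2)
    return r, var_r

# simulated per-user data in place of a real warehouse pull
rng = np.random.default_rng(1)
views = rng.poisson(5, size=3000) + 1
apps = rng.binomial(views, 0.10)
r, v = ratio_metric_stats(apps, views)
print(f"ratio={r:.4f}, std err={v ** 0.5:.4f}")
```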
b
@helpful-application-7107 thanks for the update, looking forward to it!