few-france-17141
08/12/2024, 1:31 PM

helpful-application-7107
08/12/2024, 5:07 PM

• How do we calculate the probability for false positives? Could you share the math with us?

The math can be a little bit complicated because it really depends on the correlation between your metrics. If your threshold for stat sig is 95%, then there is a 10% false positive rate for a single metric, since both <5% and >95% count as stat sig. So if your metrics are totally uncorrelated, the probability of 3 to 10 of your 10 metrics being stat sig comes from the CDF of the binomial distribution (https://en.wikipedia.org/wiki/Binomial_distribution#Cumulative_distribution_function), which you can compute here: https://stattrek.com/online-calculator/binomial with pr success = 0.1, number of trials = 10, and number of successes = 3. In this case that works out to around a 7% probability of getting 3+ stat sig metrics. HOWEVER, this math changes dramatically if your metrics are correlated, which they always are in practice. If the metrics are similar, then you're effectively running fewer than 10 independent tests.
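Here's a minimal way to reproduce that ~7% figure in Python, assuming the 10 metrics are fully independent (the metric count and the 0.1 per-metric rate come from the explanation above; the code itself is just an illustration):

```python
# Probability that 3 or more of 10 independent metrics come up
# stat sig by chance, with a 10% false positive rate per metric.
from scipy.stats import binom

n_metrics = 10   # number of metrics being watched
p_false = 0.10   # per-metric false positive rate (both tails at 95%)

# P(X >= 3) is the survival function evaluated at 2
p_three_or_more = binom.sf(2, n_metrics, p_false)
print(f"P(3+ stat sig metrics by chance) = {p_three_or_more:.3f}")  # ~0.070
```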
• is it normal to get 3/10 metrics skewing win/loss significantly? Even after a period of 3 months with close to half a million in overall traffic?

I suggest you read: https://docs.growthbook.io/kb/experiments/aa-tests#problem-metrics-show-statistically-significant-lifts-in-the-aa-test

Two notes:
• 3/10 is enough that it may be concerning, and restarting the A/A test can be helpful, but it's not impossible in an A/A test, especially if those metrics are correlated
• The N of traffic doesn't really matter that much for A/A test false positive rates: more traffic tightens the confidence intervals, but under a true null the rate of stat sig results stays at whatever threshold you chose (see the sketch below)
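To illustrate that last note, here's a rough simulation sketch (my own illustration, not GrowthBook's engine): it assumes a single normally distributed metric and a two-sided z-test at alpha = 0.05, and shows the A/A false positive rate staying near 5% as traffic grows:

```python
# A/A test: both arms draw from the same distribution, so any
# stat sig result is a false positive. The rate should stay near
# alpha no matter how large n gets.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
alpha = 0.05
n_sims = 20_000  # number of simulated A/A tests per sample size

for n in (1_000, 50_000, 500_000):  # users per arm
    # Sample each arm's mean directly from its sampling distribution,
    # N(0, 1/n), instead of materializing n individual observations.
    mean_a = rng.normal(0.0, 1.0 / np.sqrt(n), size=n_sims)
    mean_b = rng.normal(0.0, 1.0 / np.sqrt(n), size=n_sims)
    # Two-sided z-test on the difference in means (unit variance per user)
    z = (mean_b - mean_a) / np.sqrt(2.0 / n)
    p_values = 2 * norm.sf(np.abs(z))
    fpr = np.mean(p_values < alpha)
    print(f"n per arm = {n:>7,}: false positive rate ~= {fpr:.3f}")
```

More traffic means real effects get detected sooner, but in an A/A test there is no real effect, so the stat sig rate is pinned to the threshold regardless of sample size.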