# ask-questions
t
We're running an A/A test after integrating GB to measure GB's accuracy and confirm correct implementation. We set this experiment to run on our Homepage over the holiday weekend with a simple page_view metric we set up. We're getting a good split of users, set at 50/50, but we're getting pretty different results on page views between the Control and the Variant even though the UX is identical. Any ideas what might be happening? And why are the "Risk of Choosing" percentages so different from each other?
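For context, here's a minimal sketch of what an A/A check with the GrowthBook JavaScript SDK typically looks like; the client key, feature key, user id, and the console.log "tracking" call are illustrative placeholders, not our exact implementation.
```typescript
import { GrowthBook } from "@growthbook/growthbook";

// Minimal A/A sketch. The client key, feature key, user id, and the logging
// call below are placeholders, not a real configuration.
const gb = new GrowthBook({
  apiHost: "https://cdn.growthbook.io",
  clientKey: "sdk-placeholder-key",
  attributes: { id: "anon-12345" }, // hashing attribute that drives the 50/50 split
  trackingCallback: (experiment, result) => {
    // Fires when the user is bucketed; in a real setup this exposure event
    // would go to your analytics tool / data warehouse.
    console.log("experiment_viewed", experiment.key, result.variationId);
  },
});

await gb.loadFeatures(); // or gb.init(), depending on SDK version

// In an A/A test both variations serve the same value, so the page renders
// identically; only the bucketing and the exposure event differ between arms.
const value = gb.getFeatureValue("homepage-aa-test", "control");
console.log("serving variation value:", value);
```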
f
how long has it been running for?
oh, since the 22nd
t
Yeah about 5 full days at this point
f
@helpful-application-7107 thoughts? My guess would be that you may have some outliers in this metric; you could add winsorization to see if that changes things. You can also run it for longer
h
Risk is described here: https://docs.growthbook.io/statistics/overview#inferential-statistics. In a perfect world, the risk percentages would look similar in an A/A test, but the ones here make sense given that it looks like the average in your variation is higher than the average in your control.
As for the difference in page views: it might indicate there's a problem, but it's also very consistent with the standard uncertainty that comes with experimentation. Sometimes there are false positives, and in this case the chance to beat control is 93%, which is definitely something that can happen every now and then with an A/A test. To provide additional evidence that this is a false positive, you can do the following:
• Repeat the A/A test by going to "Edit Targeting", choosing to create a "new phase" at the very bottom, and selecting "re-randomize" if your SDK version supports it. (Note: you cannot just create a new phase; you'll want to ensure you re-randomize using the above flow.)
• Create and run an entirely new A/A test.
If it is a false positive, it's very unlikely you'll see the same difference again.
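To make the risk numbers more concrete: risk here is an expected loss, roughly "if I choose this variation and it's actually the worse one, how much do I give up on average?". Below is a rough Monte Carlo sketch of that idea; the normal approximations and all the numbers are made up for illustration, this is not GrowthBook's exact computation.
```typescript
// Rough illustration of "risk" as expected loss, using made-up numbers.
// Posteriors for mean page views per user are approximated as normals.
function randNormal(mean: number, sd: number): number {
  // Box-Muller transform
  const u1 = 1 - Math.random(); // avoid log(0)
  const u2 = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

const draws = 100_000;
let lossIfChooseControl = 0;
let lossIfChooseVariant = 0;

for (let i = 0; i < draws; i++) {
  // Hypothetical posteriors: the variant's observed mean is slightly higher.
  const control = randNormal(4.0, 0.05);
  const variant = randNormal(4.1, 0.05);

  // Expected loss: the average amount you'd give up if your choice is actually worse.
  lossIfChooseControl += Math.max(variant - control, 0);
  lossIfChooseVariant += Math.max(control - variant, 0);
}

console.log("risk of choosing control:", lossIfChooseControl / draws);
console.log("risk of choosing variant:", lossIfChooseVariant / draws);
// Because the variant's posterior sits higher, the risk of sticking with control
// comes out larger than the risk of choosing the variant. Asymmetric risk numbers
// are expected whenever the observed means differ, even in an A/A test.
```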
👍 1
There's an existing longer discussion of A/A tests here: https://linen.growthbook.io/t/13142527/hi-growthbook-team-having-the-same-issue-as-i-described-here#76332a3f-3c02-4ee5-b01a-b3de79fdc260
I gave answers as helpful-application-7107 in that thread. Please take a look and let me know if you have follow-up questions.
🙏 1
t
Great, thanks, will review and keep you posted
f
I'll also echo what Graham said. It's possible you have one user who made 2000 page views (e.g. a bot scraping your site). If that user was randomly placed in your variation it could cause the average to go way up. In the metric settings, you can specify a "cap". Any user with a metric value above the cap will get the value truncated. This limits the impact that extreme outliers can have on your results.
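To illustrate the effect of a cap with some made-up per-user counts (a toy sketch; the cap value of 50 is arbitrary, not a recommendation):
```typescript
// Toy example: one bot-like user can drag the variation's average way up,
// and capping per-user values blunts that effect. Numbers are made up.
const controlPageViews = [3, 5, 4, 6, 2, 5, 4, 3, 5, 4];
const variantPageViews = [4, 3, 5, 4, 6, 3, 4, 5, 2, 2000]; // one outlier (e.g. a bot)

const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
const capAt = (cap: number) => (xs: number[]) => xs.map((x) => Math.min(x, cap));

console.log("uncapped control mean:", mean(controlPageViews));          // ~4.1
console.log("uncapped variant mean:", mean(variantPageViews));          // ~203.6 (dominated by the bot)
console.log("capped variant mean:", mean(capAt(50)(variantPageViews))); // ~8.6 (outlier's impact limited)
```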
thankyou 1
s
With any A/A test you run, be it conversion rate, revenue per user, page views or anything else, you will never see a statistical tool reporting exactly 50% chance to beat control with 0 uplift. Due to natural variation, you could run your experiment 100 times and see a different result each time: Chance to Beat Control can land anywhere from 0 to 100%, with an equal chance of ending up at any value (it behaves here like a p-value in the frequentist realm, so across repeated A/A tests the distribution of values is uniform). As for the uplift: if you end each experiment after accumulating a pre-defined number of users in each group, the uplift will differ from run to run but stay within a certain range (say -5% to +5%), and that range really depends on your sample size. If you don't have a pre-defined sample size and check your results as the experiment runs, it's a free-for-all and any uplift is possible. All of this is absolutely normal and doesn't mean the system isn't functioning as intended. Just something to make you think
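If you want to see this for yourself, you can simulate a bunch of A/A tests and count how often a "significant" result appears by chance. A toy sketch below; the traffic numbers, conversion rate, and the plain two-proportion z-test are illustrative assumptions, not GrowthBook's Bayesian engine:
```typescript
// Simulate many A/A tests: same true conversion rate in both arms, then count
// how often a simple two-proportion z-test still calls the result "significant".
// All numbers are made up for illustration.

function erf(x: number): number {
  // Abramowitz & Stegun approximation 7.1.26 (max error about 1.5e-7)
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

const normalCdf = (z: number) => 0.5 * (1 + erf(z / Math.SQRT2));

function simulateOneAATest(usersPerArm: number, trueRate: number): number {
  // Returns the two-sided p-value of a two-proportion z-test on identical arms.
  let convA = 0;
  let convB = 0;
  for (let i = 0; i < usersPerArm; i++) {
    if (Math.random() < trueRate) convA++;
    if (Math.random() < trueRate) convB++;
  }
  const pA = convA / usersPerArm;
  const pB = convB / usersPerArm;
  const pooled = (convA + convB) / (2 * usersPerArm);
  const se = Math.sqrt(pooled * (1 - pooled) * (2 / usersPerArm));
  const z = (pB - pA) / se;
  return 2 * (1 - normalCdf(Math.abs(z)));
}

const runs = 2000;
let falsePositives = 0;
for (let i = 0; i < runs; i++) {
  if (simulateOneAATest(5_000, 0.1) < 0.05) falsePositives++;
}
// Expect roughly 5% "significant" A/A results, and the p-values themselves
// are roughly uniform between 0 and 1 across runs.
console.log(`false positive rate: ${((falsePositives / runs) * 100).toFixed(1)}%`);
```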
👍 1