# experimentation
Hi all, I'm working on implementing an A/B test with a planned 50/50 population split. Both the control and variation groups will generate computations, which will then be processed by a downstream service that filters out entries that don't meet business rules. We anticipate that the control group will be filtered out 1% more than the variation group. As a result, the experiment may be flagged as unhealthy due to an observed imbalance (49/51 instead of the expected 50/50). My question is: since this imbalance is expected, can the experiment results still be considered reliable even if it's marked as unhealthy? Put another way, do the statistical calculations depend on achieving the expected 50/50 split, or is that split simply used as a diagnostic to highlight unexpected behavior?
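For readers following along, here is a minimal sketch of how a traffic-imbalance (sample ratio mismatch) check of this kind can work: a chi-squared goodness-of-fit test of the observed counts against the expected split. The function name `srm_p_value`, the example counts, and the 0.001 cutoff mentioned in the comments are assumptions for illustration, not GrowthBook's confirmed implementation.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_variation, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom).

    Hypothetical helper: compares observed arm counts with the counts
    implied by the expected split.
    """
    total = n_control + n_variation
    exp_control = total * expected_control_share
    exp_variation = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variation - exp_variation) ** 2 / exp_variation)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Same 49/51 ratio, different sample sizes (made-up counts).
p_small = srm_p_value(490, 510)        # too little data to flag anything
p_large = srm_p_value(49_000, 51_000)  # far below an assumed 0.001 cutoff

# Testing against the split you actually expect removes the flag,
# but says nothing about whether the comparison is unbiased.
p_vs_expected = srm_p_value(49_000, 51_000, expected_control_share=0.49)
```

A check like this only answers whether the observed split matches the one it was told to expect. Whether the remaining users in each arm are still comparable is a separate question.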
Hi Tony, I'm a data scientist at GrowthBook. The short answer is that the results may be unreliable. As an example, suppose that power users who spend more money are the ones being filtered out. Even if treatment has no effect, it will look better because you are removing the large spenders from control. Is it possible to have only users that meet business rules enter the experiment, and then equally assign users to control and treatment? Thanks, Luke
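Luke's power-user example can be made concrete with a small deterministic sketch. All numbers here (10,000 users per arm, 10% power users, spend of 10 vs 100) are made up for illustration, and treatment is assumed to have no effect at all.

```python
def mean_spend(n_regular, n_power, regular_spend=10.0, power_spend=100.0):
    """Average spend for a group of regular and power users (hypothetical values)."""
    total = n_regular * regular_spend + n_power * power_spend
    return total / (n_regular + n_power)


# Before filtering, both arms are identical: treatment has no effect.
control_before = mean_spend(9_000, 1_000)
treatment = mean_spend(9_000, 1_000)

# The downstream filter removes 1% of the control arm (100 users),
# and in this scenario all of them happen to be power users.
control_after = mean_spend(9_000, 900)

# Treatment now looks better purely because of who was removed.
apparent_lift = treatment / control_after - 1
print(f"apparent lift: {apparent_lift:.1%}")
```

In this setup a 1% difference in filtering is enough to produce an apparent lift of several percent, even though nothing changed for any individual user.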
Hi Luke, Thanks for taking the time to answer. How we assign users to populations determines which of two distinct calculation approaches is used. The results are then sent to a downstream service that filters out proposals before they are shown to users. Because of this, we can't apply the business rule filtering before assigning users to control or variation; assignment has to come first. Just to clarify, the filtering is applied to the output of the calculations, not based on user characteristics. Given this setup, does your assessment of the experiment's reliability still apply?
Hi Tony, The best approach is to keep all users in the experiment; otherwise results will be biased. This provides the best apples-to-apples comparison with respect to the following scenarios:

1. no users are eligible to receive treatment
2. all users are eligible to receive treatment

The first scenario will occur if you decide to roll back your feature. The second scenario will occur if you decide to ship your feature.

If only a small percentage of users make it through the downstream service, your power may be low. If power is low, and the users filtered out by treatment are a subset of the users filtered out by control, then the following approach can increase power. Create a segment of users that would be filtered out by treatment, regardless of whether they were assigned to control or treatment, and analyze this segment. You will want to check that the segment is 50/50 control/treatment.

I understand that you may not be able to identify this segment, depending upon how your pipeline is set up, but if you can, it can increase power and provide unbiased results on the customers who would receive (not just be assigned to) treatment. Sorry for the long answer, feel free to ask more questions! Luke
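The segment analysis described above could be sketched roughly as follows. This assumes each user record carries a hypothetical `in_segment` flag computed the same way for both arms (for example, by running the treatment calculation and the downstream filter for every user regardless of assignment). The record layout, the function name `analyze_segment`, and the sample data are all assumptions for illustration.

```python
from math import erfc, sqrt
from statistics import mean


def analyze_segment(users):
    """Compare arms within a segment defined identically for both arms.

    `users` is a list of dicts with hypothetical keys:
    'arm' ('control' or 'treatment'), 'in_segment' (bool), 'metric' (float).
    """
    segment = [u for u in users if u["in_segment"]]
    control = [u["metric"] for u in segment if u["arm"] == "control"]
    treatment = [u["metric"] for u in segment if u["arm"] == "treatment"]
    if not control or not treatment:
        raise ValueError("segment needs users from both arms")

    # Balance check: is the segment still split 50/50?
    n_c, n_t = len(control), len(treatment)
    expected = (n_c + n_t) / 2
    chi2 = ((n_c - expected) ** 2 + (n_t - expected) ** 2) / expected
    balance_p = erfc(sqrt(chi2 / 2))  # chi-squared, 1 degree of freedom

    return {
        "n_control": n_c,
        "n_treatment": n_t,
        "balance_p": balance_p,
        "mean_diff": mean(treatment) - mean(control),
    }


# Tiny made-up dataset, only to show the shape of the input.
users = [
    {"arm": "control", "in_segment": True, "metric": 10.0},
    {"arm": "control", "in_segment": True, "metric": 12.0},
    {"arm": "control", "in_segment": False, "metric": 50.0},
    {"arm": "treatment", "in_segment": True, "metric": 13.0},
    {"arm": "treatment", "in_segment": True, "metric": 15.0},
    {"arm": "treatment", "in_segment": False, "metric": 5.0},
]
result = analyze_segment(users)
```

The important design choice is that segment membership never depends on which arm a user was assigned to. If it did, the balance check would tend to fail and the comparison would inherit the same bias as the filtered experiment.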