We’re running an A/A experiment (users are randomly assigned to two variants but see the same product). Most metrics in Growthbook show only small variation, and most of the violin plots are centered close to 0 now that ~50,000 users have been assigned. Our assignment should be random.
But one of our core binomial metrics is showing a 2.3% drop in the “experiment” group, and Growthbook computes a 3.68% chance that the variant will beat the control, where I’d expect ~50% since the product is the same. This is a ratio metric whose denominator is set to one of the other states in our funnel; the denominator’s metric shows the expected 0% change. The metric has a conversion window, but I tried copying the metric and 1) setting the denominator to all experiment users and 2) modifying the conversion window, and the percent change / probability distribution stayed roughly the same.
This experiment has been running for 4 months. I created a new experiment phase and looked at data from just the last month, and I wasn’t able to reproduce the result: the new phase shows a 0% change.
I had a few questions, and I’m open to any advice people have on interpreting the results of A/A tests in Growthbook:
• What could explain this outcome? Are false positives like this expected, even after running the experiment for this long?
• Could this suggest a flaw in our assignment logic or our metric definition in any way? Again, the difference in most other metrics is close to 0, and the second phase showed a 0% difference, so the first result (on 4x the users) is confusing.
• Does this suggest I should be skeptical of any experiment result that moves this metric by less than ~2.3%? Does the A/A test help me understand the metric’s underlying natural variance in any way?
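For context on the false-positive question: here’s a rough simulation I sketched of the multiple-comparisons effect. All the numbers (conversion rate, metric count) are made up for illustration, and it uses a simple normal-approximation “chance to beat control” rather than Growthbook’s actual Bayesian computation. The point is just that with many metrics on a dashboard, one of them landing at a ~3.68% chance-to-beat in a true A/A is not that surprising:

```python
import math
import random

random.seed(0)

# Hypothetical numbers -- assumptions for illustration, not from Growthbook.
N = 25_000      # users per arm (~50k assigned total, as in the experiment)
P = 0.20        # true conversion rate, identical in both arms (A/A)
N_METRICS = 20  # independent metrics tracked on the dashboard
N_SIMS = 5_000  # simulated A/A experiments

def chance_to_beat_control() -> float:
    """Normal-approximation 'chance to beat control' for one binomial
    metric when both arms share the same true rate P."""
    sd = math.sqrt(N * P * (1 - P))    # std dev of a binomial count
    c = random.gauss(N * P, sd)        # control conversions (normal approx)
    v = random.gauss(N * P, sd)        # variant conversions (same true rate)
    p_c, p_v = c / N, v / N
    se = math.sqrt(p_c * (1 - p_c) / N + p_v * (1 - p_v) / N)
    z = (p_v - p_c) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Phi(z)

# Any single metric lands at or below 3.68% about 3.68% of the time, but
# with N_METRICS metrics the chance that at least one does is much higher.
hits = sum(
    min(chance_to_beat_control() for _ in range(N_METRICS)) <= 0.0368
    for _ in range(N_SIMS)
)
print(f"P(at least one metric <= 3.68% chance to beat): {hits / N_SIMS:.2f}")
# Analytically: 1 - (1 - 0.0368)**20, roughly 0.53
```

So under these (assumed) numbers, roughly half of all perfectly clean A/A experiments would show at least one metric looking this extreme, which is part of why I’m unsure whether my result indicates a real problem.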