# experimentation
l
We're just about to end a ~14-day test with 11 variants (plus Control), and today we got alerted to a traffic mismatch. This hasn't shown up on any of our other tests, and the traffic variance for experiments with only a few variants is always within an acceptable range, so it's hard to know whether this is an implementation issue or something else. The only advice in the docs is to "review implementation," but that doesn't really get me anywhere. It looks like traffic slowly declines from variant 1 through 12, but not consistently, and there's no obvious reason why some variants would get lower traffic and others wouldn't. Just not sure where to start looking on this one.
I'm not a big fan of having more than just a handful of variants in general but I was overruled on this one 😂
f
heh
11 eh?
are you getting SRM errors?
not sure what you mean by "traffic mismatch"
l
11 yep 😭 Let me know if you have any words of wisdom I can bring back to the team to dissuade this sort of testing in the future 😂
So yeah, we're seeing an SRM warning: "*Sample Ratio Mismatch (SRM) detected. P-value below 0.001*." Might this sort of issue be solved by Sticky Bucketing?
I'm curious if it's a data lag between our warehouse and the replica. There was a P-value warning over the weekend, then it went away for two days, and now it's back, so it's a bit inconsistent.
h
Our SRM test should work with many groups, but given your pattern of results and the fact that the p-value is flipping back and forth, this may be a false positive. We set our threshold very low to limit false positives, but they're still possible.
The number of variants will hurt your power to detect experiment effects, but it shouldn't increase the chance of an SRM false positive.
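If you want to sanity-check the warning yourself, a common way to test for SRM is a chi-squared goodness-of-fit test of your per-variant user counts against the expected split. Here's a minimal sketch (the counts below are made up to mimic your "slowly declining from 1 to 12" pattern, and it assumes an even split across Control + 11 variants):

```python
# Minimal SRM sanity check: chi-squared goodness-of-fit against an
# even split. The counts are hypothetical; swap in your real
# per-variant user counts from the warehouse.
from scipy.stats import chisquare

observed = [10230, 10180, 10105, 10040, 9990, 9920,
            9875, 9810, 9760, 9705, 9650, 9600]  # Control + variants 1-11

# Expected counts under an equal split across all 12 arms
expected = [sum(observed) / len(observed)] * len(observed)

stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# A conservative threshold (e.g. 0.001) limits false positives
if p_value < 0.001:
    print(f"SRM detected (p = {p_value:.5f}) -- check assignment/tracking")
else:
    print(f"no SRM at this threshold (p = {p_value:.5f})")
```

Running that against a stable snapshot of your data (rather than the possibly lagging replica) should tell you whether the mismatch is real or an artifact of the lag you mentioned.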