# experimentation
l
We're just about to end a ~14-day test with 11 variants (plus Control), and today we got alerted to a traffic mismatch. This hasn't shown up on any of our other tests, and the traffic variance for experiments with only a few variants is always within an acceptable range, so it's hard to know whether this is an implementation issue or something else. The only advice in the docs is to "review implementation," but that doesn't really get me anywhere. It looks like traffic slowly declines from variant 1 through 12, but not consistently, and there's no obvious reason why some variants would get lower traffic and others wouldn't. Just not sure where to start looking on this one.
I'm not a big fan of having more than just a handful of variants in general but I was overruled on this one 😂
f
heh
11 eh?
are you getting SRM errors?
not sure what you mean by "traffic mismatch"
l
11 yep 😭 Let me know if you have any words of wisdom I can bring back to the team to dissuade this sort of testing in the future 😂
So yeah, we're seeing an SRM warning: "*Sample Ratio Mismatch (SRM) detected. P-value below 0.001*." Might this sort of issue be solved by Sticky Bucketing?
I'm curious if it's a data lag between our warehouse and the replica. There was a P-value warning over the weekend, then it went away for two days, and now it's back, so it's a bit inconsistent.
h
Our SRM test should work with many groups, but given your pattern of results and the fact that the p-value is flipping back and forth, this may be a false positive. We set our threshold very low to limit false positives, but they're still possible.
The number of variants will hurt your power to detect experiment effects, but it shouldn't increase the chance of an SRM false positive.
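If you want to sanity-check the warning yourself, a common way to test for SRM is a chi-squared goodness-of-fit test of your per-variant user counts against the expected split. Here's a minimal sketch (the counts below are made up to mimic your "slowly declining from 1 to 12" pattern, and it assumes an even split across Control + 11 variants):

```python
# Minimal SRM sanity check: chi-squared goodness-of-fit against an
# even split. The counts are hypothetical; swap in your real
# per-variant user counts from the warehouse.
from scipy.stats import chisquare

observed = [10230, 10180, 10105, 10040, 9990, 9920,
            9875, 9810, 9760, 9705, 9650, 9600]  # Control + variants 1-11

# Expected counts under an equal split across all 12 arms
expected = [sum(observed) / len(observed)] * len(observed)

stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# A conservative threshold (e.g. 0.001) limits false positives
if p_value < 0.001:
    print(f"SRM detected (p = {p_value:.5f}) -- check assignment/tracking")
else:
    print(f"no SRM at this threshold (p = {p_value:.5f})")
```

Running that against a stable snapshot of your data (rather than the possibly lagging replica) should tell you whether the mismatch is real or an artifact of the lag you mentioned.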