# announcements
Hi, I’m looking for more info about using experiment phases. What is their intended use case? It seems possible to add multiple phases to an experiment, but what would that do to the data and analysis? All the documentation I’ve found suggests that a phase just filters out data from before the phase began, so what happens if there are multiple phases? If users are assigned before the beginning of the phase but come back after it starts, would all of that user’s data be filtered out, or only the pre-phase data? Can SRM checks account for multiple phases with potentially different splits?
Hi Jane - the idea with Phases is that quite often you roll out an experiment, then realize something is wrong, and want to exclude a certain part of the experiment. Alternatively, you could start with a small sample, perhaps with a different split from the main phase of your experiment. This way, if you change the split, you can control the SRM errors.
I’ll have to check about the Nth time a user comes back. I know we fire an exposure event every time; I’m just not sure if we include those users based on first exposure or any exposure.
@helpful-application-7107 or @future-teacher-7046 do you know?
Yeah, suppose we’re in the latter scenario (just want to ramp up, nothing has gone wrong): does adding a phase filter out the earlier, smaller-allocation data? That’s what I’m understanding from what I’ve read so far, but wanted to check.
Phases essentially only use dates to filter out the old experiment impressions. If a user exposed in phase 1 gets exposed in phase 2, they'll get counted again, but not until they have a new exposure in phase 2. For now, a phase is essentially just a date filter, but we're working on building a smarter system that can handle config changes more elegantly. Honestly, if you're just ramping an experiment up, I would recommend not using phases and just pooling all of your data. You won't get an SRM error, because the relative bucket percentages remain the same, and you'll keep all of your data in the experiment. If you use a new phase, you could be subject to carryover bias. If you change the variation percentages, you could use a new phase to prevent SRM and some bias, but you could still have carryover bias. In that situation, I would really recommend rerandomizing by using a new experiment key. We're working on making this process easier, but rerandomization would still be the safest way to proceed from an analysis perspective.
Very helpful, thanks! And just curious: if you did set, say, three phases in an experiment (not sure why you would given what you just wrote), would the start date in the most recent phase be used as the filter?
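To make the SRM point concrete, here is a minimal sketch of a sample ratio mismatch check on pooled data. It is not the product's implementation. It assumes a two-variation experiment, a plain chi-square goodness-of-fit test, and a 0.001 alert threshold; the function name `srm_p_value` and all user counts are made up for illustration.

```python
import math


def srm_p_value(observed, weights):
    """Chi-square goodness-of-fit p-value for a two-variation experiment.

    observed: user counts per variation, e.g. [5500, 5500]
    weights:  relative traffic weights per variation, e.g. [50, 50]
    """
    if len(observed) != 2 or len(weights) != 2:
        raise ValueError("this sketch only handles two variations")
    total = sum(observed)
    weight_sum = sum(weights)
    expected = [total * w / weight_sum for w in weights]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Two variations give 1 degree of freedom, where the chi-square
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


SRM_THRESHOLD = 0.001  # assumed alert threshold, not taken from the thread

# Ramp-up with an unchanged 50/50 split: 500/500 users early, 5000/5000 later.
# Pooling keeps the ratio intact, so the check passes.
pooled_ramp = srm_p_value([500 + 5000, 500 + 5000], [50, 50])

# Split changed from 50/50 (1000/1000 users) to 20/80 (2000/8000 users).
# Pooled counts no longer match the current 20/80 weights.
pooled_changed = srm_p_value([1000 + 2000, 1000 + 8000], [20, 80])

# Restricting to the data collected after the split change fixes the ratio.
new_phase_only = srm_p_value([2000, 8000], [20, 80])

print(pooled_ramp > SRM_THRESHOLD)     # True: no SRM flag
print(pooled_changed > SRM_THRESHOLD)  # False: SRM flag
print(new_phase_only > SRM_THRESHOLD)  # True: no SRM flag
```

Note that the last case only shows the ratio check passing. It says nothing about carryover bias, which a ratio test cannot detect.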
not sure why you would given what you just wrote
Carryover bias can be less problematic in some instances than, say, changing users' behavior back and forth and/or throwing out old data. So there could be some use cases for it; it would just require being pretty careful, and you'd still have some unknown bias.
would the start date in the most recent phase be used as the filter?
In the experiment page, whatever phase is selected will use the associated date range as a date filter on the exposure source. And yes, the latest phase will be used as the default.
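As a rough illustration of the behavior described above (a phase as a date filter on exposures, with the latest phase as the default), here is a small sketch. The `Phase` and `Exposure` classes, the helper names, and the sample data are all hypothetical, and treating the phase end date as exclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Phase:
    name: str
    start: datetime
    end: Optional[datetime] = None  # None means the phase is still running


@dataclass
class Exposure:
    user_id: str
    variation: str
    timestamp: datetime


def default_phase(phases):
    """Pick the phase with the most recent start date."""
    return max(phases, key=lambda p: p.start)


def exposures_in_phase(exposures, phase):
    """Keep only exposures whose timestamp falls inside the phase's date range."""
    return [
        e for e in exposures
        if e.timestamp >= phase.start
        and (phase.end is None or e.timestamp < phase.end)  # exclusive end: an assumption
    ]


def users_in_phase(exposures, phase):
    """Users counted in a phase: anyone with at least one exposure in its range."""
    return {e.user_id for e in exposures_in_phase(exposures, phase)}


phases = [
    Phase("phase 1", datetime(2023, 1, 1), datetime(2023, 1, 8)),
    Phase("phase 2", datetime(2023, 1, 8)),
]
exposures = [
    Exposure("u1", "control", datetime(2023, 1, 2)),    # phase 1 only
    Exposure("u2", "treatment", datetime(2023, 1, 3)),  # phase 1 ...
    Exposure("u2", "treatment", datetime(2023, 1, 9)),  # ... and again in phase 2
    Exposure("u3", "control", datetime(2023, 1, 10)),   # phase 2 only
]

selected = default_phase(phases)
print(selected.name)                               # phase 2
print(sorted(users_in_phase(exposures, selected)))  # ['u2', 'u3']
```

Here `u1` drops out of the phase 2 analysis until a new exposure arrives after the phase start, while `u2` is counted again because of the later exposure.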
Ah makes sense