# announcements
Hello guys! I have a general question about experimentation processes... when we ramp up the volume of users in the experiment (say 50% control vs 50% treatment) from 10% of the total user base to 50% (preserving the 1:1 ratio), is it possible we're introducing a temporal bias into the experiment?
• For example, in a scenario where the user base is highly seasonal, by ramping up in a specific period we might be oversampling a season, and when looking at the overall results (ATE/CATE) there is an underlying Simpson's paradox.
I ask because of a discussion at my company: a team wanted to ramp up to accelerate the results, but our user base is highly seasonal, and the question we couldn't answer for sure was: should we ramp up only to validate a decision taken in the initial design? Or is it valid to ramp up in the middle of the experiment to accelerate it?
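To make that concern concrete, here is a minimal Python sketch with made-up numbers (not from this discussion): the treatment effect differs between an off-season and a peak season, and because the ramp puts most of the sample into the peak season, the pooled estimate is pulled toward that season's effect even though the 1:1 split is preserved in both phases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-phase experiment: phase 1 runs at 10% traffic during the
# off-season, phase 2 ramps to 50% during the peak season. Numbers are made up.
phases = [
    # (users per arm, control conversion rate, treatment conversion rate)
    (2_000, 0.10, 0.14),   # off-season: treatment helps (+4pp)
    (20_000, 0.30, 0.29),  # peak season: treatment slightly hurts (-1pp)
]

pooled_c, pooled_t, per_phase = [], [], []
for n, p_c, p_t in phases:
    c = rng.binomial(1, p_c, n)
    t = rng.binomial(1, p_t, n)
    pooled_c.append(c)
    pooled_t.append(t)
    per_phase.append(t.mean() - c.mean())

pooled_ate = np.concatenate(pooled_t).mean() - np.concatenate(pooled_c).mean()

print("per-phase effects:", [f"{d:+.3f}" for d in per_phase])
print("pooled effect    :", f"{pooled_ate:+.3f}")
# The pooled estimate is dominated by the peak season simply because the ramp
# put most of the sample there, even though each phase keeps a 1:1 split.
```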
Hi Victor
In GrowthBook, if you change the traffic allocation, it starts a new "phase" of the experiment. Each phase is analyzed independently so we don't mix data from different allocations together, avoiding any Simpson paradoxes.
šŸ™Œ 2
The tradeoff is that you can't accelerate a test by increasing the allocation midway through.
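For illustration, here is a rough sketch of what analyzing each phase independently can look like, using a plain two-proportion z-interval per phase. The counts are hypothetical and this is not GrowthBook's actual engine; it just shows per-phase estimates reported side by side instead of pooled.

```python
import numpy as np
from scipy import stats

def phase_lift_ci(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Two-proportion z-interval for the absolute lift within a single phase."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = np.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = stats.norm.ppf(1 - alpha / 2)
    diff = p_t - p_c
    return diff, (diff - z * se, diff + z * se)

# Hypothetical per-phase counts: (control conversions, control users,
# treatment conversions, treatment users).
for name, (cc, nc, ct, nt) in {
    "phase 1 (10% traffic)": (210, 2_000, 285, 2_000),
    "phase 2 (50% traffic)": (6_020, 20_000, 5_900, 20_000),
}.items():
    diff, (lo, hi) = phase_lift_ci(cc, nc, ct, nt)
    print(f"{name}: lift {diff:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
```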
Hi Jeremy and Graham, that's very interesting! Actually, we've been using another service for A/B testing (not a platform, just a feature flag tool with a few A/B test capabilities), but we are planning to do a PoC of GrowthBook because of all these features haha. So as far as I understood, it looks like it's not good practice to ramp up in the middle of the experiment. Do you guys have any technical references explaining this case? Or is it more a consistency decision of the platform rather than a real problem?
We've found that changing allocations mid-test makes it really easy to make statistical errors. We've decided to err on the side of statistical correctness at the moment. In the future we might be able to put enough safeguards in the analysis to make it safe and enable that use case.
Thanks for the clarification Jeremy, good to know that šŸ™‚
@full-zebra-25719, statistics aside, if you have highly seasonal users and usage patterns, you obviously have to be very careful with test-based conclusions, because they may not be valid the rest of the year.
Just my 2 cents from our experience: this is something we are dealing with as well. We make sure we are very careful in the test design to take any behavioral temporal differences into account. We always run a test for a minimum of 1 week, and otherwise only allow 2 or 3 full weeks for longer test periods, simply because we know our customers behave differently on different days of the week. (So from a test-duration point of view this is not a sampling bias, since the same user behaves differently!)
šŸ‘ 1