# ask-questions
m
Ahoi GrowthBook-people, I have a question regarding the feature for CUPED. Our users at idealo evaluated a test with and without CUPED. We observed a change in the relative difference between the means of the baseline and variant in the visualisation (see the two attached screenshots). This surprised our users, and I can’t really confidently explain why this happens. We use frequentist statistics. From your documentation and looking at the original paper linked therein, I understood that CUPED does not change the estimate for the absolute difference between the observed group means. It does, however, change the estimate for the group means themselves. Hence, can you explain why the relative difference changes, i.e. what you normalise the relative difference on? We have even seen trends being inverted due to this behaviour. I’ll link @fresh-football-47124 and @helpful-application-7107 from GrowthBook, and @adventurous-dream-15065 from our side.
h
First of all, those are huge decreases in your confidence interval widths, so CUPED is doing a ton of work to reduce variance and this seems like a really good use case for it. CUPED absolutely changes the difference in variation means. By changing the estimate for the group means, we necessarily also change the estimate for the difference in those means (unless, by sheer chance, the changes to the two group means cancel out up to rounding error). In the original paper, you can see that \Delta_{cv} is the CUPED lift estimate, and \Delta_{cv} is going to be different from \Delta (introduced right below equation 1 in section 2.1 of their paper). This is because the raw variation averages (written \bar{Y} in their paper) are different from the adjusted variation averages (written \hat{Y}_{cv} in their paper). That's just to help you see how the quantities differ in their paper.
Intuitively, for CUPED to work, our estimate of the lift itself has to move. Think of it this way: in some respects, CUPED reduces variance by eliminating "improbable" randomizations. If you end up with a perfectly valid randomization that nonetheless puts slightly less-active users in your test variation, then CUPED uses the pre-experiment knowledge that these were less-active users to reduce that imbalance's impact on the estimate. That means shifting the estimate to account for the chance imbalance. Because this imbalance is due to chance, you're correct that, across many runs of the same experiment, the CUPED estimate would on average be the same as the non-CUPED estimate. In a single run, however, the estimate will essentially always change. (BTW, this applies whether you use relative or absolute effects, because the shift occurs in the variation averages, which are used for both estimates.)
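If it helps, here's a quick simulation sketch of that intuition (plain NumPy with toy names I made up, not our actual implementation): it applies the textbook adjustment \hat{Y}_{cv} = Y - \theta (X - \bar{X}) and shows the adjusted and raw differences in means disagreeing in a single run.
```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# One simulated experiment: X is a pre-experiment covariate, Y the in-experiment metric.
x = rng.gamma(shape=2.0, scale=1.0, size=2 * n)        # pre-experiment activity
assign = rng.permutation(np.repeat([0, 1], n))         # 0 = control, 1 = treatment
true_lift = 0.05
y = x + rng.normal(0, 1, 2 * n) + true_lift * assign   # metric correlated with x

# Textbook CUPED adjustment: Y_cv = Y - theta * (X - mean(X))
theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
y_cv = y - theta * (x - x.mean())

raw_diff = y[assign == 1].mean() - y[assign == 0].mean()
cuped_diff = y_cv[assign == 1].mean() - y_cv[assign == 0].mean()

print(f"raw difference in means:   {raw_diff:.4f}")
print(f"CUPED difference in means: {cuped_diff:.4f}")
```
Average both numbers over many re-randomizations and they'll agree; in any single run they differ by \theta times the chance imbalance in the covariate means between the two groups.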
👍 1
In our documentation, you can basically think of it as CUPED removing some chance imbalance from \mu_T and \mu_C, and that affects both the relative and absolute effects, \Delta_r and \Delta_a, respectively.
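Concretely, writing those out the way I read our stats docs (worth double-checking against that page, this is just my shorthand):
\Delta_a = \mu_T - \mu_C, \qquad \Delta_r = \frac{\mu_T - \mu_C}{\mu_C}
so any shift CUPED applies to \mu_T and \mu_C flows through to both quantities, just scaled differently in the relative case.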
Furthermore, just for completeness: CUPED doesn't just work by removing "chance imbalance". If you had 0 chance imbalance, you'd see the lift estimates not move across CUPED and non-CUPED estimates (of course, in practice there's almost always at least some tiny non-zero imbalance), but you would still see variance reduction. This is because CUPED works to reduce the variance of the means (\sigma_T and \sigma_C in our notation above) by understanding how much of the variance is explained by fixed factors (e.g. the pre-experiment values).
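And the variance side of that statement, in the paper's notation (my transcription, so please check it against their section 2.1):
\theta = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)}, \qquad \mathrm{Var}\!\left(\hat{Y}_{cv}\right) = \mathrm{Var}\!\left(\bar{Y}\right)\left(1 - \rho^2\right)
where X is the pre-experiment covariate and \rho is its correlation with the metric Y. The stronger that correlation, the larger the reduction in the variance of the means, which is the confidence-interval shrinkage you're seeing.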
m
Thanks for the explanation! I think I mixed up a single realisation of the estimator with its expected value. To make sure I understood it, could you please double check whether the math in the attached picture is correct? It's very brief. Assuming my scribble is correct, I have another question from the user's perspective. How should one interpret the corrected relative uplift? Formally, it describes the relative uplift of the corrected group means. Our users, however, tend to interpret it naively as the uplift of the uncorrected means that are stated in the Experiment Analysis panel. This corrected uplift of n% is then reported as "we observed an uplift of n% in our metric", which is wrong.
h
Yes, your math looks right and is a clear exposition!
We use the unadjusted mean in the denominator for the lift so the interpretation you write in quotes at the bottom should be correct. It is an n% uplift in your metric.
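To spell that out in the notation from earlier (my shorthand for the idea, glossing over implementation details):
\Delta_r \approx \frac{\hat{\mu}_{T,cv} - \hat{\mu}_{C,cv}}{\mu_C}
i.e. CUPED-adjusted means in the numerator and the unadjusted control mean in the denominator, which is why reading it as an n% uplift in the metric itself holds up.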
👍 1
I'll probably update our CUPED documentation now that we've updated the statistics details docs (the one I linked above). Would you mind if I borrowed a bit from your exposition? I think it's clear and I could add it to a FAQ on the CUPED page.
m
Ahoi Luke, feel free to take from the exposition what you need for your FAQ.
Ahoi @helpful-application-7107, we had a brief internal discussion about the topic from the product perspective. CUPED is a great feature that should be used rather than ignored. Unfortunately, it's not yet available for, e.g., ratio metrics. We know that it's a bit harder for these cases, but we were wondering whether you plan on implementing a variance reduction method for them as well. Do you have anything planned for this?
Ahoi @future-teacher-7046, just to touch on this topic again: is there any plan to extend CUPED to ratio metrics? Just knowing about this would help us phrase communication to our internal users. Feel free to point me to an existing document, I might have missed it 😉
h
Hey Peter, sorry, I meant to respond to this. From my last investigation, ratio-metric CUPED requires an additional unit-level self-join to do well, which can be a prohibitively slow query at scale. We are looking into using experiment dimensions to add post-stratification to our CUPED estimator, which would reduce variance for both mean and ratio metrics.
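Very roughly, post-stratification on a dimension would look something like this (a toy pandas sketch just to illustrate the idea, with made-up column names, nothing like the actual SQL we'd ship):
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 20_000

# Toy data: one row per unit, with an experiment dimension (stratum),
# a variation assignment, and a metric whose level differs by stratum.
stratum = rng.choice(["web", "app"], size=n, p=[0.7, 0.3])
variation = rng.integers(0, 2, size=n)
baseline = np.where(stratum == "web", 1.0, 3.0)            # strata have different means
metric = baseline + 0.1 * variation + rng.normal(0, 1, n)  # true absolute lift = 0.1
df = pd.DataFrame({"stratum": stratum, "variation": variation, "metric": metric})

# Post-stratified lift: within-stratum differences in means,
# weighted by each stratum's share of all units.
weights = df["stratum"].value_counts(normalize=True)
per_stratum = df.groupby(["stratum", "variation"])["metric"].mean().unstack("variation")
post_stratified_lift = ((per_stratum[1] - per_stratum[0]) * weights).sum()

print(f"post-stratified absolute lift: {post_stratified_lift:.3f}")
```
Because the comparison happens within each dimension value before averaging, the between-stratum variance drops out of the estimate, and that same general idea is what would let us apply it to ratio metrics as well.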
m
Thanks for following up! We will reach out with some lead time when the topic gets more traction in our organisation. Looking forward to your implementation nevertheless 😉 FYI @adventurous-dream-15065