# contributing
👋 Hi! I'm interested in exploring making a contribution to
. Specifically, I wanted to implement clustered standard errors into the sample mean tests ([formula 11 from the Duflo Glennerster Kremer 2006 paper](https://www.nber.org/system/files/working_papers/t0333/t0333.pdf)). I saw some overall guidelines in
, and I think I've zeroed in on the file where I should be making my edits - I think it's [in
](https://github.com/growthbook/growthbook/blob/main/packages/stats/gbstats/shared/models.py#L37-L51)). But I wanted to check in with someone (@helpful-application-7107) before I just got to coding. My main confusion is on where the data gets processed so that we can identify that the unit of analysis != unit of randomization.
aw damn, markdown hyperlinks don't work 😭
Hi Angela! The main barrier with implementing clustered SEs in GrowthBook is that you'll need the within-cluster covariances, which will require additional self-joins in the SQL that we execute. Implementing the estimator in
itself may be less painful (you'll need entirely new
objects that take additional summary stats as their constructor values), but then hooking it up to GrowthBook will require more fundamental changes.
😬 1
The discussion on page 15 here: https://economics.mit.edu/sites/default/files/2022-09/When%20Should%20You%20Adjust%20Standard%20Errors%20for%20Clustering.pdf is a good overview of the estimator we'd need to implement from the other econ heads in this space 🙂
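To make the "additional summary stats as constructor values" point concrete, here is a minimal sketch of a cluster-aware mean statistic for a single arm. It assumes the SQL has already rolled the data up to one row per cluster (cluster size and cluster total). The class name `ClusteredMeanStatistic`, its fields, and the helper `from_cluster_rows` are hypothetical and not part of gbstats. The variance is the plain delta-method (ratio-of-sums) cluster-robust form with no small-sample correction, which is not necessarily the exact estimator from the papers linked in this thread.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ClusteredMeanStatistic:
    """Hypothetical container (not a gbstats class) of sums taken over clusters."""

    n_clusters: int   # G, number of clusters
    sum_n: float      # sum over clusters of cluster size n_g
    sum_y: float      # sum over clusters of cluster total S_g
    sum_y_sq: float   # sum over clusters of S_g ** 2
    sum_n_sq: float   # sum over clusters of n_g ** 2
    sum_n_y: float    # sum over clusters of n_g * S_g

    @property
    def mean(self) -> float:
        return self.sum_y / self.sum_n

    @property
    def variance_of_mean(self) -> float:
        # sum_g (S_g - mean * n_g)^2, expanded so only the stored sums are needed
        m = self.mean
        resid_sq = self.sum_y_sq - 2 * m * self.sum_n_y + m * m * self.sum_n_sq
        # Guard against tiny negative values from floating-point cancellation
        return max(resid_sq, 0.0) / self.sum_n ** 2

    @property
    def standard_error(self) -> float:
        return math.sqrt(self.variance_of_mean)


def from_cluster_rows(rows: Iterable[Tuple[int, float]]) -> ClusteredMeanStatistic:
    """Build the statistic from (cluster_size, cluster_total) rows."""
    rows = list(rows)
    return ClusteredMeanStatistic(
        n_clusters=len(rows),
        sum_n=sum(n for n, _ in rows),
        sum_y=sum(s for _, s in rows),
        sum_y_sq=sum(s * s for _, s in rows),
        sum_n_sq=sum(n * n for n, _ in rows),
        sum_n_y=sum(n * s for n, s in rows),
    )


# Two clusters holding the values (1, 2) and (3, 4)
stat = from_cluster_rows([(2, 3.0), (2, 7.0)])
print(stat.mean, stat.variance_of_mean, stat.standard_error)
```

The cross terms (`sum_n_y`, `sum_n_sq`) are exactly the pieces that a per-user rollup cannot provide, which is why the query has to aggregate by cluster first.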
Ayyyy I was worried about that - namely changes to any SQL, which could be expensive or take a while to execute. This actually raises a couple questions I had:
• When we've implemented clustered SEs manually, we've run into some uncertainty about how to define e.g. the
, since clusters will vary in their variances. We settled on just choosing the average variance from a sample of clusters (since computing the variances for all clusters often takes forever).
• This is more a theoretical question, which is how do we treat the intracluster correlation given that clusters can evolve - gaining and losing members over time, and also overlapping in members. Current plan was... ignoring this. 😅
Oh yeah, and good call on the Abadie et al. paper - maybe that'll give some guidance on this question altogether. I feel like we should be correcting for cluster ICCs in some way... 🤔
None of the standard estimators I'm familiar with require you to bring estimates from outside of your data sample to bear on the variance estimator, although I think Abadie et al. do wade into that a bit. As for the clusters changing over time... I'm really not sure anyone has a firm grasp on that. Any time you're taking group membership from time
and then rolling up data into time
when group membership could have changed, you're getting onto shaky ground. I think you have to assume that their original cluster is the cluster you care about, although it can get worse and worse in the case of online A/B testing, where exposure mappings are not just 0/1 because of joining new clusters.
I know in the past Lyft has had to explicitly deal with these issues as they become interference issues in their spatial randomization.
Oh right - I wasn't thinking of grabbing cluster variance outside of the sample. Sometimes our samples themselves are just really big, and so it was taking a while to calculate the ICC. And yeah re: malleable clusters. This is what my other econ friends have basically said. I guess it's OK, if we assume that the average nature of our clusters doesn't change, even if they reorganize themselves, etc. So then average treatment effects would still hold. V interesting re: Lyft - I don't think I know anyone there. I wonder if there's Medium posts floating around.
Yeah they have a series on interference and network effects from 2016 (that I used to be very interested in when I thought anyone cared about network effects; turns out they don't unless you have to do spatial randomization like ridesharing companies do)
Even FB invested heavily in inference in the presence of interference from a research and tooling perspective, but then it wasn't widely adopted internally from what I recall.
😭 1
FWIW, in my grad school days I wrote a lot of notes about cluster robust SEs for
, a package I worked on back then: https://declaredesign.org/r/estimatr/articles/mathematical-notes.html#cluster-robust-variance-and-degrees-of-freedom I can't vouch for my clarity of thinking on the issue at the time, but there are some more references in there.
Covariance matrices scare me, but I will brave the link - thanks!
Specifically, I wanted to implement clustered standard errors
You've done this to yourself, Angela 🙂
😭 1
Sometimes our samples themselves are just really big, and so it was taking a while to calculate the ICC.
What was the process you were using?
Relatedly, and not to pile on, but Netflix has a very interesting paper about "compression", i.e. the sufficient summary statistics one needs for these estimators, here: https://arxiv.org/abs/2102.11297 This paper and their subsequent papers on their extensible experimentation platform: https://arxiv.org/abs/1910.03878 are models for me.
Suuper interesting, thanks! Yes, that's what I'm after: the bare minimum summary stats so we can save on compute and time. Some of my colleagues did the actual implementation, but iirc they had a dataset that was
cluster_id, user_id, metric_value
and they wanted to calculate the overall variance (fine), the between-cluster variance (also fine), and then the within-cluster variance (slow). They were using R, I could probably dig up the notebook and DM some excerpts. But I think it may have been a loop through each
to calculate that group's var. AKA I don't think a covariance matrix was used.
Hmm, yeah no need to dig it up, but yeah passing that into R is very likely to end up in pain. We want to do as much of the compression in SQL as possible.
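As a rough illustration of why the per-cluster loop isn't needed: the within- and between-cluster sums of squares (and from them an ICC) can be computed from three aggregates per cluster, which a single `GROUP BY cluster_id` over the `cluster_id, user_id, metric_value` data could produce. The sketch below uses the one-way ANOVA estimator of the ICC, which is an assumption on my part and not necessarily what the original R notebook did. The names `ClusterRow` and `anova_icc` are hypothetical.

```python
from typing import Iterable, NamedTuple


class ClusterRow(NamedTuple):
    """One row per cluster, e.g. from a query shaped like
    SELECT COUNT(*), SUM(metric_value), SUM(metric_value * metric_value)
    ... GROUP BY cluster_id  (table name left out on purpose)."""

    n: int           # number of users in the cluster
    total: float     # sum of metric_value in the cluster
    total_sq: float  # sum of metric_value ** 2 in the cluster


def anova_icc(rows: Iterable[ClusterRow]) -> float:
    """One-way ANOVA estimate of the intracluster correlation."""
    rows = list(rows)
    g = len(rows)
    big_n = sum(r.n for r in rows)
    if g < 2 or big_n <= g:
        raise ValueError("need at least 2 clusters and more users than clusters")

    grand_mean = sum(r.total for r in rows) / big_n
    # Within-cluster sum of squares: sum_g (sum y^2 - (sum y)^2 / n_g)
    ss_within = sum(r.total_sq - r.total ** 2 / r.n for r in rows)
    # Between-cluster sum of squares: sum_g n_g * (cluster_mean - grand_mean)^2
    ss_between = sum(r.total ** 2 / r.n for r in rows) - big_n * grand_mean ** 2

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (big_n - g)
    # "Average" cluster size that accounts for unequal cluster sizes
    n0 = (big_n - sum(r.n ** 2 for r in rows) / big_n) / (g - 1)

    denom = ms_between + (n0 - 1) * ms_within
    if denom == 0:
        raise ValueError("no variation in the metric; ICC is undefined")
    return (ms_between - ms_within) / denom


# Two clusters holding the values (1, 2) and (3, 4)
example = [ClusterRow(2, 3.0, 5.0), ClusterRow(2, 7.0, 25.0)]
icc = anova_icc(example)
print(icc)
```

Everything after the `GROUP BY` is arithmetic on one row per cluster, so the expensive part stays in the warehouse and nothing user-level has to be passed into R or Python.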