abundant-zebra-16833

10/17/2023, 6:39 PM
Hi! I'm interested in exploring making a contribution to `gbstats`. Specifically, I wanted to implement clustered standard errors into the sample mean tests ([formula 11 from the Duflo Glennerster Kremer 2006 paper](https://www.nber.org/system/files/working_papers/t0333/t0333.pdf)). I saw some overall guidelines in `CONTRIBUTING.md`, and I think I've zeroed in on the file where I should be making my edits - I *think* it's [in `gbstats/shared/models.py`](https://github.com/growthbook/growthbook/blob/main/packages/stats/gbstats/shared/models.py#L37-L51). But I wanted to check in with someone (**@helpful-application-7107**) before I just got to coding. My main confusion is on where the data gets processed so that we can identify that the unit of analysis != unit of randomization.
(aw damn, markdown hyperlinks don't work)

helpful-application-7107

10/17/2023, 6:54 PM
Hi Angela! The main barrier with implementing clustered SEs in GrowthBook is that you'll need the within-cluster covariances, which will require additional self-joins in the SQL that we execute.
Implementing the estimator in `gbstats` itself may be less painful (you'll need entirely new `Statistic` objects that take additional summary stats as their constructor values), but then hooking it up to GrowthBook will require more fundamental changes.

The discussion on page 15 here: https://economics.mit.edu/sites/default/files/2022-09/When%20Should%20You%20Adjust%20Standard%20Errors%20for%20Clustering.pdf
is a good overview of the estimator we'd need to implement from the other econ heads in this space.
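
To make the within-cluster covariance point concrete, here is a minimal Python sketch of a cluster-robust standard error for a sample mean. It is not `gbstats` code: the function name `clustered_se_of_mean` and the `G / (G - 1)` small-sample factor are assumptions chosen for illustration, and the estimator in the linked paper may differ in its details. Squaring the per-cluster residual sums is algebraically the same as summing every within-cluster cross product.

```python
from collections import defaultdict
from math import sqrt


def clustered_se_of_mean(cluster_ids, values):
    """Cluster-robust standard error of a sample mean (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n

    # Sum the residuals within each cluster. Squaring these sums later
    # picks up every within-cluster cross product e_i * e_j.
    resid_sums = defaultdict(float)
    for cid, y in zip(cluster_ids, values):
        resid_sums[cid] += y - mean

    g = len(resid_sums)
    if g < 2:
        raise ValueError("need at least two clusters")

    # Assumed small-sample correction G / (G - 1); other conventions exist.
    variance = (g / (g - 1)) * sum(s * s for s in resid_sums.values()) / n**2
    return sqrt(variance)
```

When every observation is its own cluster, this reduces to the usual standard error of the mean, which is a quick sanity check on the formula.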

abundant-zebra-16833

10/17/2023, 7:00 PM
Ayyyy I was worried about that - namely changes to any SQL, which could be expensive or take a while to execute. This actually raises a couple of questions I had:
• When we've implemented clustered SEs manually, we've run into some uncertainty about how to define e.g. the `within_cluster_variance`, since clusters will vary in their variances. We settled on just choosing the average variance from a sample of clusters (since computing the variances for all clusters often takes forever).
• This is more of a theoretical question: how do we treat the intracluster correlation given that clusters can evolve - gaining and losing members over time, and also overlapping in members? Current plan was... ignoring this.
Oh yeah, and good call on the Abadie et al. paper - maybe that'll give some guidance on this question altogether. I feel like we should be correcting for cluster ICCs in *some* way...

helpful-application-7107

10/17/2023, 7:04 PM
None of the standard estimators I'm familiar with require you to bring estimates from outside of your data sample to bear on the variance estimator, although I think Abadie et al. do wade into that a bit. As for the clusters changing over time... I'm really not sure anyone has a firm grasp on that. Any time you're taking group membership from time `t` and then rolling up data into time `t+1` when group membership could have changed, you're getting onto shaky ground. I think you have to assume that their original cluster is the cluster you care about, although it can get worse and worse in the case of online A/B testing, where exposure mappings are not just 0/1 because of joining new clusters. I know in the past Lyft has had to explicitly deal with these issues, as they become interference issues in their spatial randomization.

abundant-zebra-16833

10/17/2023, 7:07 PM
Oh right - I wasn't thinking of grabbing cluster variance outside of the sample. Sometimes our samples themselves are just really big, and so it was taking a while to calculate the ICC.
And yeah re: malleable clusters. This is what my other econ friends have basically said. I guess it's OK if we assume that the *average* nature of our clusters doesn't change, even if they reorganize themselves, etc. So then average treatment effects would still hold.
V interesting re: Lyft - I don't think I know anyone there. I wonder if there are Medium posts floating around.

helpful-application-7107

10/17/2023, 7:08 PM
Yeah, they have a series on interference and network effects from 2016 (that I used to be very interested in when I thought anyone cared about network effects; turns out they don't unless you have to do spatial randomization like ridesharing companies do).

Even FB invested heavily in inference in the presence of interference from a research and tooling perspective, but then it wasn't widely adopted internally from what I recall.

FWIW, in my grad school days I wrote a lot of notes about cluster-robust SEs for `estimatr`, a package I worked on back then: https://declaredesign.org/r/estimatr/articles/mathematical-notes.html#cluster-robust-variance-and-degrees-of-freedom
I can't vouch for my clarity of thinking on the issue at the time, but there are some more references in there.

abundant-zebra-16833

10/17/2023, 7:14 PM
Covariance matrices scare me, but I will brave the link - thanks!

helpful-application-7107

10/17/2023, 7:14 PM
> Specifically, I wanted to implement clustered standard errors

You've done this to yourself, Angela

> Sometimes our samples themselves are just really big, and so it was taking a while to calculate the ICC.

What was the process you were using?

Relatedly, and not to pile on, Netflix has a very interesting paper on "compression" down to the sufficient summary statistics one needs for these estimators: https://arxiv.org/abs/2102.11297
This paper and their subsequent papers on their extensible experimentation platform (https://arxiv.org/abs/1910.03878) are models for me.

abundant-zebra-16833

10/17/2023, 7:27 PM
Suuper interesting, thanks! Yes, that's what I'm after: the bare minimum summary stats so we can save on compute and time.
Some of my colleagues did the actual implementation, but iirc they had a dataset that was `cluster_id, user_id, metric_value` and they wanted to calculate the overall variance (fine), the between-cluster variance (also fine), and then the within-cluster variance (slow). They were using R, I could probably dig up the notebook and DM some excerpts. But I think it may have been a loop through each `cluster_id` to calculate that group's var. AKA I don't think a covariance matrix was used.
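
As a rough illustration of why the per-cluster loop is avoidable, the sketch below accumulates a count, sum, and sum of squares per cluster in a single pass over `cluster_id, user_id, metric_value` rows, then derives the within and between mean squares and a one-way ANOVA estimate of the ICC. The function name `variance_components` is hypothetical, the ANOVA estimator is one assumed choice among several ICC estimators, and the sum-of-squares shortcut can lose precision on large or badly scaled values.

```python
from collections import defaultdict


def variance_components(rows):
    """One pass over (cluster_id, user_id, metric_value) rows.

    Returns (ms_within, ms_between, icc) from a one-way ANOVA decomposition.
    Assumes at least two clusters and more observations than clusters.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum, sum of squares
    for cluster_id, _user_id, y in rows:
        s = stats[cluster_id]
        s[0] += 1
        s[1] += y
        s[2] += y * y

    n_total = sum(s[0] for s in stats.values())
    g = len(stats)
    grand_mean = sum(s[1] for s in stats.values()) / n_total

    # Within-cluster sum of squares: sum_g (sum y^2 - (sum y)^2 / n_g)
    ss_within = sum(s[2] - s[1] ** 2 / s[0] for s in stats.values())
    # Between-cluster sum of squares: sum_g n_g * (mean_g - grand_mean)^2
    ss_between = sum(
        s[0] * (s[1] / s[0] - grand_mean) ** 2 for s in stats.values()
    )

    ms_within = ss_within / (n_total - g)
    ms_between = ss_between / (g - 1)

    # Effective cluster size, adjusted for unequal cluster sizes.
    n0 = (n_total - sum(s[0] ** 2 for s in stats.values()) / n_total) / (g - 1)
    icc = (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
    return ms_within, ms_between, icc
```

Because only three numbers are kept per cluster, the same aggregation maps directly onto a `GROUP BY cluster_id` query.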

helpful-application-7107

10/17/2023, 8:11 PM
Hmm, yeah, no need to dig it up, but passing that into R is very likely to end in pain. We want to do as much of the compression in SQL as possible.
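
A minimal sketch of what "compression in SQL" could look like, using Python's built-in `sqlite3` so it runs anywhere. This is not GrowthBook's actual SQL: the table name `obs`, the toy rows, and the `G / (G - 1)` correction are assumptions for illustration. The inner query collapses rows to one per cluster, the outer query collapses clusters to six totals, and those totals are enough to compute a cluster-robust standard error for the mean without pulling raw rows out of the database.

```python
import sqlite3
from math import sqrt

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs (cluster_id TEXT, user_id INTEGER, metric_value REAL)"
)
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [("a", 1, 1.0), ("a", 2, 1.0), ("b", 3, 3.0), ("b", 4, 3.0)],
)

# Inner query: one row per cluster. Outer query: a handful of totals.
QUERY = """
SELECT
  COUNT(*)   AS g,
  SUM(n)     AS sum_n,
  SUM(s)     AS sum_s,
  SUM(n * n) AS sum_nn,
  SUM(s * n) AS sum_sn,
  SUM(s * s) AS sum_ss
FROM (
  SELECT cluster_id, COUNT(*) AS n, SUM(metric_value) AS s
  FROM obs
  GROUP BY cluster_id
)
"""
g, sum_n, sum_s, sum_nn, sum_sn, sum_ss = conn.execute(QUERY).fetchone()

mean = sum_s / sum_n
# sum_g (s_g - mean * n_g)^2, expanded so only the six totals are needed
resid_sq = sum_ss - 2 * mean * sum_sn + mean**2 * sum_nn
# Assumed small-sample correction G / (G - 1); other conventions exist.
clustered_se = sqrt((g / (g - 1)) * resid_sq / sum_n**2)
print(f"mean={mean}, clustered SE={clustered_se}")
```

The six totals play the role of the additional summary stats that a new `Statistic`-style object would need to receive.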

