melodic-gigabyte-94225
04/02/2025, 4:01 PM
Our metric is a proportion: p_watched (defined as sum (user watched >= 25s of video) / count (user saw video)), specifically for videos shown in the first position of algorithmically ranked video playlists.
The core concern is that our data isn't strictly i.i.d. due to clustering. A small number of specific video_ids make up a large percentage of the impressions for this first playlist position, and these videos have inherently different p_watched rates (say, ranging from 0.1 for some videos to 0.25 for other videos). This means observations (views) are correlated within each video_id cluster. Often, the control and treatment groups have different videos at the top position.
Classical variance formulas for proportions (like p̂(1-p̂)/n) assume independence, so in this scenario they appear to underestimate the true standard error and produce overly narrow confidence intervals.
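For illustration only, here is a minimal Python sketch of the gap I mean, using simulated impressions rather than our real data. The per-video rates and impression counts are made-up assumptions, the function names are hypothetical, and the cluster-robust formula is the standard sandwich estimator for a mean with clusters on video_id, not anything taken from GrowthBook:

```python
import numpy as np

def naive_se(y):
    """Binomial SE sqrt(p_hat * (1 - p_hat) / n), which assumes i.i.d. views."""
    y = np.asarray(y, dtype=float)
    p_hat = y.mean()
    return np.sqrt(p_hat * (1.0 - p_hat) / y.size)

def cluster_robust_se(y, cluster_ids):
    """Cluster-robust (sandwich) SE of the mean, clustering on cluster_ids."""
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    # Sum the residuals within each cluster.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inverse, weights=resid)
    g = cluster_sums.size
    correction = g / (g - 1)  # common small-sample adjustment for few clusters
    return np.sqrt(correction * np.sum(cluster_sums ** 2)) / y.size

# Hypothetical setup: six videos, a few of which dominate the impressions.
rng = np.random.default_rng(0)
video_p = np.array([0.10, 0.25, 0.12, 0.22, 0.15, 0.18])  # assumed p_watched per video
impressions = np.array([4000, 3000, 1500, 800, 400, 300])  # assumed impression counts
video_id = np.repeat(np.arange(video_p.size), impressions)
watched = rng.random(video_id.size) < video_p[video_id]

print("naive SE:    ", naive_se(watched))
print("clustered SE:", cluster_robust_se(watched, video_id))
```

In this toy setup the clustered SE comes out several times larger than the naive one, which is the kind of discrepancy I'm worried about. With only a handful of clusters the sandwich estimate is itself noisy, so this is a sketch of the concern rather than a recommended estimator.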
My question is specifically about the standard error calculation for the treatment effect (lift or absolute difference) itself. Does GrowthBook's frequentist or Bayesian engine incorporate adjustments for this type of data clustering (similar to Cluster-Robust Standard Errors, CRSE) when calculating the variance/SE for proportion metrics? Or does it primarily rely on the user-level i.i.d. assumption?
Have you encountered this specific clustering scenario before with proportion metrics? Any insights into how GrowthBook handles it, or recommended best practices within the platform, are greatly appreciated.
Thank you,
Teodor