wide-jordan-14758
01/24/2023, 9:14 PM
SELECT
  m.user_uuid,
  m.timestamp AS ts,
  m.value
FROM
  __metric m
  JOIN __experiment u ON (u.user_uuid = m.user_uuid)
WHERE
  m.timestamp >= u.conversion_start
  AND m.timestamp <= u.conversion_end
GROUP BY
  m.user_uuid,
  m.timestamp,
  m.value
means that multiple rows with the same user_uuid, timestamp, and value will be counted as just one. However, the value we are counting can sometimes fire multiple times simultaneously, and each fire should be counted distinctly (see the sketch after this message). Have you encountered this before, and is there a way around grouping by timestamp?
2. It seems that the results of the metric queries in the UI are sometimes different from the totals calculated by the queries themselves. For example, if you add a sum(m.value) as m_sum field to the final SELECT statement in the metric query and run it, m_sum is off by 1 compared to the total shown for the variant in the GrowthBook UI. Any idea why this might be?
3. Is the Risk evaluated against the control or the perceived "winner" for each metric? For one of my metrics, the Risk box is empty when the Control cell is selected in the "Risk of Choosing" drop-down.
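A minimal sketch of the collapse described in the first question, using hypothetical inline rows in place of the real __metric table:

-- Two simultaneous fires of the same value collapse into a single group:
WITH metric(user_uuid, ts, value) AS (
  VALUES
    ('u1', TIMESTAMP '2023-01-24 09:00:00', 1),
    ('u1', TIMESTAMP '2023-01-24 09:00:00', 1)  -- second, simultaneous fire
)
SELECT user_uuid, ts, value, COUNT(*) AS rows_collapsed
FROM metric
GROUP BY user_uuid, ts, value;
-- Returns one row with rows_collapsed = 2: both fires count as one conversion.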
helpful-application-7107
01/24/2023, 9:35 PM
1. We use this GROUP BY when the Attribution Model is All Exposures rather than First Exposure. This GROUP BY will dedup conversions when we join the metrics to the exposures table, which will have multiple rows per user. There are definitely ways we can get around this and do it better (e.g. we could create a "conversion ID" on the fly, which is unique to the row in the metric table, and dedupe on that; or some other larger refactor of how we dedupe; see the sketch after this message). Until we make a change like that, I don't think there's a way around this right now without either (a) changing your Attribution Model to First Exposure (but this would change your whole analysis, so probably not preferred), or (b) fudging the timestamps so they are unique (also not a great solution, since it undermines the quality of your data). That said, I can definitely take a look at changing how we do this deduping on our end to support your use case.
2. I'm not sure about this. What metric type? Do you have screenshots you can share? An off-by-1 error here seems a bit confusing to me.
3. Risk is always evaluated for the variant selected in the dropdown immediately below "Risk of Choosing" in the column. It is the risk of choosing that selected variant (the expected percent loss given that the selected variant is indeed worse). You might be seeing empty cells because we don't display Risk when the conversion numbers are below the minimum sample size for that metric, or when the effect is outside the min/max percent change (both of these are set at the Metric level).
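A minimal sketch of the "conversion ID" idea from point 1, assuming a ROW_NUMBER-style synthetic key is available in your warehouse; this is an illustration against the query above, not GrowthBook's actual implementation:

-- Tag each metric row with a synthetic ID before the join, so identical
-- simultaneous fires stay distinct while the fan-out from multiple
-- exposure rows per user still collapses:
WITH metric_with_id AS (
  SELECT
    m.user_uuid,
    m.timestamp,
    m.value,
    ROW_NUMBER() OVER (
      PARTITION BY m.user_uuid
      ORDER BY m.timestamp
    ) AS conversion_id  -- unique per metric row, even for identical timestamps
  FROM __metric m
)
SELECT
  m.user_uuid,
  m.timestamp AS ts,
  m.value
FROM metric_with_id m
JOIN __experiment u ON (u.user_uuid = m.user_uuid)
WHERE
  m.timestamp >= u.conversion_start
  AND m.timestamp <= u.conversion_end
GROUP BY
  m.user_uuid,
  m.conversion_id,
  m.timestamp,
  m.value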
wide-jordan-14758
01/24/2023, 10:44 PM
1. I think we could change to First Exposure and set a very large value for Conversion Window to still capture all the activity. Any reason that this would create problems I'm not aware of? If there's bandwidth on your end to support another deduping method for All Exposures, that would obviously be ideal for us 🙂
2. Sure - I'll DM you screenshots and the SQL.
3. When you say "(the expected percent loss given that the selected variant is indeed worse)" - worse than what? It doesn't seem to be the control, because when the control is selected in the drop-down we have non-zero risk.
helpful-application-7107
01/24/2023, 10:50 PM
> "I think we could change to First Exposure and set a very large value for Conversion Window to still capture all the activity. Any reason that this would create problems I'm not aware of?"
Hmm, yeah, I think that should get you most of the way there! The main difference would be if there were gaps for users between exposures; then your solution would add conversions in those gaps to your metric totals. But this definitely seems reasonable.
> "If there's bandwidth on your end to support another deduping method for All Exposures, that would obviously be ideal for us 🙂"
Not sure how quickly we can get to it, but it seems like a definite should-fix.
> "When you say '(the expected percent loss given that the selected variant is indeed worse)' - worse than what?"
Oh yeah! Good point! For variations, it is the risk relative to the baseline. For the baseline, it is MAX(risk_relative_to_each_variation), i.e. the highest risk when compared to any variation.
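A toy illustration of that rule with made-up numbers (a hypothetical table of pairwise risks, not real GrowthBook output):

-- risk_pct = expected % loss of choosing `chosen` if `compared_to` is truly better
WITH pairwise_risk(chosen, compared_to, risk_pct) AS (
  VALUES
    ('control',   'variant_a', 4.2),
    ('control',   'variant_b', 1.1),
    ('variant_a', 'control',   0.3),
    ('variant_b', 'control',   2.7)
)
SELECT chosen, MAX(risk_pct) AS displayed_risk
FROM pairwise_risk
GROUP BY chosen;
-- control shows 4.2 (its worst case across all variations);
-- each variation shows its single risk relative to the baseline.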
wide-jordan-14758
01/24/2023, 10:55 PM

helpful-application-7107
01/24/2023, 11:02 PM
> "add conversions in those gaps"
Sure. Imagine a 24-hour conversion window:
- User 1 enters the experiment on 2022-01-01 at 00:00
- User 1 then converts on 2022-01-03 at 08:00
- User 1 re-enters the experiment on 2022-01-05 at 00:00
- User 1 then converts on 2022-01-05 at 01:00
With all exposures you should get 1 conversion for this user (the second conversion). If you just updated the conversion window to be, say, 30 days(!), then you'd end up counting 2 conversions for that user, even though their first conversion happened in that "gap" between their two exposures.
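A runnable sketch of this example with inline stand-in tables (the window bounds are hand-derived from the exposure timestamps plus the 24-hour window):

WITH exposures(user_uuid, conversion_start, conversion_end) AS (
  -- All Exposures, 24-hour window: one row per exposure
  VALUES
    ('u1', TIMESTAMP '2022-01-01 00:00:00', TIMESTAMP '2022-01-02 00:00:00'),
    ('u1', TIMESTAMP '2022-01-05 00:00:00', TIMESTAMP '2022-01-06 00:00:00')
),
metric(user_uuid, ts) AS (
  VALUES
    ('u1', TIMESTAMP '2022-01-03 08:00:00'),
    ('u1', TIMESTAMP '2022-01-05 01:00:00')
)
SELECT COUNT(*) AS conversions
FROM metric m
JOIN exposures e
  ON e.user_uuid = m.user_uuid
 AND m.ts BETWEEN e.conversion_start AND e.conversion_end;
-- Returns 1: only the 2022-01-05 01:00 conversion falls inside a window.
-- With First Exposure and a 30-day window (a single row spanning
-- 2022-01-01 .. 2022-01-31), both conversions match and the count becomes 2,
-- including the conversion that happened in the "gap" between exposures.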
wide-jordan-14758
01/24/2023, 11:18 PM