# announcements
w:
Hi there, I have a few questions on metrics calculations:

1. It seems like this part of the metric query

```sql
SELECT
  m.user_uuid,
  m.timestamp AS ts,
  m.value
FROM
  __metric m
  JOIN __experiment u ON (u.user_uuid = m.user_uuid)
WHERE
  m.timestamp >= u.conversion_start
  AND m.timestamp <= u.conversion_end
GROUP BY
  m.user_uuid,
  m.timestamp,
  m.value
```

means that multiple rows with the same `user_uuid`, `timestamp`, and `value` will be counted as just 1 (a toy illustration follows this message). However, the value we are counting can sometimes fire multiple times simultaneously, and each fire should be counted distinctly. Have you encountered this before, and is there a way around grouping by timestamp?

2. The results of the metrics queries shown in the UI are sometimes different from the totals calculated by the queries themselves. For example, if you add a `SUM(m.value) AS m_sum` field to the final SELECT statement in the metric query and run it, `m_sum` is off by 1 compared to what is shown as the total for the variant in the GrowthBook UI. Any idea why this might be?

3. Is Risk evaluated against the control or against the perceived "winner" for each metric? For one of my metrics, the Risk box is empty when the Control cell is selected in the "Risk of Choosing" dropdown.
cc @ambitious-apartment-58735
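
To make the collapse in question 1 concrete, here is a toy, self-contained illustration (hypothetical data, Postgres-flavored SQL; not GrowthBook's actual generated query):

```sql
-- Two simultaneous fires share the same (user_uuid, timestamp, value),
-- so the GROUP BY quoted above collapses them into a single row.
WITH __metric(user_uuid, ts, value) AS (
  VALUES
    ('user-1', TIMESTAMP '2022-01-01 12:00:00', 1),
    ('user-1', TIMESTAMP '2022-01-01 12:00:00', 1)  -- second simultaneous fire
)
SELECT user_uuid, ts, value
FROM __metric
GROUP BY user_uuid, ts, value;
-- returns one row: the second fire is silently dropped
```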
h:
Hi @wide-jordan-14758!

1. Just to recap: we do this `GROUP BY` when the Attribution Model is `All Exposures` rather than `First Exposure`. The `GROUP BY` dedupes conversions when we join the metrics to the exposures table, which will have multiple rows per user. There are definitely ways we could do this better (e.g. we could create a "conversion ID" on the fly that is unique to each row in the metric table and dedupe on that, or do some larger refactor of how we dedupe; a sketch follows this message). Until we make a change like that, I don't think there's a way around this other than: (a) changing your Attribution Model to `First Exposure` (but this would change your whole analysis, so probably not preferred), or (b) fudging the timestamps so they are unique (also not a great solution, since it undermines the quality of your data). That said, I can definitely take a look at changing how we do this deduping on our end to support your use case.

2. I'm not sure about this. What metric type? Do you have screenshots you can share? An off-by-one error here seems a bit confusing to me.

3. Risk is always evaluated for the variant selected in the dropdown immediately below `Risk of Choosing` in the column. It is the risk of choosing that selected variant (the expected percent loss given that the selected variant is indeed worse). You might be seeing empty cells because we don't display Risk when the conversion numbers are below the minimum sample size for that metric, or when the effect is outside the min/max percent change (both of these are set at the Metric level).
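
For concreteness, here is a rough sketch of the "conversion ID" idea mentioned above (a sketch under assumptions, not GrowthBook's implementation; it assumes the warehouse supports window functions, and `conversion_id` is a made-up name):

```sql
-- Tag each metric row with a synthetic per-row ID before the join, then
-- dedupe on the ID instead of on (user_uuid, timestamp, value).
-- Simultaneous fires get distinct IDs and survive; duplicate rows created
-- by the multi-row exposures join still collapse under DISTINCT.
SELECT DISTINCT
  m.user_uuid,
  m.ts,
  m.value,
  m.conversion_id
FROM (
  SELECT
    user_uuid,
    timestamp AS ts,
    value,
    ROW_NUMBER() OVER (ORDER BY user_uuid, timestamp) AS conversion_id
  FROM __metric
) m
JOIN __experiment u ON (u.user_uuid = m.user_uuid)
WHERE
  m.ts >= u.conversion_start
  AND m.ts <= u.conversion_end;
```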
w:
Thanks for all the info!

1. I think we could change to `First Exposure` and set a very large value for `Conversion Window` to still capture all the activity. Any reason that this would create problems I'm not aware of? If there's bandwidth on your end to support another deduping method for `All Exposures`, that would obviously be ideal for us 🙂

2. Sure, I'll DM you the screenshots and the SQL.

3. When you say "the expected percent loss given that the selected variant is indeed worse", worse than what? It doesn't seem to be the control, because when the control is selected in the dropdown we have non-zero risk.
h:
> I think we could change to `First Exposure` and set a very large value for `Conversion Window` to still capture all the activity

Hmm, yeah, I think that should get you most of the way there! The main difference is that if there were gaps for users between exposures, your solution would add conversions that happened in those gaps to your metric totals. But this definitely seems reasonable.
> Any reason that this would create problems I'm not aware of? If there's bandwidth on your end to support another deduping method for `All Exposures`, that would obviously be ideal for us 🙂

Not sure how quickly we can get to it, but it seems like a definite should-fix.
> When you say "(the expected percent loss given that the selected variant is indeed worse)" - worse than what?

Oh yeah, good point! For variations, it is the risk relative to the baseline. For the baseline, it is `MAX(risk relative to each variation)`, i.e. the highest risk when compared against any single variation.
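
Written out as a formula (a paraphrase of the description above, not something pulled from GrowthBook's internals): with posterior draws \(\theta_v\) of each variant's metric value,

```latex
% Expected relative loss from choosing v when w is actually better:
\mathrm{Risk}(v \mid w) = \mathbb{E}\left[\max\left(0,\ \tfrac{\theta_w - \theta_v}{\theta_v}\right)\right]

% A variation is measured against the baseline; the baseline takes the
% worst case over all variations:
\mathrm{Risk}(\text{variation } v) = \mathrm{Risk}(v \mid \text{baseline}),
\qquad
\mathrm{Risk}(\text{baseline}) = \max_{v}\ \mathrm{Risk}(\text{baseline} \mid v)
```

Under that reading, the baseline's risk is non-zero whenever any variation might beat it, which would explain the non-zero risk with Control selected.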
w:
Can you say more about how it would "add conversions in those gaps"? Do you mean that the timeline of the conversions would be incorrect, or that the net total conversions would be different?

Understood on the risk question, thanks! Still wondering why we have an empty box for one of our metrics, though. I'll DM you separately.
h:
"add conversions in those gaps"
Sure. Imagine a 24 hour conversion window. User 1 enters experiment on 2022-01-01 at 00:00 User 1 then converts on 2022-01-03 at 08:00 or whatever User 1 re-enters experiment on 2022-01-05 at 00:00 User 1 then converts on 2022-01-05 at 01:00 With
all exposures
you should get 1 conversion for this user (the second conversion). If you just updated the conversion window to be, say 30 days(!), then you'd end up counting 2 conversions for that user, even though they only converted in that "gap" between their two exposures.
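
The two counts fall straight out of the example; here is a self-contained query that reproduces them (Postgres-flavored, with made-up table shapes that mirror the windowing logic, not GrowthBook's generated SQL):

```sql
-- User 1's exposures and conversions from the example above.
WITH exposures(user_uuid, exposure_ts) AS (
  VALUES
    ('user-1', TIMESTAMP '2022-01-01 00:00:00'),
    ('user-1', TIMESTAMP '2022-01-05 00:00:00')
),
conversions(user_uuid, conversion_ts) AS (
  VALUES
    ('user-1', TIMESTAMP '2022-01-03 08:00:00'),  -- falls in the "gap"
    ('user-1', TIMESTAMP '2022-01-05 01:00:00')
),
first_exposure AS (
  SELECT user_uuid, MIN(exposure_ts) AS first_ts
  FROM exposures
  GROUP BY user_uuid
)
SELECT
  -- All Exposures, 24h window: a conversion counts if it lands within
  -- 24h of ANY exposure -> only the 01:00 conversion qualifies -> 1
  (SELECT COUNT(DISTINCT c.conversion_ts)
   FROM conversions c
   JOIN exposures e
     ON c.user_uuid = e.user_uuid
    AND c.conversion_ts BETWEEN e.exposure_ts
                            AND e.exposure_ts + INTERVAL '24 hours'
  ) AS all_exposures_24h,
  -- First Exposure, 30-day window: both conversions land within 30 days
  -- of the first exposure -> 2, including the one in the gap
  (SELECT COUNT(*)
   FROM conversions c
   JOIN first_exposure f
     ON c.user_uuid = f.user_uuid
    AND c.conversion_ts BETWEEN f.first_ts
                            AND f.first_ts + INTERVAL '30 days'
  ) AS first_exposure_30d;
```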
w:
Ah ok! Still wrapping my head around how attribution models and conversion windows interact. This is a helpful example for thinking through what will work best for our data.