# experimentation
p
Hey, I’m interested in how others are setting up metrics around user retention. We track a lot of events, and something like
```sql
SELECT user_id, timestamp FROM events
```
would result in a very costly query that basically scans through all of our events in the experiment window. For our own reporting we aggregate events on a daily basis (i.e. was this user active on this day?) to reduce the amount of data to be queried. I guess using these aggregate tables is also the approach we should take here, but I’m not sure how well that works with the metric delays, since those work with timestamps rather than dates? 🤔
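For illustration, one way to square the daily grain with delay-based metrics is to keep the aggregated column timestamp-typed (midnight of each active day). This is only a sketch in a Presto-style dialect, reusing the `user_id` and `timestamp` columns from the snippet above:
```sql
-- one row per user per active day; date_trunc keeps the column
-- timestamp-typed (midnight), so metric delays can still be applied
SELECT DISTINCT
  user_id,
  date_trunc('day', timestamp) AS timestamp
FROM events
```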
b
hey @plain-pharmacist-9564, which kind of retention are you trying to calculate? At our company, we define a retained customer after X months as a customer who, after X months, is still in our paying-customer baseline. The metric we use has this shape:
```sql
SELECT
  account_uid,
  DATEADD(month, 1, converted_at::DATE) AS timestamp
FROM account_funnel_table
WHERE is_customer_1m IS TRUE
```
So basically all the scanning is performed, as you said, in `account_funnel_table` (which has one row per account), and the GrowthBook query is pretty quick indeed. In general we apply this approach of pre-calculating and aggregating (where possible) in dbt models, so that GrowthBook doesn’t have to do the heavy query. For instance, when we have to deal with interactions with our website (users may fire thousands of events each day), we aggregate at the user and daily level before calculating the metric in GrowthBook. Hope it helps!
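A dbt model along those lines might look roughly like the following; this is a minimal sketch, where `user_daily_activity`, `stg_events`, and the column names are assumptions, not their actual models:
```sql
-- models/user_daily_activity.sql (hypothetical dbt model)
-- collapse raw events to one row per user per day, so GrowthBook
-- metric queries scan the small aggregate instead of raw events
SELECT
  user_id,
  CAST(date_trunc('day', timestamp) AS DATE) AS activity_date,
  COUNT(*) AS event_count
FROM {{ ref('stg_events') }}
GROUP BY 1, 2
```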
🙏 1
p
Mostly “Did a user return after X days?”, so not based on payment status but rather on usage. Thanks for sharing! We also use dbt to build aggregated tables. I will fiddle around a bit to see if that approach also works for us 😊
b
yeah, I would use the same approach: pre-aggregating the data at a daily level so that you don’t overload the GrowthBook queries!
🙌 1
h
Hey Lukas! Luke from GrowthBook here. Yeah, pre-aggregating is probably the way to go here. What is the baseline timestamp you use for “Did a user return after X days?”? Experiment exposure, first visit in the events table, or something else?
👀 1
🙏 1
p
> What is the baseline timestamp you use for “Did a user return after X days?” Experiment exposure, first visit in the events table, or something else?
@helpful-application-7107 I currently see two scenarios:
1. experiment exposure, when looking at all users
2. signup timestamp (similar to your suggested first visit in the events table), when looking specifically at new users
@helpful-application-7107 So I tried this out with an aggregated table that adds a row for every user who was active, and created a metric with a delay of 7 days to get “Returned users 7 days after experiment exposure”. This works fine, but I was wondering if there is a better way of filtering the denominator than creating a “user in experiment for >= 7 days” metric?
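For context, the metric SQL over such an aggregate can stay trivial, with the 7-day delay configured in the metric’s delay setting rather than in the SQL itself; a sketch, assuming the hypothetical `user_daily_activity` model from above:
```sql
-- metric query over the pre-aggregated table; the 7-day delay is
-- set in the GrowthBook metric settings, not in this SQL
SELECT
  user_id,
  CAST(activity_date AS timestamp) AS timestamp
FROM user_daily_activity
```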
If you select “All Experiment Users” as the denominator, this part of the generated query should be changed to reflect that not all users have necessarily been in the experiment “long” enough:
```sql
__distinctUsers AS (
  SELECT
    userid,
    cast('All' as varchar) AS dimension,
    variation,
    first_activation_timestamp AS timestamp,
    date_trunc('day', first_exposure_timestamp) AS first_exposure_date
  FROM __experimentUnits
  -- only keep users whose activation was >= 144 hours ago
  WHERE date_diff('hour', first_activation_timestamp, current_timestamp) >= 144
)
```
Depending on the context, exposure might be more relevant than activation.
h
Hey Lukas, that’s the right way to do that, in my opinion. As for the denominator, there’s an experiment-level setting to “Exclude In-Progress Conversions” that I think might solve your “not in the experiment long enough” problem, but that setting has side effects for all metrics, and it has implications for query performance if you begin using fact metrics.
🙏 1
💡 1