plain-pharmacist-9564
08/13/2024, 7:48 AMSELECT user_id, timestamp FROM events
would result in a very costly query that basically scans through all of our events in the experiment window.
For our own reporting we aggregate events on a daily basis (= was this user active on this day?) to reduce the amount of data to be queried. I guess using these aggregate tables is also the approach we should take here, but not sure how well it works with the metric delays as these work with timestamps rather than dates? 🤔bulky-oil-75691
08/13/2024, 8:44 AMSELECT
account_uid,
DATEADD(month, 1, converted_at::DATE) AS timestamp
FROM account_funnel_table
WHERE is_customer_1m IS TRUE
So basically all the scanning is performed as you said in account_funnel_table
(which has one row per account), and Growthbook query is pretty quick indeed.
In general we apply this approach of pre-calculating and aggregating (if possible) stuff in dbt models, so that Growthbook doesn't have to do the heavy query.
For instance, when we have to deal with interactions with our website (users may fire thousands of events each day), we aggregate on a user and daily level before calculating the metric in Growthbook.
Hope it helps!plain-pharmacist-9564
08/13/2024, 9:20 AMbulky-oil-75691
08/13/2024, 2:30 PMhelpful-application-7107
08/13/2024, 6:03 PMfew-france-17141
08/14/2024, 2:15 AMplain-pharmacist-9564
08/14/2024, 5:51 AMWhat is the baseline timestamp you use for “Did a user return after X days?“? Experiment exposure, first visit in the events table, or something else?@helpful-application-7107 I currently see two scenarios: 1. experiment exposure, when looking at all users 2. signup timestamp (similar to your suggested first visit to the events table), when looking specifically at new users
plain-pharmacist-9564
08/21/2024, 3:19 PMplain-pharmacist-9564
08/21/2024, 4:11 PM__distinctUsers AS (
SELECT
userid,
cast('All' as varchar) AS dimension,
variation,
first_activation_timestamp AS timestamp,
date_trunc('day', first_exposure_timestamp) AS first_exposure_date
FROM __experimentUnits
WHERE date_diff('hour', first_activation_timestamp, current_timestamp) >= 144 -- only filter on users where the activation is >=144 hours ago
)
Depending on the context exposure might be more relevant than activation..helpful-application-7107
08/21/2024, 9:23 PM