freezing-apple-73992
07/05/2023, 8:43 AM

white-fireman-22476
07/05/2023, 8:55 AM

proud-engine-56337
07/05/2023, 8:19 PM

freezing-apple-73992
07/06/2023, 5:27 AM

proud-engine-56337
07/10/2023, 7:38 PM
Both our fact table (fct_hits) and our new enrollment aggregate table (agg_experiment_enrollment) were timestamp partitioned. When we built agg_experiment_enrollment, it only needed to scan the last day or so of its source fact table (down from ~2 TB to ~50 GB per query against the source table) and produced maybe 100 MB/day of new enrollment records.
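A rough sketch of what that incremental model might have looked like, assuming BigQuery and hypothetical column/event names (event_name, experiment_id, variation_id are illustrative, not our actual schema):

```sql
-- models/agg_experiment_enrollment.sql (illustrative sketch, not the real model)
{{
  config(
    materialized='incremental',
    partition_by={'field': 'timestamp', 'data_type': 'timestamp', 'granularity': 'day'}
  )
}}

select
    timestamp,
    user_id,
    experiment_id,
    variation_id
from {{ ref('fct_hits') }}
where event_name = 'experiment_viewed'   -- hypothetical exposure/enrollment event name
{% if is_incremental() %}
  -- constant-ish time filter so the warehouse only scans the newest partitions
  -- of the ~2TB fact table; a real model would also handle late/duplicate rows
  and timestamp > timestamp_sub(current_timestamp(), interval 2 day)
{% endif %}
```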
Then we set the enrollment query to use date-based filtering in its WHERE clause. Again, thanks to date partitioning, each enrollment query only looked at the days covered by the experiment rather than the whole table.
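Concretely, the per-experiment enrollment query ended up shaped roughly like this (dates, schema, and column names are illustrative):

```sql
-- Enrollment/exposure query with an explicit date range in the WHERE clause,
-- so the warehouse only reads the partitions covering the experiment window.
select
    user_id,
    timestamp,
    experiment_id,
    variation_id
from analytics.agg_experiment_enrollment
where timestamp >= timestamp '2023-06-01'   -- experiment start (illustrative)
  and timestamp <  timestamp '2023-07-11'   -- experiment end / query time (illustrative)
```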
Then, for a while, we were building new models in DBT for each and every GrowthBook metric. It was cheaper, but bad form. So instead we made aggregates/reports around some of the more important conversion events (e.g. ad impression, registration, pageview): focused versions of the fact table with just the rows for those events and the relevant columns. Compared with filtering the upstream fct table on an event name at query time, this could save upwards of 50 GB per query. So the tables ended up looking like (a sketch of one of these models follows the list):
fct_hits
- agg_experiment_enrollment
- agg_pageview
- agg_registration_event
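For example, one of those event-specific models might look something like this (again assuming BigQuery; event_name and page_path are hypothetical columns):

```sql
-- models/agg_pageview.sql (illustrative sketch)
-- Only pageview rows and only the columns experiment analysis needs,
-- so metric queries never have to touch the wide fct_hits table.
{{
  config(
    materialized='incremental',
    partition_by={'field': 'timestamp', 'data_type': 'timestamp', 'granularity': 'day'}
  )
}}

select
    timestamp,
    user_id,
    page_path
from {{ ref('fct_hits') }}
where event_name = 'pageview'
{% if is_incremental() %}
  and timestamp > timestamp_sub(current_timestamp(), interval 2 day)
{% endif %}
```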
This is all to say: knowing how to use DBT and your data warehouse effectively can make GrowthBook work well and overcome some of those direct-query limitations.
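For instance, a SQL metric in GrowthBook can point straight at one of the narrow aggregates instead of the raw fact table (table/column names assumed; the metric query just needs to return a user identifier and a timestamp):

```sql
-- GrowthBook "Registration" conversion metric reading the narrow aggregate
select
    user_id,
    timestamp
from analytics.agg_registration_event
```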
(This is probably worth writing up clearly as a blog post at some point rather than rushing it as I'm running out the door now 😉)