# announcements
b
Any tips on optimizing/reducing data processing with BigQuery? We're finding that most typical GrowthBook queries end up processing around 2-3GB each. Currently we've got it limited to 1TB/day as it gets quite expensive, but we're hitting the cap pretty much every day.

We're mostly using Mixpanel data (ETL'd by Mixpanel into BQ), and one part of this is needing to resolve the "distinct IDs" to treat multiple IDs as the same user, which involves a left join.

We've done quite a bit of work trying to optimize it - even seeing if we could incrementally build a copy of the events with dbt, but found that was inefficient due to needing to scan the destination table while merging.

Currently we've got some materialized views clustered by time and by mp_event_name + time, which work very well in our basic testing, but the query plans don't seem to be able to take advantage of them in the actual final queries (we vaguely wonder if that's due to the casting from timestamp to datetime, or simply the complexity of the queries).

Would appreciate anything people can share from their experience using GrowthBook in similar situations 🙂
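One plausible explanation for the clustered views not helping: BigQuery generally prunes partitions and uses clustering only when filters apply to the raw column, so casting the partition column in the WHERE clause (e.g. filtering on `DATETIME(time)`) can force a full scan. Below is a minimal sketch of the distinct-ID left join with prune-friendly filters; the table and column names (`mp_master_event`, `mp_identity_mappings`, `distinct_id`, `resolved_id`) are hypothetical and may not match Mixpanel's actual export schema:

```sql
-- Hypothetical table/column names for illustration only.
-- Filtering on the raw partition column (`time`) lets BigQuery prune
-- partitions; wrapping it in a cast such as DATETIME(time) typically
-- defeats pruning and forces a full scan.
SELECT
  COALESCE(idmap.resolved_id, e.distinct_id) AS user_id,
  e.mp_event_name AS event_name,
  e.time AS timestamp
FROM `project.mixpanel.mp_master_event` AS e
-- Left join to collapse multiple distinct IDs into one user
LEFT JOIN `project.mixpanel.mp_identity_mappings` AS idmap
  ON e.distinct_id = idmap.distinct_id
WHERE e.time >= TIMESTAMP('2024-01-01')          -- raw column: prunes partitions
  AND e.time <  TIMESTAMP('2024-02-01')
  AND e.mp_event_name IN ('signup', 'purchase')  -- clustering column: limits scan
```

The design point is to keep the partition column untransformed in the WHERE clause and push any casting into the SELECT list instead.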
f
Are you using the SQL Template Variables in your queries already? https://docs.growthbook.io/app/datasources#sql-template-variables
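For readers following along, a minimal sketch of what that looks like: GrowthBook substitutes template variables such as {{startDate}} and {{endDate}} (documented at the link above) into the SQL before running it, so a partitioned events table is only scanned for the relevant analysis window rather than its full history. Table and column names here are hypothetical:

```sql
-- Hypothetical Mixpanel export table; adjust names to your schema.
SELECT
  distinct_id AS user_id,
  time AS timestamp,
  mp_event_name AS event_name
FROM `project.mixpanel.mp_master_event`
-- GrowthBook replaces {{startDate}} / {{endDate}} with the analysis
-- window before running the query, so BigQuery prunes partitions
-- outside that range instead of scanning the whole table.
WHERE time >= TIMESTAMP('{{startDate}}')
  AND time <= TIMESTAMP('{{endDate}}')
```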
b
Oh interesting, that does seem like it would help! I hadn't seen this info
Will have a look at it next week, thank you 🙂