Morning! my company is thinking about implementing...
# ask-questions
a
Morning! My company is thinking about implementing BigQuery as a data source so we can see experiment results in GB. Is there documentation about how GB queries BQ? Are the metrics cached? I am trying to get cost estimates for those queries and for BQ usage.
h
There isn't currently any metric caching. Because we use a read-only connection and allow for complex combinations of analysis settings, we generate each analysis on demand. Three things to help you out:
1. This useful thread on cost engineering can help control costs with GrowthBook: https://growthbookusers.slack.com/archives/C01T6PKD9C3/p1689017930568229?thread_ts=1688546620.616159&cid=C01T6PKD9C3
2. We now add a label to BQ jobs so you can see the queries run by GrowthBook: https://docs.growthbook.io/guide/bigquery#monitoring-growthbook-query-cost
3. I'd be happy to share an example query with you, but the basic premise is that time-partitioning your datasets, ensuring that the Experiment Assignment and Metric Queries you specify leverage that partitioning whenever possible, and similar measures will be critical to controlling costs.
It's hard to give a general cost estimate for these queries. Normally I would recommend running a couple of analyses once you've connected your data source, but seeing as you're still choosing which warehouse to use, I'm not sure I have good advice here. I would love it if anyone in the community could share their experience choosing a data warehouse for GrowthBook integrations.
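One way to get a rough number before committing is a BigQuery dry run, which reports how many bytes a query would scan without actually running it. Below is a minimal sketch, assuming the `google-cloud-bigquery` Python client is installed and credentials are configured. The helper names and the `usd_per_tib` rate are placeholders for illustration, not anything GrowthBook ships; take the rate from your own region's on-demand pricing.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_on_demand_cost_usd(bytes_processed: int, usd_per_tib: float) -> float:
    """Convert bytes scanned into an on-demand cost at the given rate.

    usd_per_tib is an assumption supplied by the caller; look it up for
    your own region and pricing model.
    """
    if bytes_processed < 0 or usd_per_tib < 0:
        raise ValueError("bytes_processed and usd_per_tib must be non-negative")
    return bytes_processed / TIB * usd_per_tib


def dry_run_bytes(sql: str) -> int:
    """Ask BigQuery how many bytes a query would scan, without running it."""
    # Imported lazily so the cost helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

Running `dry_run_bytes` on a candidate Experiment Assignment or Metric Query, with and without a filter on the partition column, should show how much the time partitioning reduces the scan.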
a
thanks for these detailed notes. So when is the actual query executed? When I am looking at the results? Or daily on cron or similar? Basically what if we don't run any experiments for a while, would GB still ping BQ for any reason?
h
"Basically what if we don't run any experiments for a while, would GB still ping BQ for any reason?"
No. We run a couple of tiny queries (`SELECT 1` and a query against `INFORMATION_SCHEMA`) to build the information schema when you connect your data source. Otherwise we only run queries under two conditions:
1. You click "Update data" or "Run all queries" in the app, or rerun an ad-hoc report. This is initiated by a user in the app and is normally pretty obvious.
2. On your Settings -> General page you can set how often experiments in the `Running` state re-execute their queries. You can turn this off or adjust the frequency, and it only applies to experiments in the `Running` state.
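To turn those two triggers into a back-of-the-envelope volume estimate, here is a small sketch. It assumes every refresh of an experiment scans roughly the same number of bytes and it ignores the tiny connection-time queries. All names and parameters are hypothetical, and a non-positive `refresh_every_hours` is treated here as "scheduled refresh turned off".

```python
def scheduled_refreshes_per_month(refresh_every_hours: float, days: int = 30) -> int:
    """Number of scheduled refreshes one Running experiment gets in a month."""
    if refresh_every_hours <= 0:
        return 0  # treated as: scheduled refresh disabled
    return int(days * 24 // refresh_every_hours)


def monthly_scan_bytes(
    running_experiments: int,
    bytes_per_refresh: int,
    refresh_every_hours: float,
    manual_refreshes: int = 0,
    days: int = 30,
) -> int:
    """Rough total bytes scanned per month from scheduled plus manual refreshes."""
    scheduled = running_experiments * scheduled_refreshes_per_month(
        refresh_every_hours, days
    )
    return (scheduled + manual_refreshes) * bytes_per_refresh
```

With no experiments in the `Running` state and no manual refreshes, the estimate is zero, which matches the answer above: nothing is queried unless a user or the schedule triggers it.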
a
ok great, that's helpful