Hi everyone I aim to recreate the GrowthBook repo...
# ask-questions
b
Hi everyone I aim to recreate the GrowthBook report using my GA4 dataset within BigQuery. Could you explain how GrowthBook counts users for each variant in an experiment? I am using following part in my query to retrieve all users IDs with Growthbook event fired (to get test and variant ID):
Copy code
WITH growthbook_events AS (
 SELECT
   user_pseudo_id AS unique_user_id,
    (CASE WHEN ep.key = 'experiment_id' THEN CAST(ep.value.int_value AS STRING) END) AS experiment_id,
    (CASE WHEN ep.key = 'variation_id' THEN ep.value.int_value END) AS variation_id,
    (CASE WHEN ep.key = 'gb_user_id' THEN CAST(ep.value.int_value AS STRING) END) AS gb_user_id
 FROM
   `mydataset.events_*`,
   UNNEST(event_params) AS ep
 WHERE
   event_name = 'growthBook'
   AND _TABLE_SUFFIX BETWEEN '20250416' AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
 GROUP BY
   unique_user_id, experiment_id,variation_id, gb_user_id
)
And then using the following query to get users who were to be counted as test users based on the defined activation metric.
Copy code
activation_ids AS (
 SELECT
 user_pseudo_id AS unique_user_id

 FROM `mydataset.events_*`

where event_name = 'activation_metric'
AND _TABLE_SUFFIX BETWEEN '20250416' AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY
unique_user_session_id
)
And then join them
Copy code
SELECT
 g.unique_user_id,
 g.experiment_id,
 g.variation_id,
 g.gb_user_id
FROM
 growthbook_events AS g
JOIN
 activation_ids AS act
ON
 g.unique_user_id = act.unique_user_id

WHERE
 g.experiment_id = 'my_experiment'
I'm seeing slightly different unique user counts per variant compared to the Growthbook dashboard. Is counting distinct
user_pseudo_id
the correct approach?
f
we do filter out users who saw both variants
and the timezones might cause some other differences
how far off are they?