bored-forest-47521
06/03/2022, 7:59 PMuser_id
and user_pseudo_id
identifier set up. It looks like the import is failing because the query that's auto-generated by GrowthBook to pull in experiment data has subbed in the user_pseudo_id
name in the wrong spot. See thread for more details.-- Past Experiments
WITH __exposures0 as (
SELECT
cast('user_id' as string) as exposure_query,
experiment_id,
variation_id,
date_trunc(CAST(timestamp as DATETIME), DAY) as date,
count(distinct user_id) as users
FROM
(
SELECT
user_id as user_id,
assigned_at as timestamp,
experiment_id as experiment_id,
variant_id as variation_id
FROM
REDACTED.cohort_abtests
WHERE
test_on = 'REDACTED'
) e0
WHERE
CAST(timestamp as DATETIME) > DATETIME("2021-06-03 19:54:18")
GROUP BY
experiment_id,
variation_id,
date_trunc(CAST(timestamp as DATETIME), DAY)
),
__exposures1 as (
SELECT
cast('tbl_l3yux92v' as string) as exposure_query,
experiment_id,
variation_id,
date_trunc(CAST(timestamp as DATETIME), DAY) as date,
count(distinct user_pseudo_id) as users
FROM
(
SELECT
user_pseudo_id as user_id,
assigned_at as timestamp,
experiment_id as experiment_id,
variant_id as variation_id
FROM
REDACTED.cohort_abtests
WHERE
test_on = 'REDACTED'
) e1
WHERE
CAST(timestamp as DATETIME) > DATETIME("2021-06-03 19:54:18")
GROUP BY
experiment_id,
variation_id,
date_trunc(CAST(timestamp as DATETIME), DAY)
),
__experiments as (
SELECT
*
FROM
__exposures0
UNION ALL
SELECT
*
FROM
__exposures1
),
__userThresholds as (
SELECT
exposure_query,
experiment_id,
variation_id,
-- It's common for a small number of tracking events to continue coming in
-- long after an experiment ends, so limit to days with enough traffic
max(users) * 0.05 as threshold
FROM
__experiments
WHERE
-- Skip days where a variation got 5 or fewer visitors since it's probably not real traffic
users > 5
GROUP BY
exposure_query,
experiment_id,
variation_id
),
__variations as (
SELECT
d.exposure_query,
d.experiment_id,
d.variation_id,
MIN(d.date) as start_date,
MAX(d.date) as end_date,
SUM(d.users) as users
FROM
__experiments d
JOIN __userThresholds u ON (
d.exposure_query = u.exposure_query
AND d.experiment_id = u.experiment_id
AND d.variation_id = u.variation_id
)
WHERE
d.users > u.threshold
GROUP BY
d.exposure_query,
d.experiment_id,
d.variation_id
)
SELECT
*
FROM
__variations
WHERE
-- Experiment was started recently
date_diff(DATETIME("2022-06-03 19:54:18"), start_date, DAY) < 14
OR -- OR it ran for long enough and had enough users
(
date_diff(end_date, start_date, DAY) >= 14
AND users > 100
AND -- Skip experiments at start of date range since it's likely missing data
date_diff(start_date, DATETIME("2021-06-03 19:54:18"), DAY) > 2
)
ORDER BY
experiment_id ASC,
variation_id ASC
exposures1
CTE has count(distinct user_pseudo_id) as users
when it should be count(distinct user_id) as users
fresh-football-47124
bored-forest-47521
06/06/2022, 2:12 PMadd experiments
and ran into the same query issuefresh-football-47124
bored-forest-47521
06/08/2022, 2:48 PMfresh-football-47124
bored-forest-47521
06/08/2022, 3:32 PM