rapid-ghost-5775
09/20/2022, 9:16 AM__rawExperiment
is ~4 mio rows (1 day)
• _experiment
comes in at 2 mio rows
• __metric
at 1.8
• __activationMetric0
at 16K
• __activatedUsers
1.2 mio
• __distinctUsers
- takes about 46 seconds, results in 7K rows
• __userMetric
- takes 4 minutes something, results in 7K rows
There is the bottleneck. Query details report 11.2 mio rows read. at some stage
Not sure how/why this is such a resource hog, maybe you can spot something.
__userMetric as (
-- Add in the aggregate metric value for each user
SELECT
d.variation,
d.dimension,
LEAST(10, SUM(m.value)) as value
FROM
__distinctUsers d
JOIN __metric m ON (m.user_pseudo_id = d.user_pseudo_id)
WHERE
m.conversion_start >= d.conversion_start
AND m.conversion_start <= d.conversion_end
GROUP BY
variation,
dimension,
d.user_pseudo_id
)
future-teacher-7046
__distinctUsers
subquery is the one that incorporates the activation metric.rapid-ghost-5775
09/20/2022, 1:47 PM__activatedUsers as (
SELECT
initial.user_pseudo_id,
t0.conversion_start as conversion_start,
t0.conversion_end as conversion_end
FROM
__experiment initial
JOIN __activationMetric0 t0 ON (t0.user_pseudo_id = initial.user_pseudo_id)
WHERE
t0.conversion_start >= initial.conversion_start
AND t0.conversion_start <= initial.conversion_end
),
__activatedAgg as (
select
user_pseudo_id,
min(conversion_start) as conversion_start,
min(conversion_end) as conversion_end
from __activatedUsers
group by 1
)
And then do a join on that CTE, the query runs fast again. From 4+ minutes to sub 10s
__distinctUsers as (
-- One row per user/dimension
SELECT
e.user_pseudo_id,
cast('All' as string) as dimension,
(
CASE
WHEN count(distinct e.variation) > 1 THEN '__multiple__'
ELSE max(e.variation) END
) as variation,
MIN(a.conversion_start) as conversion_start,
MIN(a.conversion_end) as conversion_end
FROM
__experiment e
JOIN __activatedAgg a using(user_pseudo_id)
GROUP BY
e.user_pseudo_id
)
__activatedUsers as (
SELECT
initial.user_pseudo_id,
min(t0.conversion_start) as conversion_start,
min(t0.conversion_end) as conversion_end,
FROM
__experiment initial
JOIN __activationMetric0 t0 ON (t0.user_pseudo_id = initial.user_pseudo_id)
WHERE
t0.conversion_start >= initial.conversion_start
AND t0.conversion_start <= initial.conversion_end
group by 1
),
future-teacher-7046
group by
in __activatedUsers
could work at least some of the time. We recently added support for different Attribution Models and some of them need access to the individual activation events.