rapid-ghost-5775
09/20/2022, 9:16 AM__rawExperiment
is ~4 mio rows (1 day)
• _experiment
comes in at 2 mio rows
• __metric
at 1.8
• __activationMetric0
at 16K
• __activatedUsers
1.2 mio
• __distinctUsers
- takes about 46 seconds, results in 7K rows
• __userMetric
- takes 4 minutes something, results in 7K rows
There is the bottleneck. Query details report 11.2 mio rows read. at some stage
Not sure how/why this is such a resource hog, maybe you can spot something.
__userMetric as (
-- Add in the aggregate metric value for each user
SELECT
d.variation,
d.dimension,
LEAST(10, SUM(m.value)) as value
FROM
__distinctUsers d
JOIN __metric m ON (m.user_pseudo_id = d.user_pseudo_id)
WHERE
m.conversion_start >= d.conversion_start
AND m.conversion_start <= d.conversion_end
GROUP BY
variation,
dimension,
d.user_pseudo_id
)
future-teacher-7046
__distinctUsers
subquery is the one that incorporates the activation metric.rapid-ghost-5775
09/20/2022, 1:47 PMrapid-ghost-5775
09/20/2022, 1:56 PM__activatedUsers as (
SELECT
initial.user_pseudo_id,
t0.conversion_start as conversion_start,
t0.conversion_end as conversion_end
FROM
__experiment initial
JOIN __activationMetric0 t0 ON (t0.user_pseudo_id = initial.user_pseudo_id)
WHERE
t0.conversion_start >= initial.conversion_start
AND t0.conversion_start <= initial.conversion_end
),
__activatedAgg as (
select
user_pseudo_id,
min(conversion_start) as conversion_start,
min(conversion_end) as conversion_end
from __activatedUsers
group by 1
)
And then do a join on that CTE, the query runs fast again. From 4+ minutes to sub 10s
__distinctUsers as (
-- One row per user/dimension
SELECT
e.user_pseudo_id,
cast('All' as string) as dimension,
(
CASE
WHEN count(distinct e.variation) > 1 THEN '__multiple__'
ELSE max(e.variation) END
) as variation,
MIN(a.conversion_start) as conversion_start,
MIN(a.conversion_end) as conversion_end
FROM
__experiment e
JOIN __activatedAgg a using(user_pseudo_id)
GROUP BY
e.user_pseudo_id
)
rapid-ghost-5775
09/20/2022, 2:04 PM__activatedUsers as (
SELECT
initial.user_pseudo_id,
min(t0.conversion_start) as conversion_start,
min(t0.conversion_end) as conversion_end,
FROM
__experiment initial
JOIN __activationMetric0 t0 ON (t0.user_pseudo_id = initial.user_pseudo_id)
WHERE
t0.conversion_start >= initial.conversion_start
AND t0.conversion_start <= initial.conversion_end
group by 1
),
future-teacher-7046
group by
in __activatedUsers
could work at least some of the time. We recently added support for different Attribution Models and some of them need access to the individual activation events.Open source platform for stress free deployments, measured impact, and smarter decisions.
Powered by