rapid-ghost-57750 (9/20/2022, 9:16 AM)
- is ~4 mio rows (1 day)
- comes in at 2 mio rows
- at 1.8 mio
- at 16K
- 1.2 mio
- takes about 46 seconds, results in 7K rows
- takes 4 minutes something, results in 7K rows

There is the bottleneck. The query details report 11.2 mio rows read at some stage. Not sure how/why this is such a resource hog, maybe you can spot something.
```sql
__userMetric as (
  -- Add in the aggregate metric value for each user
  SELECT
    d.variation,
    d.dimension,
    LEAST(10, SUM(m.value)) as value
  FROM __distinctUsers d
  JOIN __metric m ON (m.user_pseudo_id = d.user_pseudo_id)
  WHERE m.conversion_start >= d.conversion_start
    AND m.conversion_start <= d.conversion_end
  GROUP BY variation, dimension, d.user_pseudo_id
)
```
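As an aside on what this CTE computes: `LEAST(10, SUM(m.value))` caps each user's summed metric value at 10 before it enters the experiment-level statistics. A minimal sketch of just that capping step in Python (the user IDs and values here are made up; only the cap logic mirrors the SQL):

```python
# Hypothetical per-user metric events; the cap mirrors
# LEAST(10, SUM(value)) from the __userMetric CTE.
user_values = {"u1": [3, 4, 6], "u2": [1, 2]}

# Sum each user's values, then clamp the aggregate at 10.
capped = {user: min(10, sum(vals)) for user, vals in user_values.items()}
print(capped)  # {'u1': 10, 'u2': 3}
```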
That subquery is the one that incorporates the activation metric.
rapid-ghost-57750 (9/20/2022, 1:47 PM)
And when I then do a join on that CTE, the query runs fast again: from 4+ minutes down to sub-10s.
```sql
__activatedUsers as (
  SELECT
    initial.user_pseudo_id,
    t0.conversion_start as conversion_start,
    t0.conversion_end as conversion_end
  FROM __experiment initial
  JOIN __activationMetric0 t0 ON (t0.user_pseudo_id = initial.user_pseudo_id)
  WHERE t0.conversion_start >= initial.conversion_start
    AND t0.conversion_start <= initial.conversion_end
),
__activatedAgg as (
  SELECT
    user_pseudo_id,
    min(conversion_start) as conversion_start,
    min(conversion_end) as conversion_end
  FROM __activatedUsers
  GROUP BY 1
)
```
```sql
__distinctUsers as (
  -- One row per user/dimension
  SELECT
    e.user_pseudo_id,
    cast('All' as string) as dimension,
    (
      CASE
        WHEN count(distinct e.variation) > 1 THEN '__multiple__'
        ELSE max(e.variation)
      END
    ) as variation,
    MIN(a.conversion_start) as conversion_start,
    MIN(a.conversion_end) as conversion_end
  FROM __experiment e
  JOIN __activatedAgg a using(user_pseudo_id)
  GROUP BY e.user_pseudo_id
)
```
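Why aggregating before the join helps can be sketched with a toy in-memory example (the table and column contents below are simplified stand-ins, not the real GA4 data): joining raw exposure rows against raw activation events multiplies the two per-user row counts, while taking `min()` per user first reduces the activation side to one row per user before the join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE experiment (user_pseudo_id TEXT, variation TEXT,
                         conversion_start INT, conversion_end INT);
CREATE TABLE activation (user_pseudo_id TEXT,
                         conversion_start INT, conversion_end INT);
""")
# One user with 3 exposure rows and 4 activation events:
# the naive join produces 3 * 4 = 12 intermediate rows.
cur.executemany("INSERT INTO experiment VALUES (?,?,?,?)",
                [("u1", "A", 0, 100)] * 3)
cur.executemany("INSERT INTO activation VALUES (?,?,?)",
                [("u1", t, t + 100) for t in (10, 20, 30, 40)])

# Row-by-row join, as in the original __activatedUsers CTE.
naive = cur.execute("""
    SELECT count(*) FROM experiment e
    JOIN activation a ON a.user_pseudo_id = e.user_pseudo_id
    WHERE a.conversion_start BETWEEN e.conversion_start AND e.conversion_end
""").fetchone()[0]

# Aggregate to one row per user first, as in __activatedAgg.
agg = cur.execute("""
    WITH activatedAgg AS (
      SELECT user_pseudo_id,
             min(conversion_start) AS conversion_start,
             min(conversion_end)   AS conversion_end
      FROM activation
      GROUP BY user_pseudo_id
    )
    SELECT count(*) FROM experiment e
    JOIN activatedAgg a ON a.user_pseudo_id = e.user_pseudo_id
""").fetchone()[0]

print(naive, agg)  # 12 3
```

With millions of users, that per-user fan-out is the difference between the 11.2 mio rows read and a join that stays close to one row per user. The trade-off, as noted below, is that the aggregated CTE no longer exposes the individual activation events.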
```sql
__activatedUsers as (
  SELECT
    initial.user_pseudo_id,
    min(t0.conversion_start) as conversion_start,
    min(t0.conversion_end) as conversion_end
  FROM __experiment initial
  JOIN __activationMetric0 t0 ON (t0.user_pseudo_id = initial.user_pseudo_id)
  WHERE t0.conversion_start >= initial.conversion_start
    AND t0.conversion_start <= initial.conversion_end
  GROUP BY 1
),
```
could work at least some of the time. We recently added support for different Attribution Models, and some of them need access to the individual activation events.