Hi everyone I m using Growthbook with Clickhouse for A B tes GrowthBook Users #ask-questions

Hi everyone, I'm using Growthbook with Clickhouse ...

helpful-alligator-75316

11/28/2023, 9:26 PM

Hi everyone, I'm using Growthbook with Clickhouse for A/B testing analysis and my queries often time out or error out due to memory usage. Are there any tips/tricks to preventing this? I've tried creating Projections (read indexes) that sort by experiment_id and timestamp but it looks like all the JOINs that Growthbook performs causes the memory issues.

brief-honey-45610

11/28/2023, 9:40 PM

Hi Luke, could you send the SQL you’re using for the queries that are timing out?

helpful-alligator-75316

11/28/2023, 9:48 PM

Sure! Here's a query directly from Growthbook:

Copy code

-- Time To First Byte (duration)
    WITH
      
      
    
    __rawExperiment AS (
      SELECT
  host,
  anonymous_id,
  timestamp as timestamp,
  experiment_id as experiment_id,
  variant_id as variation_id
FROM
  events
WHERE
  event = 'experiment_viewed'
    ),
    __experimentExposures AS (
      -- Viewed Experiment
      SELECT
        e.anonymous_id as anonymous_id
        , toString(e.variation_id) as variation
        , e.timestamp as timestamp
        
      FROM
          __rawExperiment e
      WHERE
          e.experiment_id = '63f9e184aaaf29ffe3925b6855490506'
          AND e.timestamp >= toDateTime('2023-10-14 18:00:00', 'UTC')
          AND e.timestamp <= toDateTime('2023-11-28 16:33:51', 'UTC')
          
    )
    , __activationMetric as (-- Metric (RB Analytics - Is Likely Human)
      SELECT
        anonymous_id as anonymous_id,
        1 as value,
        m.timestamp as timestamp
      FROM
        (
              SELECT
  host,
  anonymous_id,
  timestamp
FROM events
WHERE bot_score > 29 
  and context_user_agent not like '%bot%'
  and context_user_agent not like '%Bot%'
            ) m
        
        WHERE m.timestamp >= toDateTime('2023-10-14 18:00:00', 'UTC') AND m.timestamp <= toDateTime('2023-12-28 16:33:51', 'UTC')
    )
        
    
    
    , __experimentUnits AS (
      -- One row per user
      SELECT
        e.anonymous_id AS anonymous_id
        , if(count(distinct e.variation) > 1, '__multiple__', max(e.variation)) AS variation
        , MIN(e.timestamp) AS first_exposure_timestamp
        
        
        
        , MIN(if(
      a.timestamp >= e.timestamp
      , a.timestamp, NULL)) AS first_activation_timestamp
            
      FROM
        __experimentExposures e
        
        
        LEFT JOIN __activationMetric a ON (a.anonymous_id = e.anonymous_id)
      
      GROUP BY
        e.anonymous_id
    ),
      __distinctUsers AS (
        SELECT
          anonymous_id,
          toString('All') AS dimension,
          variation,
          first_activation_timestamp AS timestamp,
          dateTrunc('day', first_exposure_timestamp) AS first_exposure_date
          
        FROM __experimentUnits
        WHERE first_activation_timestamp IS NOT NULL
      )
      , __metric as (-- Metric (Time To First Byte)
      SELECT
        anonymous_id as anonymous_id,
        m.value as value,
        m.timestamp as timestamp
      FROM
        (
              SELECT
  host,
  anonymous_id,
  timestamp,
  (value/1000) as value -- It's looking for seconds so we divide by 1000
FROM events
WHERE event = 'ttfb'
            ) m
        
        WHERE m.timestamp >= toDateTime('2023-10-14 18:00:00', 'UTC') AND m.timestamp <= toDateTime('2023-11-28 16:33:51', 'UTC')
    )
      , __denominator0 as (-- Metric (RB Analytics - Is TTFB session)
      SELECT
        anonymous_id as anonymous_id,
        1 as value,
        m.timestamp as timestamp
      FROM
        (
              SELECT
  host,
  anonymous_id,
  timestamp
FROM events
WHERE event = 'ttfb'
            ) m
        
        WHERE m.timestamp >= toDateTime('2023-10-14 18:00:00', 'UTC') AND m.timestamp <= toDateTime('2023-11-28 16:33:51', 'UTC')
    )
      , __denominatorUsers as (
      -- one row per user
      SELECT
        initial.anonymous_id AS anonymous_id,
        MIN(initial.dimension) AS dimension,
        MIN(initial.variation) AS variation,
        MIN(initial.first_exposure_date) AS first_exposure_date,
        
        MIN(t0.timestamp) AS timestamp
      FROM
        __distinctUsers initial
        JOIN __denominator0 t0 ON (
            t0.anonymous_id = initial.anonymous_id
          )
      WHERE
        
      t0.timestamp >= initial.timestamp
      
      GROUP BY
        initial.anonymous_id)
      
      , __userMetricJoin as (
        SELECT
          d.variation AS variation,
          d.dimension AS dimension,
          
          d.anonymous_id AS anonymous_id,
          if(
        
      m.timestamp >= d.timestamp
      
        
      , m.value, NULL) as value
        FROM
          __denominatorUsers d
        LEFT JOIN __metric m ON (
          m.anonymous_id = d.anonymous_id
        )
        
      )
      , __userMetricAgg as (
        -- Add in the aggregate metric value for each user
        SELECT
          variation,
          dimension,
          
          anonymous_id,
          AVG(value) as value
        FROM
          __userMetricJoin
        GROUP BY
          variation,
          dimension,
          
          anonymous_id
      )
      
        , __capValue AS (
            
      SELECT quantileDeterministic(0.75)(value, 1234) AS cap_value
      FROM __userMetricAgg
      WHERE value IS NOT NULL
    
        )
        
      
      
      -- One row per variation/dimension with aggregations
      SELECT
        m.variation AS variation,
        m.dimension AS dimension,
        COUNT(*) AS users,
        'mean' as statistic_type,
        'count' as main_metric_type,
        MAX(COALESCE(cap.cap_value, 0)) as main_cap_value,
        SUM(LEAST(
        toFloat64(COALESCE(m.value, 0)),
        cap.cap_value
      )) AS main_sum,
        SUM(POWER(LEAST(
        toFloat64(COALESCE(m.value, 0)),
        cap.cap_value
      ), 2)) AS main_sum_squares
        
        
      FROM
        __userMetricAgg m
      
      
      CROSS JOIN __capValue cap
      
      GROUP BY
        m.variation,
        m.dimension

brief-honey-45610

11/28/2023, 10:05 PM

Thanks! Could you tell me more about the metric(s) you're analyzing?

brief-honey-45610

11/28/2023, 10:06 PM

I'm not sure it's related to the timeouts, but could you confirm this part of the setup documentation was completed?

helpful-alligator-75316

11/28/2023, 10:06 PM

this query is for Time To First Byte - its a web performance metric based on how long it takes the server to respond (one of our gaurdrail metrics actually). But I have 7 different metrics that we are tracking per experiment.

helpful-alligator-75316

11/28/2023, 10:09 PM

Yes, that part of the documentation is complete. The connection to the database works, i can get metrics, and experiments fine. Its just that now that the database is larger the vast majority of the queries are failing.

brief-honey-45610

11/28/2023, 10:16 PM

Ok, gotcha. Thanks for that info. Do you know the size of the data that's stored in the database?

helpful-alligator-75316

11/28/2023, 10:16 PM

Currently at 232 million events

helpful-alligator-75316

11/28/2023, 10:18 PM

If I could modify the growthbook queries to filter the data by "host" it would drastically reduce the amount of data that is queried and likely fix the timeout/memory issues. Is there any way I can modify the growthbook queries like this?

brief-honey-45610

11/28/2023, 10:22 PM

You can definitely edit the Metric queries, yes. Select the Metric you want to edit the SQL for, and look for the "Query Settings" section in the right-side pane. If you have the right user permissions you'll be able to edit the Metric Query SQL.

helpful-alligator-75316

11/28/2023, 10:27 PM

I think that would require a new metric for every experiment if I'm not mistaken. We're running about 40 different experiments and it would be cumbersome to create all 8 metrics for each experiment. Is it possible to modify the queries at the experiment level so the metrics can remain ambiguous to the datasource?

helpful-application-7107

11/28/2023, 11:51 PM

Some options: • You could consider trying to do the bot filtering in the base exposure query since it looks like the information comes from the same source, rather than using an activation metric. Maybe create a new experiment assignment query that is (anonymous_id without bots). • The only experiment level information you can have at the metric level right now is the

{{ experimentId }}

, you could create some filter at the metric level that uses that template variable in some clever way to create the right filter. I'm not sure what the

host

filter you're imagining is though so I'm not sure. • I'd also just confirm that our query is properly hitting date partitions just to be safe. Some notes: • We have also built

Fact Tables

to help you define metrics in a way that will not have to scan that

events

table multiple times for an analysis like this. While you can set up metric definitions on the same base SQL right now, the work to build performance improvements on top of those fact tables is still ongoing. Ratio metrics are definitely inefficient right now, and we're hoping that the fact table refactor can do a lot to help trim down query costs in that regard.

helpful-alligator-75316

11/30/2023, 10:45 PM

Thank you @helpful-application-7107 for the tips! I added the bot filtering in the

Experiment Assignment Queries

and also as a filter for the fact table. Moving to the fact table definitely helped w/ performance. Only problem now is that the numbers display incorrectly because I cant control certain aspects of the metric. For example - all my events go to the

events

table and write their metrics into the

value

column. Most of the values are in milliseconds but the fact table only lets me set it as

Time (seconds)

. I got around this with normal metrics by multiplying the value by .001 but I don't have the ability to do that in Fact Tables. In addition to this - sometimes the values aren't milliseconds - it might be a measure such as how many pixels an element shifted. Since I can't break up the definition of the

value

column based on the metric it doesn't allow me to see the changes properly.

helpful-application-7107

12/01/2023, 4:01 PM

Hmm, yeah I see the problem here. It's a bit awkward given but maybe you can do something like the following (cc: @future-teacher-7046): Fact Table SQL

Copy code

SELECT
  host,
  anonymous_id,
  timestamp,
  (value/1000) as value_s,
  value as value
FROM events

Columns

value

<- just a normal number

value_s

<- format seconds Filters Then you can add a bunch of filters here for your different

where

clauses Metrics Then you can define your time metrics using the

value_s

column and your other metrics using the

value

column. There's also an option where you use

CASE WHEN

in your Fact Table SQL to define your metric columns but that seems worse.

👍 1

helpful-alligator-75316

12/01/2023, 4:48 PM

ah thats a good idea! I'll do that thanks. One more thing - when i try using Cap User Values on a metric it returns no results regardless of what percetile I use.

helpful-application-7107

12/01/2023, 4:49 PM

Ah! This is a known bug with clickhouse right now: https://github.com/growthbook/growthbook/issues/1881

👀 1

helpful-alligator-75316

12/01/2023, 4:50 PM

ah bummer! Glad it's been identified though. I'll keep an eye on this issue. Thanks again

helpful-application-7107

12/01/2023, 4:53 PM

I hope to resolve it very soon!

🙏 1

helpful-application-7107

12/04/2023, 5:10 PM

Hey, I landed a fix here to use the nondeterministic quantile method. Please give it a shot and let me know if it works for you.

🙌 1

helpful-alligator-75316

12/04/2023, 5:11 PM

wow great news! Thank you! I'll let you know if it's working for me

helpful-alligator-75316

12/04/2023, 5:21 PM

@helpful-application-7107 Looks like I'm still getting no data. Here's the query for reference:

Copy code

-- Revenue (mean)
WITH __rawExperiment AS (
SELECT
	host,
	anonymous_id,
	timestamp as timestamp,
	experiment_id as experiment_id,
	variant_id as variation_id
FROM
	events
WHERE
	event = 'experiment_viewed'
	and context_user_agent not ilike '%bot%'
    ),
    
    __experimentExposures AS (
-- Viewed Experiment
SELECT
	e.anonymous_id as anonymous_id
        ,
	toString(e.variation_id) as variation
        ,
	e.timestamp as timestamp
FROM
	__rawExperiment e
WHERE
	e.experiment_id = '1cbfa0516ae56757d5615ddcfa3e919e'
	AND e.timestamp >= toDateTime('2023-11-01 00:02:00',
	'UTC')
		AND e.timestamp <= toDateTime('2023-12-04 17:15:19',
		'UTC')
       ),
__experimentUnits AS (
-- One row per user
SELECT
	e.anonymous_id AS anonymous_id
        ,
	if(count(distinct e.variation) > 1,
	'__multiple__',
	max(e.variation)) AS variation
        ,
	MIN(e.timestamp) AS first_exposure_timestamp
FROM
	__experimentExposures e
GROUP BY
	e.anonymous_id
    ),
      __distinctUsers AS (
SELECT
	anonymous_id,
	toString('All') AS dimension,
	variation,
	first_exposure_timestamp AS timestamp,
	dateTrunc('day',
	first_exposure_timestamp) AS first_exposure_date
FROM
	__experimentUnits
        
      )
      ,
__metric as (
-- Metric (Revenue)
SELECT
	anonymous_id as anonymous_id,
	m.amount as value,
	m.timestamp as timestamp
FROM
	(
	SELECT
		anonymous_id,
		timestamp,
		host,
		context_user_agent,
		context_device,
		context_browser,
		bot_score,
		event,
		value,
		(value / 1000) value_seconds,
		amount,
		quantity,
		currency
	FROM
		events
            ) m
WHERE
	event = 'checkout_completed'
	AND context_user_agent not ilike '%bot%'
	AND m.timestamp >= toDateTime('2023-11-01 00:02:00',
	'UTC')
		AND m.timestamp <= toDateTime('2023-12-04 17:15:19',
		'UTC')
    ),
    
__userMetricJoin as (
SELECT
	d.variation AS variation,
	d.dimension AS dimension,
	d.anonymous_id AS anonymous_id,
	if(m.timestamp >= d.timestamp, m.value,	NULL) as value
FROM __distinctUsers d
LEFT JOIN __metric m ON	(m.anonymous_id = d.anonymous_id)  
  ),
  
__userMetricAgg as (
-- Add in the aggregate metric value for each user
SELECT
	variation,
	dimension,
	anonymous_id,
	SUM(COALESCE(value, 0)) as value
FROM
	__userMetricJoin
GROUP BY
	variation,
	dimension,
	anonymous_id
),

__capValue AS (
SELECT
	quantile(0.95)(value) AS cap_value
FROM
	__userMetricAgg
WHERE
	value IS NOT NULL
    
        )
-- One row per variation/dimension with aggregations
      SELECT
	m.variation AS variation,
	m.dimension AS dimension,
	COUNT(*) AS users,
	'mean' as statistic_type,
	'count' as main_metric_type,
	MAX(COALESCE(cap.cap_value, 0)) as main_cap_value,
	SUM(LEAST(
        toFloat64(COALESCE(m.value, 0)),
        cap.cap_value
      )) AS main_sum,
	SUM(POWER(LEAST(
        toFloat64(COALESCE(m.value, 0)),
        cap.cap_value
      ), 2)) AS main_sum_squares
FROM
	__userMetricAgg m
CROSS JOIN __capValue cap
GROUP BY
	m.variation,
	m.dimension

helpful-application-7107

12/04/2023, 5:22 PM

What is the result in the View Queries modal showing as the returned data?

helpful-alligator-75316

12/04/2023, 5:23 PM

image.png

helpful-application-7107

12/04/2023, 5:23 PM

Oh, is there no returned

main_cap_value

column?

helpful-application-7107

12/04/2023, 5:23 PM

seems like there should be based on your SQL

helpful-application-7107

12/04/2023, 5:24 PM

Other than that, it looks like things are working fine, it's just there's, for some reason, no values being returned by this part of the query:

Copy code

__metric as (
-- Metric (Revenue)
SELECT
	anonymous_id as anonymous_id,
	m.amount as value,
	m.timestamp as timestamp
FROM
	(
	SELECT
		anonymous_id,
		timestamp,
		host,
		context_user_agent,
		context_device,
		context_browser,
		bot_score,
		event,
		value,
		(value / 1000) value_seconds,
		amount,
		quantity,
		currency
	FROM
		events
            ) m
WHERE
	event = 'checkout_completed'
	AND context_user_agent not ilike '%bot%'
	AND m.timestamp >= toDateTime('2023-11-01 00:02:00',
	'UTC')
		AND m.timestamp <= toDateTime('2023-12-04 17:15:19',
		'UTC')
    ),
    
__userMetricJoin as (
SELECT
	d.variation AS variation,
	d.dimension AS dimension,
	d.anonymous_id AS anonymous_id,
	if(m.timestamp >= d.timestamp, m.value,	NULL) as value
FROM __distinctUsers d
LEFT JOIN __metric m ON	(m.anonymous_id = d.anonymous_id)  
  )

helpful-application-7107

12/04/2023, 5:27 PM

The culprits could be: • That where clause on the

evets

table is dropping all your data for some reason (seems unlikely, but possible). If you just run that CTE, do you see data in your data warehouse? • The join is not finding overlapping anonymous_id values, if for some reason your anonymous_id values aren't the same for users when you are initializing the growthbook SDK in your app and when you're tracking their conversions/other events (not uncommon problem if people aren't properly storing the anonymous ids using the trackingCallback and re-using them for event tracking) • There's some weirdness with the date windows not overlapping, but that seems unlikely given your queries so far and traffic

helpful-alligator-75316

12/04/2023, 6:12 PM

So if I re-run the query and select * from __capValue I get 0.0. If I rerun it and select * from __userMetricAgg I get all the values but a ton of 0 value rows. If I modify the __userMetricAgg to exclude 0's it returns a more appropriate value:

Copy code

__userMetricAgg as (
        SELECT
          variation,
          dimension,
          anonymous_id,
          SUM(COALESCE(value, 0)) as value
        FROM
          __userMetricJoin
      
        GROUP BY
          variation,
          dimension,
          anonymous_id
         HAVING value > 0  -- I added this line
      )

helpful-alligator-75316

12/04/2023, 6:15 PM

this leads me to think that I need to change my metric to filter out 0's

helpful-application-7107

12/04/2023, 6:19 PM

Looks like the 95th percentile is 0!

helpful-alligator-75316

12/04/2023, 6:22 PM

if I don't filter out non-checkout events that appears to be the case. Should the query be filtering out any non-checkout events though?

helpful-application-7107

12/04/2023, 6:24 PM

Well the 95th percentile is currently defined over the user aggregated values in your experiment. So it will include 0s from people in the experiment who have no values on the experiment.

helpful-alligator-75316

12/04/2023, 6:24 PM

i think its because

__distinctUsers

is getting all users in the experiment and its joining

__metric

to that which causes a bunch of 0/null values.

helpful-application-7107

12/04/2023, 6:24 PM

Yes.

helpful-alligator-75316

12/04/2023, 6:25 PM

Is this different than how the non fact table metrics calculate percentiles too? I don't recall this being an issue there

helpful-application-7107

12/04/2023, 6:25 PM

No, it should be the same, unless you entered a custom aggregation we default to doing

SUM(COALESCE(value, 0))

helpful-alligator-75316

12/04/2023, 7:00 PM

Hmm, I just compared the two (metric table vs regular metric) and get different results. It looks like the

SUM(COALESCE(value, 0))

is the only major difference. Its just

SUM(value)

on the regular metric. If I change the fact table query to use

SUM(value)

It works correctly

helpful-application-7107

12/04/2023, 7:01 PM

Is your regular metric set up with a custom aggregation that is

sum(value)

helpful-alligator-75316

12/04/2023, 7:04 PM

Ah - looks like it is custom. I wasn't aware you could leave the "User Value Aggregation" field empty.

helpful-application-7107

12/04/2023, 7:05 PM

Yeah, sorry that's a bit of an oversight in that UI. So that's the main reason this is going on. It's not clear that 95th percentile should exclude 0s, since statistically the percentile is defined over the user aggregate levels, so that's why our default is to include those. You could increase the percentile as a potential solution.

helpful-alligator-75316

12/04/2023, 7:07 PM

I see, only problem I'd have there is that I'd need to create a revenue metric for each experiment since each one of our customer's may have a different "conversion rate" that I'd have to base the percentile off of. Some customers have a 1% conversion rate while others may be upwards of 10%. Would it be possible to add a custom aggregation in the Fact Table so I can do a simple

SUM

there too?

helpful-application-7107

12/04/2023, 7:32 PM

Right. We aren't going to introduce custom aggregations for fact table metrics, but we could consider adding an option to apply the 95th percentile to either all user level values or just to non-zero user level values. Would you be willing to open an issue to that effect?

🙏 1

helpful-alligator-75316

12/04/2023, 7:49 PM

Yeah I can do that! Thanks for the help

helpful-alligator-75316

12/04/2023, 8:18 PM

I added an issue here. Hopefully it's descriptive enough: https://github.com/growthbook/growthbook/issues/1912

thankyou 1

helpful-application-7107

02/28/2024, 2:40 PM

Late to the follow-up here, but I did close this out and the feature is available now, for what it's worth.

🙌 1

helpful-alligator-75316

02/28/2024, 11:31 PM

Great, thanks Luke! I'll check it out and let you know if I run into any issues

88 Views

Open in Slack

Previous Next