# ask-questions
v
Hi team 👋 Based on the figures I see here for this revenue metric, I would expect a chance to beat control (CTBC) of ~3.5% (Bayesian engine). However, my UI is showing CTBC = 50%. Can anyone help me understand what is happening? Thanks
Also trying a more direct approach gives me a different CTBC
s
Our CTBC falls back to 50/50 when there isn't enough data for whatever reason. Since there are only 3 users, this may be what's happening
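For illustration, here's a minimal sketch of that kind of fallback, assuming a simple normal-approximation Bayesian engine; the names and the 0.5 guard are illustrative, not GrowthBook's actual stats engine:

```typescript
// Hypothetical sketch only: a chance-to-beat-control (CTBC) computation
// that defaults to 0.5 when there is too little data.

interface VariationStats {
  users: number;
  mean: number;
  stddev: number;
}

// Standard normal CDF via the Abramowitz & Stegun polynomial approximation.
function normCdf(z: number): number {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const p =
    d *
    t *
    (0.31938153 +
      t *
        (-0.356563782 +
          t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  return z > 0 ? 1 - p : p;
}

function chanceToBeatControl(
  control: VariationStats,
  variation: VariationStats
): number {
  // Too little data in either arm (e.g. the 3 users reported above):
  // fall back to the 50/50 default that shows up in the UI.
  if (control.users < 2 || variation.users < 2) return 0.5;
  const se = Math.sqrt(
    control.stddev ** 2 / control.users +
      variation.stddev ** 2 / variation.users
  );
  if (se === 0) return 0.5;
  // P(variation mean > control mean) under a normal approximation.
  return normCdf((variation.mean - control.mean) / se);
}

// With healthy sample sizes this produces a real probability...
console.log(
  chanceToBeatControl(
    { users: 17_000_000, mean: 102.0, stddev: 250 },
    { users: 17_000_000, mean: 101.8, stddev: 250 }
  )
); // ≈ 0.01 (variation likely worse)

// ...but with too few users it returns the 50% CTBC seen in the UI.
console.log(
  chanceToBeatControl(
    { users: 1, mean: 100, stddev: 0 },
    { users: 2, mean: 120, stddev: 5 }
  )
); // 0.5
```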
v
Hi @swift-helmet-3648, thanks for your response. Actually, I think that is a bug: as you can see, the count is 35,188,840, and that should be the number of users as well. In fact, I get users = count in the underlying SQL query, but for some reason the UI shows only the leading 3 (see images below)
If there is actually a bug resulting in users being set to 3, might that cause the 50% CTBC?
s
Ah interesting, it's possible. Let me get back to you
h
Yes, the problem is that with 3 users we are going to run into all kinds of statistical uncertainty that prevents us from providing a more meaningful answer. So there are two things going on here:
1. It looks like you have 3 users, and that seems wrong! You might want to figure out what's going on with `id` in your data source!
2. We still need to improve error messaging. My guess is our stats engine is throwing an error because you only have 6 users and returns some default output (we're working on improving this: https://github.com/growthbook/growthbook/pull/1671). If you go into your Jupyter notebook and set the users to 3, you'll see the very empty-looking response to `compute_result()`.
Oh, I see `count` and `users` differ
v
Hey @helpful-application-7107, thanks for jumping in! The point is that I do not have 3 users; it seems like a UI bug. See my previous screenshots
h
The stats engine is going to take what is in the `users` column (I'm confident in this because you're seeing that default 50% response), so it's not a UI bug
v
Yes, if you look at the results of the underlying SQL query when I run it in Athena, `users = count = 3.5*10**7`
h
Oh man! It looks like you're on a very old version of GrowthBook
v
Yeah I know, blame my devops team lol
h
Ok, it's going to be very hard to debug what's going on.
Is this a ratio metric?
v
But why are you saying it's not a bug if the UI shows 3, whereas the query shows 3.5*10**7?
h
(does it have a `Denominator`)?
v
No, it's a revenue metric
No denominator
As simple as that
h
Yeah, then it's probably a bug, but the version you're on is also almost a year old at this point
So it has probably been fixed in that time period. Our queries are totally different now (they don't return means and stddev, but rather sums and sums of squares, as of Q1 this year)
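For context, the mean and sample variance can be recovered from those aggregates on the backend; a minimal sketch (the function name is illustrative, not GrowthBook's API):

```typescript
// Recovering the mean and sample variance from sum and sum of squares,
// the aggregates the newer queries return.
function summaryFromSums(n: number, sum: number, sumSquares: number) {
  const mean = sum / n;
  // Sample variance: (Σx² − (Σx)² / n) / (n − 1), clamped at 0 for
  // floating-point safety.
  const variance =
    n > 1 ? Math.max(sumSquares - (sum * sum) / n, 0) / (n - 1) : 0;
  return { mean, stddev: Math.sqrt(variance) };
}

// x = [1, 2, 3] → n = 3, Σx = 6, Σx² = 14 → mean 2, variance 1
console.log(summaryFromSums(3, 6, 14)); // { mean: 2, stddev: 1 }
```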
Can you share the full query with me?
v
Yes, I have seen that in your code. In fact, if you scroll up, you can see I tried to run the experiment locally in Python using sums and sums of squares
Yes, I think so, let me just check one thing
```sql
-- Gross TTV (revenue)
WITH
  __rawExperiment as (
    SELECT
      id,
      created_at as timestamp,
      experiment as experiment_id,
      variation_id as variation_id
    FROM
      experiments.assignments
  ),
  __experiment as ( -- Viewed Experiment
    SELECT
      e.id as id,
      cast(e.variation_id as varchar) as variation,
      e.timestamp as timestamp,
      e.timestamp as conversion_start,
      e.timestamp + INTERVAL '72' hour as conversion_end
    FROM
      __rawExperiment e
    WHERE
      e.experiment_id = 'my_experiment'
      AND e.timestamp >= from_iso8601_timestamp('2023-11-10T17:30:00.000Z')
  ),
  __metric as ( -- Metric (Gross TTV)
    SELECT
      id as id,
      m.value as value,
      m.timestamp as timestamp,
      m.timestamp as conversion_start,
      m.timestamp as conversion_end
    FROM
      (
        SELECT
          id,
          timestamp,
          itinerary_sell_rate_total.usd_amount AS value
        FROM
          experiments.master
      ) m
    WHERE
      m.timestamp >= from_iso8601_timestamp('2023-11-10T17:30:00.000Z')
  ),
  __distinctUsers as (
    -- One row per user/dimension
    SELECT
      e.id,
      cast('All' as varchar) as dimension,
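      -- Users assigned to more than one variation are bucketed as '__multiple__'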
      (
        CASE
          WHEN count(distinct e.variation) > 1 THEN '__multiple__'
          ELSE max(e.variation)
        END
      ) as variation,
      MIN(e.conversion_start) as conversion_start,
      MIN(e.conversion_end) as conversion_end
    FROM
      __experiment e
    GROUP BY
      e.id
  ),
  __userMetric as (
    -- Add in the aggregate metric value for each user
    SELECT
      d.variation,
      d.dimension,
      d.id,
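      -- Cap the aggregated per-user value at 3500 (metric capping)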
      LEAST(3500, SUM(value)) as value
    FROM
      __distinctUsers d
      JOIN __metric m ON (m.id = d.id)
    WHERE
      m.timestamp >= d.conversion_start
      AND m.timestamp <= d.conversion_end
    GROUP BY
      variation,
      dimension,
      d.id
  ),
  __overallUsers as (
    -- Number of users in each variation
    SELECT
      variation,
      dimension,
      1.0 * COUNT(*) as users
    FROM
      __distinctUsers
    GROUP BY
      variation,
      dimension
  ),
  __stats as (
    -- One row per variation/dimension with aggregations
    SELECT
      m.variation,
      m.dimension,
      1.0 * COUNT(*) as count,
      AVG(1.0 * coalesce(m.value, 0)) as m_mean,
      VAR_SAMP(coalesce(m.value, 0)) as m_var,
      1.0 * sum(m.value) as m_sum
    FROM
      __userMetric m
    GROUP BY
      m.variation,
      m.dimension
  ),
  __overall as (
    SELECT
      u.variation,
      u.dimension,
      u.users as count,
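      -- Rescale the converters-only stats to cover all users, treating
      -- users with no metric rows as zeros: with N = users and n = count,
      -- variance = ((n-1)*m_var + n*(N-n)*m_mean^2 / N) / (N-1)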
      s.m_mean * s.count / u.users as mean,
      (
        CASE
          WHEN u.users > 1 THEN (s.count -1) * s.m_var / (u.users -1) + s.count * (u.users - s.count) * power(s.m_mean, 2) / (u.users * (u.users -1))
          ELSE 0
        END
      ) as variance,
      u.users as users
    FROM
      __overallUsers u
      LEFT JOIN __stats s ON (
        s.variation = u.variation
        AND s.dimension = u.dimension
      )
  )
SELECT
  variation,
  dimension,
  count,
  mean,
  (
    CASE
      WHEN variance > 0 THEN sqrt(variance)
      ELSE 0
    END
  ) as stddev,
  users
FROM
  __overall
```
h
Yeah, they should obviously be the same (users and count) from that query. It looks like there's some bug in how we process the user count that can't handle a number that large, but that code has almost certainly changed since the version you're on, so it's a bit hard for me to say what you could do to get around this issue.
What version are you on?
v
Not sure, can I see it from the UI settings?
[Screenshot 2023-11-30 at 15.59.23.png]
Just noticed this
h
I think I see what was happening. I think it would also be an issue with our current approach: we use `parseInt` to process the query result, and I think it might be struggling with that number.
But we used to use `parseFloat` for count, so that's why they'd differ once the data gets from SQL into our backend
v
Yeah, sounds quite plausible
Can you please point me to the code?
h
If the SQL returns the value in scientific notation, `parseInt` just throws away everything after the decimal point.
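That mechanism is easy to reproduce; assuming Athena returns the large DOUBLE as a scientific-notation string (the exact wire format here is an assumption), the leading 3 falls right out:

```typescript
// Athena can serialize a large DOUBLE such as 35,188,840 in scientific
// notation. parseInt stops at the first non-digit character ('.'), so
// only the leading "3" survives; parseFloat understands the exponent.
const raw = "3.518884E7"; // hypothetical wire format for users/count

console.log(parseInt(raw, 10)); // 3        <- the "3 users" in the UI
console.log(parseFloat(raw));   // 35188840
console.log(Number(raw));       // 35188840
```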
v
Yes that's most likely it, good catch
h
So, I want to figure out why this is happening, because we still use `parseInt`. For your case, you can either upgrade and tell me if it still breaks (the lazy way for me to test this 😄) or just edit that one function to use `parseFloat`.
v
Yes, we are in the process of upgrading, but it doesn't depend solely on me. I will definitely let you know as soon as I have the chance to try it
Thanks a lot for your help!
h
Issue opened here since I think it will still persist: https://github.com/growthbook/growthbook/issues/1901