# ask-questions
v
Hi team 👋 Based on the figures I see here for this revenue metric, I would expect a chance to beat control (CTBC) of ~3.5% (Bayesian engine). However, my UI is showing CTBC = 50%. Can anyone help me understand what is happening? Thanks
Also trying a more direct approach gives me a different CTBC
s
Our CTBC falls back to 50/50 when there isn't enough data for whatever reason. Since there are only 3 users, this may be what's happening
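For illustration, here's a minimal sketch of that kind of fallback, assuming a simple normal-approximation Bayesian engine; the names and the 0.5 guard are illustrative, not GrowthBook's actual stats engine:

```typescript
// Hypothetical sketch only: a chance-to-beat-control (CTBC) computation
// that defaults to 0.5 when there is too little data.

interface VariationStats {
  users: number;
  mean: number;
  stddev: number;
}

// Standard normal CDF via the Abramowitz & Stegun polynomial approximation.
function normCdf(z: number): number {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const p =
    d *
    t *
    (0.31938153 +
      t *
        (-0.356563782 +
          t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  return z > 0 ? 1 - p : p;
}

function chanceToBeatControl(
  control: VariationStats,
  variation: VariationStats
): number {
  // Too little data in either arm (e.g. the 3 users reported above):
  // fall back to the 50/50 default that shows up in the UI.
  if (control.users < 2 || variation.users < 2) return 0.5;
  const se = Math.sqrt(
    control.stddev ** 2 / control.users +
      variation.stddev ** 2 / variation.users
  );
  if (se === 0) return 0.5;
  // P(variation mean > control mean) under a normal approximation.
  return normCdf((variation.mean - control.mean) / se);
}

// With healthy sample sizes this produces a real probability...
console.log(
  chanceToBeatControl(
    { users: 17_000_000, mean: 102.0, stddev: 250 },
    { users: 17_000_000, mean: 101.8, stddev: 250 }
  )
); // ≈ 0.01 (variation likely worse)

// ...but with too few users it returns the 50% CTBC seen in the UI.
console.log(
  chanceToBeatControl(
    { users: 1, mean: 100, stddev: 0 },
    { users: 2, mean: 120, stddev: 5 }
  )
); // 0.5
```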
v
Hi @swift-helmet-3648, thanks for your response. Actually, I think that is a bug: as you can see, the count is 35,188,840, and that should be the number of users as well. In fact, I get users = count in the underlying SQL query, but for some reason the UI shows only the leading 3 (see images below)
If there is actually a bug resulting in users being set to 3, might that cause the 50% CTBC?
s
Ah interesting, it's possible. Let me get back to you
h
Yes, the problem is that with 3 users we are going to run into all kinds of statistical uncertainty that prevents us from providing a more meaningful answer. So there are two things going on here:
1. It looks like you have 3 users, and that seems wrong! You might want to figure out what's going on with `id` in your data source!
2. We still need to improve error messaging. My guess is our stats engine is throwing an error because you only have 6 users and returns some default output (we're working on improving this: https://github.com/growthbook/growthbook/pull/1671). If you go into your Jupyter notebook and set the users to 3, you'll see the very empty-looking response to `compute_result()`.
Oh, I see `count` and `users` differ
v
Hey @helpful-application-7107, thanks for jumping in! The point is that I do not have 3 users; it seems like a UI bug. See my previous screenshots
h
The stats engine is going to take what is in the `users` column (I'm confident in this because you're seeing that default 50% response), so it's not a UI bug
v
Yes, if you look at the results of the underlying SQL query when I run it in Athena, `users = count = 3.5*10**7`
h
Oh man! It looks like you're on a very old version of GrowthBook
v
Yeah I know, blame my devops team lol
h
Ok, it's going to be very hard to debug what's going on.
Is this a ratio metric?
v
But why are you saying it's not a bug if the UI shows 3, whereas the query shows 3.5*10**7?
h
(does it have a `Denominator`)?
v
No, it's a revenue metric
No denominator
As simple as that
h
Yeah, then it's probably a bug, but the version you're on is also almost a year old at this point
So it has probably been fixed in that time period. Our queries are totally different now (they don't return means and stddev, but rather sums and sums of squares, as of Q1 this year)
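For context, the mean and sample variance can be recovered from those aggregates on the backend; a minimal sketch (the function name is illustrative, not GrowthBook's API):

```typescript
// Recovering the mean and sample variance from sum and sum of squares,
// the aggregates the newer queries return.
function summaryFromSums(n: number, sum: number, sumSquares: number) {
  const mean = sum / n;
  // Sample variance: (Σx² − (Σx)² / n) / (n − 1), clamped at 0 for
  // floating-point safety.
  const variance =
    n > 1 ? Math.max(sumSquares - (sum * sum) / n, 0) / (n - 1) : 0;
  return { mean, stddev: Math.sqrt(variance) };
}

// x = [1, 2, 3] → n = 3, Σx = 6, Σx² = 14 → mean 2, variance 1
console.log(summaryFromSums(3, 6, 14)); // { mean: 2, stddev: 1 }
```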
Can you share the full query with me?
v
Yes, I have seen that in your code. In fact, if you scroll up, you can see I tried to run the experiment locally in Python using sums and sums of squares
Yes, I think so, let me just check one thing
```sql
-- Gross TTV (revenue)
WITH
  __rawExperiment as (
    SELECT
      id,
      created_at as timestamp,
      experiment as experiment_id,
      variation_id as variation_id
    FROM
      experiments.assignments
  ),
  __experiment as ( -- Viewed Experiment
    SELECT
      e.id as id,
      cast(e.variation_id as varchar) as variation,
      e.timestamp as timestamp,
      e.timestamp as conversion_start,
      e.timestamp + INTERVAL '72' hour as conversion_end
    FROM
      __rawExperiment e
    WHERE
      e.experiment_id = 'my_experiment'
      AND e.timestamp >= from_iso8601_timestamp('2023-11-10T17:30:00.000Z')
  ),
  __metric as ( -- Metric (Gross TTV)
    SELECT
      id as id,
      m.value as value,
      m.timestamp as timestamp,
      m.timestamp as conversion_start,
      m.timestamp as conversion_end
    FROM
      (
        SELECT
          id,
          timestamp,
          itinerary_sell_rate_total.usd_amount AS value
        FROM
          experiments.master
      ) m
    WHERE
      m.timestamp >= from_iso8601_timestamp('2023-11-10T17:30:00.000Z')
  ),
  __distinctUsers as (
    -- One row per user/dimension
    SELECT
      e.id,
      cast('All' as varchar) as dimension,
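      -- Users assigned to more than one variation are bucketed as '__multiple__'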
      (
        CASE
          WHEN count(distinct e.variation) > 1 THEN '__multiple__'
          ELSE max(e.variation)
        END
      ) as variation,
      MIN(e.conversion_start) as conversion_start,
      MIN(e.conversion_end) as conversion_end
    FROM
      __experiment e
    GROUP BY
      e.id
  ),
  __userMetric as (
    -- Add in the aggregate metric value for each user
    SELECT
      d.variation,
      d.dimension,
      d.id,
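      -- Cap the aggregated per-user value at 3500 (metric capping)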
      LEAST(3500, SUM(value)) as value
    FROM
      __distinctUsers d
      JOIN __metric m ON (m.id = d.id)
    WHERE
      m.timestamp >= d.conversion_start
      AND m.timestamp <= d.conversion_end
    GROUP BY
      variation,
      dimension,
      d.id
  ),
  __overallUsers as (
    -- Number of users in each variation
    SELECT
      variation,
      dimension,
      1.0 * COUNT(*) as users
    FROM
      __distinctUsers
    GROUP BY
      variation,
      dimension
  ),
  __stats as (
    -- One row per variation/dimension with aggregations
    SELECT
      m.variation,
      m.dimension,
      1.0 * COUNT(*) as count,
      AVG(1.0 * coalesce(m.value, 0)) as m_mean,
      VAR_SAMP(coalesce(m.value, 0)) as m_var,
      1.0 * sum(m.value) as m_sum
    FROM
      __userMetric m
    GROUP BY
      m.variation,
      m.dimension
  ),
  __overall as (
    SELECT
      u.variation,
      u.dimension,
      u.users as count,
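      -- Rescale the converters-only stats to cover all users, treating
      -- users with no metric rows as zeros: with N = users and n = count,
      -- variance = ((n-1)*m_var + n*(N-n)*m_mean^2 / N) / (N-1)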
      s.m_mean * s.count / u.users as mean,
      (
        CASE
          WHEN u.users > 1 THEN (s.count -1) * s.m_var / (u.users -1) + s.count * (u.users - s.count) * power(s.m_mean, 2) / (u.users * (u.users -1))
          ELSE 0
        END
      ) as variance,
      u.users as users
    FROM
      __overallUsers u
      LEFT JOIN __stats s ON (
        s.variation = u.variation
        AND s.dimension = u.dimension
      )
  )
SELECT
  variation,
  dimension,
  count,
  mean,
  (
    CASE
      WHEN variance > 0 THEN sqrt(variance)
      ELSE 0
    END
  ) as stddev,
  users
FROM
  __overall
```
h
Yeah, they should obviously be the same (users and count) from that query. It looks like there's some bug in how we process the user count that can't handle a number that large, but that code has almost certainly changed since the version you're on, so it's a bit hard for me to say what you could do to get around this issue.
What version are you on?
v
Not sure, can I see it from the UI settings?
[Screenshot 2023-11-30 at 15.59.23.png]
Just noticed this
h
I think I see what was happening. I think it would also be an issue with our current approach: we use `parseInt` to process the query result, and I think it might be struggling with that number.
But we used to use `parseFloat` for count, so that's why they'd differ once the data gets from SQL into our backend
v
Yeah, sounds quite plausible
Can you please point me to the code?
h
If the SQL returns the value in scientific notation, `parseInt` just throws away everything after the decimal point.
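That mechanism is easy to reproduce; assuming Athena returns the large DOUBLE as a scientific-notation string (the exact wire format here is an assumption), the leading 3 falls right out:

```typescript
// Athena can serialize a large DOUBLE such as 35,188,840 in scientific
// notation. parseInt stops at the first non-digit character ('.'), so
// only the leading "3" survives; parseFloat understands the exponent.
const raw = "3.518884E7"; // hypothetical wire format for users/count

console.log(parseInt(raw, 10)); // 3        <- the "3 users" in the UI
console.log(parseFloat(raw));   // 35188840
console.log(Number(raw));       // 35188840
```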
v
Yes that's most likely it, good catch
h
So, I want to figure out why this is happening, because we still use `parseInt`. For your case, you can either upgrade and tell me if it still breaks (the lazy way for me to test this 😄) or just edit that one function to use `parseFloat`.
v
Yes, we are in the process of upgrading, but it doesn't depend solely on me. I will definitely let you know as soon as I have the chance to try it
Thanks a lot for your help!
h
Issue opened here since I think it will still persist: https://github.com/growthbook/growthbook/issues/1901