# announcements
h
Hi @fresh-football-47124 @future-teacher-7046 Has the bug related to ratio metrics been fixed? https://growthbookusers.slack.com/archives/C01T6PKD9C3/p1667921148551759?thread_ts=1667916912.725549&cid=C01T6PKD9C3
h
Hi @handsome-library-89124, I was investigating this today. I'm wondering whether it's possible your numerator or your denominator metric has any NULL values in it? Could you check?
h
Hi @helpful-application-7107 It was an old experiment and I ran it without a denominator. I am just asking whether the bug mentioned by @future-teacher-7046 has been fixed; if so, I will use ratio metrics in the future.
@helpful-application-7107 The values can't be NULL because of the condition in the query; they can't even be zero.
h
One potential source of this bug is NULL values, and we're discussing solutions for that. I'll try to follow up when that is resolved.
h
Thanks, I will wait to hear back from you!
g
Hi team! Is there any update here? We’re having a similar issue with some ratio metrics (however, for other ratio metrics it seems to be working fine). Thanks! cc: @kind-airplane-15946
👍 1
h
@green-jordan-83609 You're still seeing negative variance right now?
I would love to see some screenshots of your results and the intermediate results that are proving problematic.
For example, the screenshots that the user provided here would be helpful for your case and could help us debug the issue: https://growthbookusers.slack.com/archives/C01T6Q0TEG3/p1666343634581349
g
Sure! The metric with the issue shows up like this in the experiment dashboard. It’s really similar to the case you attached: a static 50% chance to beat control and no violin plot at all. The metric has the following characteristics:
• defined as a count
• user aggregation is SUM(value)
• denominator is another count metric
If I retrieve the query that the experiment is using (under the view queries option), I can run the following:
...all previous steps... ,
__stats as (
    -- One row per variation/dimension with aggregations
    SELECT
      d.variation,
      d.dimension,
      SUM(IFF(m.value is null, 1, 0)) as num_nulls_numerator,
      SUM(IFF(d.value is null, 1, 0)) as num_nulls_denominator
    FROM
      __userDenominator d
      LEFT JOIN __userMetric m ON (d.user_id = m.user_id)
    GROUP BY
      d.variation,
      d.dimension
  )
  
  SELECT *
  FROM __stats
This checks whether the final join produces any NULLs (i.e. some users might be in the denominator but not in the numerator, therefore producing a NULL in m.value). That’s not the case: this query returns 0 for both counts.
The funny thing is that we have another ratio metric which uses the same denominator (i.e. the same count metric as the denominator), and for that one the violin plot and the stats work well. I’ve checked, and the final values calculated by the query:
• variation
• dimension
• users
• count
• mean
• stddev
are more or less in the same proportion in both cases; the stddev is roughly 3 times the mean for both. The only difference I’m able to see is that the non-working metric has a mean closer to 0 (around 0.07), but I’m not sure that would be the root of the issue.
Apart from that, I just wonder how these new ratio metrics work with the Bayesian statistical framework. Are we still defining a Gaussian posterior with mean = query_mean and std = query_stddev / sqrt(num_users)? Thanks a lot and let us know if we can help with anything else!
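As a minimal sketch of the construction being asked about here (treating each variation's mean as Gaussian with standard error stddev / sqrt(n), and deriving chance to beat control from the difference of the two Gaussians), something like the following; the function name and the numbers are made up for illustration, and this is not necessarily what gbstats actually does:

# Treat each variation's mean as normal with standard error stddev / sqrt(n),
# then compute "chance to beat control" from the difference of the two normals.
from scipy.stats import norm

def chance_to_beat_control(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """P(variation B > variation A) under independent Gaussian approximations."""
    se_a = std_a / n_a ** 0.5
    se_b = std_b / n_b ** 0.5
    diff_mean = mean_b - mean_a
    diff_se = (se_a ** 2 + se_b ** 2) ** 0.5
    return norm.sf(0, loc=diff_mean, scale=diff_se)  # P(diff > 0)

# Made-up numbers in the same ballpark as this thread (mean ~0.07, stddev ~3x the mean):
print(chance_to_beat_control(0.07, 0.21, 5000, 0.075, 0.21, 5000))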
h
Can you share the intermediate values from the metric queries? If you go to view queries, can you share the results from the ratio metric? Just copy the cell below the ratio metric query along with the ratio metric query itself, and I'll be able to take a closer look.
@green-jordan-83609 shared some more details in a private DM. The issue in this case is that we use log approximations in our Gaussian tests in gbstats, and the values in your case are small (mean) and uncertain (variance); the log approximation can suffer when the mean values are close to zero. We have a check built into gbstats that returns no meaningful information for the posterior distribution when that check fails, since the log approximation could be inexact.
If the stddev of the ratio metric decreases as you collect more data, this could go away. If not, we could help you do a bespoke analysis in a Jupyter notebook. In any case, we definitely need better error messaging and handling when cases like this occur. I'll open an issue for that.
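To illustrate why a log-style approximation degrades here (just a sketch, not the actual gbstats check): moment-matching a log-normal to a metric's mean and stddev gives sigma^2 = ln(1 + stddev^2 / mean^2), so when the stddev is several times the mean that term is already large and the fit becomes very sensitive to a mean near zero. The function below is an assumption for illustration only.

import math

def lognormal_params(mean, stddev):
    """Moment-match a log-normal to a positive metric's mean and stddev."""
    if mean <= 0:
        raise ValueError("log-normal approximation needs a positive mean")
    cv2 = (stddev / mean) ** 2        # squared coefficient of variation
    sigma2 = math.log(1.0 + cv2)      # large when stddev >> mean
    mu = math.log(mean) - sigma2 / 2.0
    return mu, sigma2

# With numbers like the ones in this thread (mean ~0.07, stddev ~3x the mean),
# cv2 is ~9 and sigma^2 is ~ln(10) ~= 2.3, i.e. a very spread-out approximation.
print(lognormal_params(0.07, 0.21))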
g
Understood, thanks a lot! Would changing the scale of the metric help, or would that just make it worse? Anyway, we need to capture more data for sure.
h
I'm not sure if changing the scale of the metric would help. It would change the metric itself, I would imagine, but you could try! I think the main issue here is that the standard deviations you shared with me are really large relative to the mean. However, that standard deviation depends on both the numerator and the denominator, since this is a ratio metric.
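As a rough illustration of that last point (a delta-method sketch with made-up data, not the gbstats implementation): the variance of a ratio metric depends on the numerator, the denominator, and their covariance, and rescaling the numerator by a constant scales the estimate and its standard error together, so the relative uncertainty stays the same.

import numpy as np

def ratio_mean_and_se(numerator, denominator):
    """Per-user numerator/denominator values -> ratio estimate and its standard error."""
    n = len(numerator)
    m_bar, d_bar = np.mean(numerator), np.mean(denominator)
    cov = np.cov(numerator, denominator, ddof=1)  # 2x2 sample covariance matrix
    var_ratio = (
        cov[0, 0] / d_bar**2
        - 2 * m_bar * cov[0, 1] / d_bar**3
        + m_bar**2 * cov[1, 1] / d_bar**4
    ) / n
    return m_bar / d_bar, var_ratio ** 0.5

# Multiplying the numerator by a constant multiplies both the estimate and its
# standard error by that constant, so the signal-to-noise ratio is unchanged:
rng = np.random.default_rng(0)
num = rng.poisson(0.5, size=2000).astype(float)
den = rng.poisson(5.0, size=2000).astype(float) + 1.0  # keep the denominator positive
print(ratio_mean_and_se(num, den))
print(ratio_mean_and_se(100 * num, den))  # ~100x the mean AND ~100x the standard error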