have encountered a discrepancy in the results of a...
# ask-questions
a
have encountered a discrepancy in the results of an experiment where the same metric is measured in two different ways. When the metric is calculated using a Count approach, the result shows a decrease of 15%. However, when it is calculated using a binomial approach, there is an increase of 10%. In both cases, I apply a delimiter to evaluate the total mass. How can such a difference occur?
h
In both cases, I apply a delimiter to evaluate the total mass.
Denominators (if that is what you mean) are different for cont and binomial metrics. What are you choosing as your denominator?
You can read more about that here: https://docs.growthbook.io/app/metrics#denominator-ratio--funnel-metrics. The UI could be improved here, but they are treated separately when you apply a custom denominator. My follow-up question for you would be, what is your denominator that you're using?
a
its no difference , its only Count or Binominal thats the difference
between them
but show totally different result
as i wrote
h
In both cases, I apply a delimiter to evaluate the total mass
What do you mean by this?
Can you share the SQL for these two metrics as well as the denominator field? Screenshots would be very helpful.
a
Thanks, absolutely
its the same sql :SELECT c.id AS grade, c.created_at AS timestamp, MAX(c.employee_account_id) AS user_id, MAX(c.company_id) as company_id FROM demando-data.prod_public.connection c INNER JOIN demando-data.prod_public.account AS a ON a.id = c.employee_account_id LEFT JOIN demando-data.prod_public.candidate AS ca ON c.candidate_id = ca.id LEFT JOIN ( SELECT candidate_id, score AS latest_score, MIN(created_at) AS min_created_at FROM demando-data.prod_public.candidate_activity_score GROUP BY candidate_id, score ) AS cas ON cas.candidate_id = ca.id AND cas.min_created_at > c.created_at WHERE c.assistant_id IS NULL AND c.initiated_by = ‘employee’ AND c.is_open = TRUE GROUP BY c.id, c.created_at
And the denominator to this is company_accept_grade_sent from this sql : SELECT 1 AS grade, c.created_at AS timestamp, MAX(c.employee_account_id) AS user_id, MAX(c.company_id) as company_id FROM demando-data.prod_public.connection c INNER JOIN demando-data.prod_public.account AS a ON a.id = c.employee_account_id LEFT JOIN demando-data.prod_public.candidate AS ca ON c.candidate_id = ca.id LEFT JOIN ( SELECT candidate_id, score AS latest_score, MIN(created_at) AS min_created_at FROM demando-data.prod_public.candidate_activity_score GROUP BY candidate_id, score ) AS cas ON cas.candidate_id = ca.id AND cas.min_created_at > c.created_at WHERE c.assistant_id IS NULL AND c.initiated_by = ‘employee’ GROUP BY c.id, c.created_at
So the first sql i have as COUNT and Binominal in two different metrics pointng to same denominator. But showing very diff results..
h
For the Count metric, the SQL will have to be different because it will need a
value
column. What are you returning as the
value
column?
What's probably happening is for
binomial
metrics the numerator is 0 or 1 for users, but for
count
metrics we sum the
value
column. It seems possible from your above queries that you could end up with multiple rows per
user_id
which would result in different numerator values.
a
Ah sorry here is for the COUNT: SELECT 1 AS value, c.created_at AS timestamp, MAX(c.employee_account_id) AS user_id, MAX(c.company_id) as company_id FROM demando-data.prod_public.connection c INNER JOIN demando-data.prod_public.account AS a ON a.id = c.employee_account_id LEFT JOIN demando-data.prod_public.candidate AS ca ON c.candidate_id = ca.id LEFT JOIN ( SELECT candidate_id, score AS latest_score, MIN(created_at) AS min_created_at FROM demando-data.prod_public.candidate_activity_score GROUP BY candidate_id, score ) AS cas ON cas.candidate_id = ca.id AND cas.min_created_at > c.created_at WHERE c.assistant_id IS NULL AND c.initiated_by = ‘employee’ AND c.is_open = TRUE GROUP BY c.id, c.created_at
What different values should i put when its COUNT or Binominal? Dont know whats happening under the hood here after my sql..
Can you explain what i need to do andhow it works?
h
After your query we aggregate at the
user_id
level (or whatever identifier type you are using for your experiment). For Count we
SUM(value)
and for binomial we just look for any row for a user and count that as
1
. So ideally you shouldn't change your count definition to match the binomial and can just rely on the binomial metric. If you want to make them match, you could use the "Custom Aggregation" field of simply
1
rather than the default
SUM(value)
and then the results should be similar (the statistics won't be, because we model the two metrics differently, but they should be close, and the uplift percents should be the same).
a
So i change the count logic ? Shoould i use User Value Aggregation field and set SUM(value) on the 1 value?
And whats the difference between seting for example SUM in User Value Aggregation field or in the query
SELECT 1 AS value, c.created_at AS timestamp, MAX(c.employee_account_id) AS user_id, MAX(c.company_id) as company_id FROM demando-data.prod_public.connection c INNER JOIN demando-data.prod_public.account AS a ON a.id = c.employee_account_id LEFT JOIN demando-data.prod_public.candidate AS ca ON c.candidate_id = ca.id LEFT JOIN ( SELECT candidate_id, score AS latest_score, MIN(created_at) AS min_created_at FROM demando-data.prod_public.candidate_activity_score GROUP BY candidate_id, score ) AS cas ON cas.candidate_id = ca.id AND cas.min_created_at > c.created_at WHERE c.assistant_id IS NULL AND c.initiated_by = ‘employee’ AND c.is_open = TRUE GROUP BY c.id, c.created_at
@helpful-application-7107
h
We look at all rows for each user returned by your query and apply the user value aggregation. So what you want is to use
1
as the User Value Aggregation.
You can see how this gets used eventually if you click the three dots in the results table and click "View Queries"
a
So 1 in the query and SUM(value) in User Value Aggregation
Ill test it see if i get different result
First is the changed logic for COUNT and the other is for the binominal. Its still a very different result -30% vs +5%
We track the same result but in % and Number..
h
No, as I said above: So what you want is to use
1
as the User Value Aggregation.
The
value
column in the query can honestly be anything.