Hi everyone, I am setting up "identifier join tabl...
# announcements
b
Hi everyone, I am setting up "identifier join tables" and I have a doubt. • let's say that I am randomizing on
client_id
(cookie) • I have a
lead_id
that is generated once some step is done • the
lead_id
is the key used to identify conversions • so i set up a join table like
SELECT client_id, lead_id FROM table_relation
here are the doubts: • Is there a smart way that I am missing to handle repetition? example: 1. I am on device 1 (so I have
client_id_1
) and generate a
lead_id_x
. 2. then I log in on device 2 (so
client_id_2
) retrieving
lead_id_x
so same lead_id from point 1 3. then i do a binomial conversion with my
lead_id_x
4. when doing analysis, I would have for both
client_id_1
and
client_id_2
a conversion (because are both linked to
lead_id_x
• Is there a way to label these phenomena in analysis (ideally, I would also like to count distinct conversion)? • Do I have to create some attribution logic to handle it? e.g. only last
client_id_n
if multiple...
h
You're right that this particular set up is problematic. If you have multiple
client_id
mapping to one set of conversions, and you randomize on
client_id
, then you are potentially pooling together conversions because both
client_id_1
, who is potentially in your treatment, and
client_id_2
who is potentially in your control, are both getting the same conversion values.
• Is there a way to label these phenomena in analysis (ideally, I would also like to count distinct conversion)?
What do you mean by "label"? There's no way for any analysis system to attribute conversions to specific clients if you only give it conversion data with
lead_id
. You would need to analyze at the
client_id
level, which would involve tracking conversions by clients. If many users are using multiple clients, then you may have some issue with spillovers, but these can be very hard to track. However, unless you have many, many users with multiple clients, I would consider this strategy instead.
Do I have to create some attribution logic to handle it? e.g. only last
client_id_n
if multiple...
I'm not sure what exactly you are proposing here, but if you say more then maybe there could be some middle ground solution we could discuss.
b
Thank you @helpful-application-7107 I try to expand the concepts: regarding labeling I was thinking of a basic check (e.g. a window function to count the occurences of the
lead_id_x
across the dataset to have a taste of the spillover. ideally, if I have an high share of "count > 1" the spillover could be worrying. Regarding the attribution, is something I would not consider, but I like to know your thoughts about it: basically in the join table you create some logic to exclude all the multiple occurrences of
lead_id_x
(associated to multiple
client_id_n
) with some timestamp logic (e.g. I give 100% credit to the conversion to the last -or first - relationship
clientd_id_n <-> lead_id_x
based on the timestamp)