Hi everyone I am setting up identifier join tables and I hav GrowthBook Users #announcements

Hi everyone, I am setting up "identifier join tabl...

blue-agent-15067

07/05/2023, 9:58 AM

Hi everyone, I am setting up "identifier join tables" and I have a doubt. • let's say that I am randomizing on

client_id

(cookie) • I have a

lead_id

that is generated once some step is done • the

lead_id

is the key used to identify conversions • so i set up a join table like

SELECT client_id, lead_id FROM table_relation

here are the doubts: • Is there a smart way that I am missing to handle repetition? example: 1. I am on device 1 (so I have

client_id_1

) and generate a

lead_id_x

. 2. then I log in on device 2 (so

client_id_2

) retrieving

lead_id_x

so same lead_id from point 1 3. then i do a binomial conversion with my

lead_id_x

4. when doing analysis, I would have for both

client_id_1

and

client_id_2

a conversion (because are both linked to

lead_id_x

• Is there a way to label these phenomena in analysis (ideally, I would also like to count distinct conversion)? • Do I have to create some attribution logic to handle it? e.g. only last

client_id_n

if multiple...

helpful-application-7107

07/05/2023, 10:48 PM

You're right that this particular set up is problematic. If you have multiple

client_id

mapping to one set of conversions, and you randomize on

client_id

, then you are potentially pooling together conversions because both

client_id_1

, who is potentially in your treatment, and

client_id_2

who is potentially in your control, are both getting the same conversion values.

• Is there a way to label these phenomena in analysis (ideally, I would also like to count distinct conversion)?

What do you mean by "label"? There's no way for any analysis system to attribute conversions to specific clients if you only give it conversion data with

lead_id

. You would need to analyze at the

client_id

level, which would involve tracking conversions by clients. If many users are using multiple clients, then you may have some issue with spillovers, but these can be very hard to track. However, unless you have many, many users with multiple clients, I would consider this strategy instead.

Do I have to create some attribution logic to handle it? e.g. only last
client_id_n
if multiple...

I'm not sure what exactly you are proposing here, but if you say more then maybe there could be some middle ground solution we could discuss.

blue-agent-15067

07/06/2023, 7:36 AM

Thank you @helpful-application-7107 I try to expand the concepts: regarding labeling I was thinking of a basic check (e.g. a window function to count the occurences of the

lead_id_x

across the dataset to have a taste of the spillover. ideally, if I have an high share of "count > 1" the spillover could be worrying. Regarding the attribution, is something I would not consider, but I like to know your thoughts about it: basically in the join table you create some logic to exclude all the multiple occurrences of

lead_id_x

(associated to multiple

client_id_n

) with some timestamp logic (e.g. I give 100% credit to the conversion to the last -or first - relationship

clientd_id_n <-> lead_id_x

based on the timestamp)

93 Views

Open in Slack

Previous Next