# give-feedback
b
We have an interesting situation which we're not quite sure how to address. The data we're working with is a combination of our ETL'd Mixpanel events and internal backend transactional databases. A lot of the data we work with is "special category" (health/sexuality/similar), so in order to protect it without locking down our data warehouse, we have a pseudonymization process. Most data (Mixpanel events and some backend data) is held against a "pseudonym ID" rather than the regular user ID. Some data (e.g. Stripe records, basic user registration info, etc.) does just use the normal user ID, though.

To make this processable in tandem with the pseudonymized data in our warehouse, we make a copy of all the user ID-oriented records so they can be used with the pseudonym ID as well, hashing all the IDs and removing any identifying information. Critically, to prevent reidentification via ordering/matching of high-entropy fields, we truncate all timestamps in this process, usually to the day.
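Roughly, the process looks like the sketch below. This is a minimal illustration rather than our actual pipeline code; the key and helper names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime

PSEUDONYM_KEY = b"replace-with-secret-key"  # hypothetical keyed-hash secret

def pseudonymize_id(user_id: str) -> str:
    """Map a regular user ID to a stable pseudonym ID via a keyed hash."""
    return hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def truncate_to_day(ts: datetime) -> datetime:
    """Drop the time-of-day component so high-entropy timestamps can't be
    used to re-match records across the two ID spaces."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)
```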
The upshot is that our `experiment_triggered` event often appears to have a later timestamp than the subsequent actions which we'd be measuring in the experiment, resulting in the data being excluded and most actions appearing to have occurred zero times.
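For concreteness, here's a toy illustration of why the events get dropped (the timestamps are made up):

```python
from datetime import datetime

exposure = datetime(2024, 5, 1, 14, 32)   # experiment_triggered, full precision
purchase = datetime(2024, 5, 1, 15, 10)   # metric event, pre-pseudonymization

# Pseudonymization truncates the metric event to the day...
purchase_truncated = purchase.replace(hour=0, minute=0, second=0, microsecond=0)

# ...so it now sorts before the exposure, and a "metric must occur after
# exposure" rule drops it, making the action look like it happened 0 times.
print(purchase_truncated < exposure)  # True
```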
There are two potential options which come to mind:

1. Set a -24h delay on the conversion window. We don't know if this is intended usage, but it does work, though it would also include some events from the previous day.
2. Truncate the timestamp of all `experiment_triggered` events too (a sketch of this is below). The main issue we could foresee is that we may have evaluated multiple variations for the user on the same day, because our React app loads attribute data via several API calls and applies the attributes as they become available.

Maybe there are others we haven't considered too?
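Here's option 2 in miniature (hypothetical timestamps, not GrowthBook code): once the exposure is truncated the same way as the pseudonymized metric events, the ordering problem disappears.

```python
from datetime import datetime

def truncate_to_day(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

exposure = truncate_to_day(datetime(2024, 5, 1, 14, 32))  # experiment_triggered
purchase = truncate_to_day(datetime(2024, 5, 1, 15, 10))  # metric event

print(purchase >= exposure)  # True -> the conversion is counted again
```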
On (2), if multiple variations have been recorded for the same user, it looks like GrowthBook will either take the first occurrence or treat each occurrence separately, depending on the attribution model set on the experiment.
f
Both of those options would work. Negative conversion delays are meant for cases where the logged experiment date might be after the metric conversion; they're most commonly used for things like sessions, but they would work for this too. Just to clarify the "multiple variations" part: if the user sees multiple different variations (e.g. A and then B), we remove them from the analysis completely. If they just see a single variation (e.g. B), then the Attribution Model determines how we treat multiple exposure events.
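In pseudocode, those rules amount to something like this (a hypothetical summary, not GrowthBook's actual implementation):

```python
def handle_exposures(variations_seen: list[str]) -> str:
    if len(set(variations_seen)) > 1:
        # e.g. the user saw A and then B
        return "removed from the analysis entirely"
    # Single variation, possibly logged multiple times: the experiment's
    # Attribution Model decides whether to use the first exposure or
    # treat each exposure separately.
    return "kept; the Attribution Model decides which exposures count"
```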
b
Ah great, okay, so it's pretty safe to truncate.
It seems like option 2 would be less likely to include incorrect/old data, as it's at least limited to being on the same day.
[before/after screenshots]
Seems to work well! Thanks for the info 🙂