# give-feedback
b
We have an interesting situation which we're not quite sure how to address. The data we're working with is a combination of our ETL'd Mixpanel events and internal backend transactional databases. A lot of the data we work with is "special category" (health/sexuality/similar), so in order to protect it without locking down our data warehouse, we have a pseudonymization process. Most data (Mixpanel events and some backend data) is held against a "pseudonym ID" rather than the regular user ID. Some data (e.g. Stripe records, basic user registration info, etc.) does just use the normal user ID, though.

To make this processable in tandem with the pseudonymized data in our warehouse, we make a copy of all the user ID-oriented records so they can be used with the pseudonym ID as well, hashing all the IDs and removing any identifying information. Critically, to prevent reidentification via ordering/matching of high-entropy fields, we truncate all timestamps in this process, usually to the day.
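Roughly, the process looks like the sketch below. This is a minimal illustration rather than our actual pipeline code; the key and helper names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime

PSEUDONYM_KEY = b"replace-with-secret-key"  # hypothetical keyed-hash secret

def pseudonymize_id(user_id: str) -> str:
    """Map a regular user ID to a stable pseudonym ID via a keyed hash."""
    return hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def truncate_to_day(ts: datetime) -> datetime:
    """Drop the time-of-day component so high-entropy timestamps can't be
    used to re-match records across the two ID spaces."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)
```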
The upshot is that our `experiment_triggered` event often appears to have a later timestamp than the subsequent actions which we'd be measuring in the experiment, resulting in the data being excluded and most actions appearing to have occurred zero times.
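For concreteness, here's a toy illustration of why the events get dropped (the timestamps are made up):

```python
from datetime import datetime

exposure = datetime(2024, 5, 1, 14, 32)   # experiment_triggered, full precision
purchase = datetime(2024, 5, 1, 15, 10)   # metric event, pre-pseudonymization

# Pseudonymization truncates the metric event to the day...
purchase_truncated = purchase.replace(hour=0, minute=0, second=0, microsecond=0)

# ...so it now sorts before the exposure, and a "metric must occur after
# exposure" rule drops it, making the action look like it happened 0 times.
print(purchase_truncated < exposure)  # True
```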
There are two potential options which come to mind:

1. Set a -24h delay on the conversion window. We don't know if this is intended usage, but it does work, though it would also include some events from the previous day.
2. Truncate the timestamp of all `experiment_triggered` events too (a sketch of this is below). The main issue we could foresee is that we may have evaluated multiple variations for the user on the same day, because our React app loads attribute data via several API calls and applies the attributes as they become available.

Maybe there are others we haven't considered too?
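Here's option 2 in miniature (hypothetical timestamps, not GrowthBook code): once the exposure is truncated the same way as the pseudonymized metric events, the ordering problem disappears.

```python
from datetime import datetime

def truncate_to_day(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

exposure = truncate_to_day(datetime(2024, 5, 1, 14, 32))  # experiment_triggered
purchase = truncate_to_day(datetime(2024, 5, 1, 15, 10))  # metric event

print(purchase >= exposure)  # True -> the conversion is counted again
```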
On (2), if multiple variations have been recorded for the same user, it looks like GrowthBook will either take the first occurrence or treat each occurrence separately, depending on the attribution model set on the experiment.
f
Both of those options would work. Negative conversion delays are meant for cases where the logged experiment date might be after the metric conversion; they're most commonly used for things like sessions, but they would work for this too. Just to clarify the "multiple variations" part: if the user sees multiple different variations (e.g. A and then B), we remove them from the analysis completely. If they just see a single variation (e.g. B), then the Attribution Model determines how we treat multiple exposure events.
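In pseudocode, those rules amount to something like this (a hypothetical summary, not GrowthBook's actual implementation):

```python
def handle_exposures(variations_seen: list[str]) -> str:
    if len(set(variations_seen)) > 1:
        # e.g. the user saw A and then B
        return "removed from the analysis entirely"
    # Single variation, possibly logged multiple times: the experiment's
    # Attribution Model decides whether to use the first exposure or
    # treat each exposure separately.
    return "kept; the Attribution Model decides which exposures count"
```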
b
Ah great, okay, so it's pretty safe to truncate.
It seems like option 2 would be less likely to include incorrect/old data, as it's at least limited to being on the same day.
[before/after screenshots]
Seems to work well! Thanks for the info 🙂