# announcements
w
Hey there, am I correct in thinking that the built-in Date dimensional analysis includes only behavior that happened within a user's first conversion window on a given day? Any reason it can't include all of a user's behavior on a given day?
cc @ambitious-apartment-58735 @orange-magician-36994
h
I think if you switch to the "Multiple Exposures" attribution model then it should work somewhat like you suggest. However, we're currently considering an overhaul of the date dimensional analysis (since it can be prone to carryover bias in its current format) and deprecating Multiple Exposures in favor of another approach. The new date dimensional analysis will use only the date of a user's first exposure. Using later dates as independent units can bias those later analyses if one variation causes more users, or users of a certain type, to return to the experiment more frequently. The replacement for the "Multiple Exposures" attribution model will simply look at all conversions after any potential conversion delay (i.e. from exposure date + conversion delay until the end of the experiment). This is a much simpler query that should be more performant.
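To make the windowing concrete, here's a toy pandas sketch of the two per-user metric windows (purely illustrative, with a made-up schema; not the actual query):

```python
import pandas as pd

# Toy illustration of the two metric windows (hypothetical schema):
# the windowed model keeps only the first conversion window, while the
# replacement model keeps everything from exposure (+ delay) to experiment end.
exposures = pd.DataFrame({
    "user_id": [1, 2],
    "first_exposure": pd.to_datetime(["2023-01-02", "2023-01-05"]),
})
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2023-01-03", "2023-01-20", "2023-01-06"]),
    "value": [10.0, 5.0, 7.0],
})

conversion_delay = pd.Timedelta(days=0)   # assumed: no conversion delay
conversion_window = pd.Timedelta(days=3)  # only used by the windowed model
experiment_end = pd.Timestamp("2023-02-01")

df = events.merge(exposures, on="user_id")
start = df["first_exposure"] + conversion_delay

# Windowed model: conversions inside [start, start + conversion_window)
windowed = df[(df["timestamp"] >= start) & (df["timestamp"] < start + conversion_window)]

# Replacement model: all conversions in [start, experiment_end]
full_history = df[(df["timestamp"] >= start) & (df["timestamp"] <= experiment_end)]

print(windowed.groupby("user_id")["value"].sum())      # user 1: 10.0, user 2: 7.0
print(full_history.groupby("user_id")["value"].sum())  # user 1: 15.0, user 2: 7.0
```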
Eventually we want to support more date analyses (e.g. a time series of total effects as of day X, rather than the effects for users bucketed on day X, which is what our new approach will produce).
LMK what you think and what kind of analysis would be really helpful for you.
w
From our PoV, we do not need the conversion window. After a user is exposed to a test, we do not want to filter out any data. To do this, we could use first exposure with a large conversion window (at least as long as the test). But we also look at metrics over time for each of our tests, and I believe the current daily analysis captures all conversions within the window that starts on day X, so a long conversion window wouldn't work for the daily analysis. For now, we use all exposures and make sure the conversion windows don't filter out any data. We look at the metrics over time to try to identify novelty effects and to run data QA checks. We aren't doing any statistical testing, so I don't think introducing bias/dependent observations is a concern for us (but please correct me if I'm wrong). It would be very helpful to be able to track metrics per cell partitioned by day, meaning all the user behavior during day X is included for each day in the test (rough sketch below).
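Roughly, the series we have in mind looks like this toy pandas sketch (hypothetical column names, just to illustrate):

```python
import pandas as pd

# Toy sketch of "all behavior on day X per cell" (hypothetical column names).
events = pd.DataFrame({
    "user_id":   [1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-03", "2023-01-03"]),
    "value":     [5.0, 2.0, 1.0, 4.0],
})
exposures = pd.DataFrame({
    "user_id":        [1, 2],
    "variation":      ["A", "B"],
    "first_exposure": pd.to_datetime(["2023-01-01", "2023-01-03"]),
})

df = events.merge(exposures, on="user_id")
# keep every event on or after the user's first exposure -- nothing filtered out
df = df[df["timestamp"] >= df["first_exposure"]]

daily = (
    df.assign(day=df["timestamp"].dt.floor("D"))
      .groupby(["day", "variation"])["value"]
      .sum()
)
print(daily)  # one row per (day, cell), including all behavior on that day
```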
h
> From our PoV, we do not need the conversion window. After a user is exposed to a test, we do not want to filter out any data.
Good news: our refactor will handle this use case out of the box, and on our test data it runs in about 80% of the current runtime. We are finalizing the naming, but it will be something like "Full History" or "First Exposure until End of Experiment", and we intend it to replace "Multiple Exposures".

Bad news: we don't really have an existing way to do the analysis you want. You're correct about the way it works now; each day has all users exposed on that day + the conversion window starting on that day (each day a user is bucketed on kind of treats that user as if they were a totally new user). After our refactor, each day will contain just the users bucketed for the first time on that day, and their metrics will come from either their first conversion window or from the start of that window until the end of the experiment, depending on whether you choose "First Exposure" or "Full History". Therefore, the time series will show you how effects have changed by when a user first entered the experiment, but not the average difference between group A and group B (for all users in those groups) on day X.
👍 1
However, supporting that last use case is definitely on our roadmap, as it is probably the most common time series of interest (besides how many users enter the experiment on each day); see the sketch below.
👍 1
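To make that distinction concrete, here's a toy contrast of the two time series (illustrative pandas only, hypothetical schema): the by-entry-cohort series the refactor will produce vs. the per-calendar-day difference requested above.

```python
import pandas as pd

# Toy data (hypothetical schema): one row per (user, day) metric observation.
df = pd.DataFrame({
    "user_id":        [1, 1, 2, 2, 3, 4],
    "variation":      ["A", "A", "B", "B", "A", "B"],
    "first_exposure": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01",
                                      "2023-01-01", "2023-01-02", "2023-01-02"]),
    "day":            pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-01",
                                      "2023-01-02", "2023-01-02", "2023-01-02"]),
    "value":          [1.0, 2.0, 3.0, 1.0, 2.0, 4.0],
})

# Refactor's series: sum each user's full history, then group users by the
# day they first entered the experiment (their entry cohort).
per_user = (
    df.groupby(["user_id", "variation", "first_exposure"])["value"].sum()
      .reset_index()
)
cohort = (
    per_user.groupby(["first_exposure", "variation"])["value"].mean()
            .unstack("variation")
)
print(cohort["B"] - cohort["A"])  # effect by entry cohort

# Requested series: B minus A across *all* exposed users on each calendar day.
daily = df.groupby(["day", "variation"])["value"].mean().unstack("variation")
print(daily["B"] - daily["A"])    # effect on day X; note it tells a different story
```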