# experimentation
l
Pseudo-A/A experiment setup for evaluating metric volatility and helping people see Type I, Type II, and magnitude (Type M) errors. I thought others might benefit from this query: it is rather trivial to set up, but I have found it quite useful.

We run A/A experiments with some regularity to validate end-to-end unbiasedness in our experiments. Beyond confirming that things are working as intended, they are also a good learning tool for showing people that random chance generates significant results on occasion.

To help teammates better understand the metrics they are using, I wrote a new experiment query in our primary data source that allows for an A/A experiment that only happens in the data warehouse (hence the "pseudo" prefix, since no experiment ever got deployed). The result is that I can click "update" as often as I want and, each time I do, GrowthBook fakes an A/A experiment: users get randomly assigned, I can use existing dimensions for pseudo-analysis, and I can show people how many primary/secondary/guardrail metrics show up as "significant" with each "experiment" run. Example query in thread.
common_table
You would just replace this with a really common event table (e.g. page/screen views) that any active user would trigger. It is essentially your "activation" metric.
with

-- common_table is a placeholder: swap in your high-traffic event table
-- (e.g. page/screen views) that any active user would trigger
pseudo_experiment_assignment as (
    select
      common_table.user_id
      , 'faux_a_a_experiment' as experiment_id
      -- avg(random()) over a user's rows is symmetric around 0.5,
      -- so each user lands in A or B with equal probability
      , case when avg(random()) < 0.5 then 'A' else 'B' end as variation_id
      -- the user's first qualifying event serves as the exposure timestamp
      , min(common_table.event_timestamp) as timestamp
    from common_table
    where common_table.event_timestamp > '{{ startDate }}'
    and common_table.event_timestamp < '{{ endDate }}'
    group by 1,2
    -- shuffle before limiting so the sample is a random subset of users
    order by random()
    limit 200000
)

-- the name aliases give GrowthBook friendly experiment/variation labels
select *, experiment_id as experiment_name, variation_id as variation_name
from pseudo_experiment_assignment
You could add other where clauses if you wanted to evaluate some subset of your system, or to filter out users who are in experiments, or to apply any other condition relevant to your business. The limit is also helpful: set it to a value that matches the sample size of your typical experiment.
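For example, a minimal sketch of two extra filters added inside the CTE's where clause (the platform column and internal_user flag are hypothetical stand-ins for whatever fields your event table actually has):

and common_table.platform = 'web'        -- hypothetical: evaluate only one subsystem
and common_table.internal_user = false   -- hypothetical: drop employees/test accounts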
h
This is great, thanks for sharing. I've actually done exactly this when talking with a user about false positive rates!
b
Thanks for this! How can I make sure that the pseudo A/A experiment doesn't include data that is influenced by experiments currently running on the site?
l
You could write a CTE that selects the users who have an exposure to a non-control variant between the startDate and endDate, then use it as a filter:
and common_table.user_id not in (select user_id from users_in_experiments_cte)
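A minimal sketch of that CTE, assuming a hypothetical experiment_exposures table that logs user_id, variation_id, and an exposure timestamp (your assignment system's table and column names will differ):

users_in_experiments_cte as (
    select distinct user_id
    from experiment_exposures             -- hypothetical exposure log table
    where variation_id != 'control'       -- keep only non-control exposures
    and exposure_timestamp > '{{ startDate }}'
    and exposure_timestamp < '{{ endDate }}'
)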
We have existing systems that handle experiment assignment, so a query I would write would be specific to our company. If you are using GrowthBook to run the experiment, @helpful-application-7107 is better positioned to suggest a good query.
I made the simple assumption that users in other experiments will get randomly assigned to my pseudo A/A experiment and won't matter. That may not hold if you have some really influential outliers that impact metrics. At the same time, this setup would expose that information: if a metric is significant every time you refresh the A/A experiment but keeps flipping between positive and negative, the random assignment of some highly influential user(s) is making that metric really challenging to rely upon.
h
Yeah, I would simply not worry about it as those users in other experiments will be randomly distributed across the two groups since you're randomizing the experiment variations right in the SQL.
You could exclude them like Scott suggests, but I don't think it's necessary for this purpose.
b
Great! Thanks for the info