# announcements
s
Are there any plans to add the option of applying corrections to control the family-wise error rate when testing multiple outcomes? I feel like I may have trouble communicating to management that a 95% chance to beat control is not all that reliable when there are 3 metrics being tested.
f
Hi Yevhen, what kind of corrections would you like to add? We feel that Bonferroni is quite conservative…
a
Suggesting this (for a frequentist framework), https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method
Which is less conservative than Bonferroni, and is the default for R’s p.adjust https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/p.adjust (potentially allowing the user to pick from any of the methods described there would be good too, but that’s probably a longer-term thing)
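Something like this, roughly (a Python sketch using statsmodels rather than the R function; the p-values are made up just to show the difference):

```python
# Sketch: Holm vs plain Bonferroni on per-metric p-values (illustrative values only).
from statsmodels.stats.multitest import multipletests

p_values = [0.010, 0.020, 0.030]  # e.g. one p-value per tested metric

reject_bonf, p_bonf, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
reject_holm, p_holm, _, _ = multipletests(p_values, alpha=0.05, method="holm")

print("Bonferroni:", p_bonf, reject_bonf)  # [0.03, 0.06, 0.09] -> only the first metric survives
print("Holm:      ", p_holm, reject_holm)  # [0.03, 0.04, 0.04] -> all three survive
```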
For Bayesian, the multiple comparison problem can potentially be addressed through the choice of prior, but that would require allowing the user to set the prior (https://statmodeling.stat.columbia.edu/2022/08/10/bayesian-inference-continues-to-completely-solve-the-multiple-comparisons-problem/)
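To illustrate the idea (just a toy sketch with made-up numbers, not a proposal for the engine): with a skeptical prior centered at zero, a noisy “winner” no longer gets an extreme chance to beat control, which is part of how the prior tempers the multiple-comparison problem.

```python
# Toy sketch: a skeptical prior tempering an exaggerated "chance to beat control".
# Normal-normal conjugacy with a known standard error; numbers are illustrative.
from scipy.stats import norm

lift_hat, se = 0.02, 0.012  # noisy observed lift and its standard error

def prob_beats_control(prior_var):
    # Posterior of the true lift given lift_hat ~ N(lift, se^2) and lift ~ N(0, prior_var)
    shrink = prior_var / (prior_var + se**2)
    post_mean = shrink * lift_hat
    post_sd = (prior_var * se**2 / (prior_var + se**2)) ** 0.5
    return norm.cdf(post_mean / post_sd)

print(prob_beats_control(prior_var=1.0))      # near-flat prior: ~0.95
print(prob_beats_control(prior_var=0.01**2))  # skeptical prior: ~0.86
```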
f
@helpful-application-7107 ^
a
Also, am I right in thinking that in your frequentist engine the alpha is always .05 (the user can’t change it)? If so, allowing the user to change that would be a very nice feature in general (and potentially easy to implement), and it could also serve as a stopgap against multiple comparison issues.
h
Thanks Jane. Yeah, I think in the long run we would probably allow users to move more towards controlling the FDR (using something like Benjamini-Hochberg), since controlling the FWER can become incredibly damaging for power as the number of tests grows.
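To make that concrete (rough sketch, made-up p-values): with 20 metrics, Holm has to clear increasingly strict per-test thresholds, while Benjamini-Hochberg only controls the expected share of false discoveries, so it holds onto more power.

```python
# Sketch: FWER control (Holm) vs FDR control (Benjamini-Hochberg) as tests pile up.
# p-values are made up: a few real effects mixed in with 15 nulls.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.002, 0.003, 0.004, 0.005] + [0.2, 0.4, 0.6, 0.8, 0.95] * 3  # 20 tests

for method in ("holm", "fdr_bh"):
    reject, _, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, "rejections:", int(reject.sum()))
# With these values Holm rejects 2 of the 5 small p-values; fdr_bh rejects all 5.
```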
Re: choices of prior and the alpha level. Letting users set these is definitely a reasonable ask, and the latter (the 0.05 in the frequentist framework, along with whether 95% is the threshold in the Bayesian framework) is reasonably high on our list of priorities, but not under active development. Allowing users to set specific priors is a bit more complicated, but it would unlock a currently hidden capability that the stats engine is already set up to handle.
f
The Bayesian threshold is adjustable as an environment variable
h
Ah, it is. Thanks.
My top priority for reducing false positives in the frequentist engine is landing sequential testing in order to reduce the peeking issue in that engine, before turning to cross-metric/dimension corrections.
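For flavor, this is the kind of thing I mean (an illustrative sketch of one well-known approach, a mixture SPRT with an always-valid p-value; not a commitment to this exact method or parameterization):

```python
# Sketch: always-valid p-values from a mixture SPRT (mSPRT) for a normal mean.
# Purely illustrative: sigma2 is treated as known and tau2 is the mixture variance.
import numpy as np

def always_valid_p(xs, sigma2=1.0, tau2=0.1, theta0=0.0):
    """Running p-values that stay valid no matter how often you peek."""
    xs = np.asarray(xs, dtype=float)
    n = np.arange(1, len(xs) + 1)
    xbar = np.cumsum(xs) / n
    # Closed-form mixture likelihood ratio for effect ~ N(theta0, tau2)
    lr = np.sqrt(sigma2 / (sigma2 + n * tau2)) * np.exp(
        n**2 * tau2 * (xbar - theta0) ** 2 / (2 * sigma2 * (sigma2 + n * tau2))
    )
    return np.minimum.accumulate(np.minimum(1.0, 1.0 / lr))

rng = np.random.default_rng(0)
print(always_valid_p(rng.normal(0.2, 1.0, size=500))[-1])  # small when there is a real effect
```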
f
we’re still thinking about the best way to expose this to users so as not to encourage bad practices
a
“The Bayesian threshold is adjustable as an environment variable” Can you give a link/screenshot for this? I want to make sure I’m aware of all the customization that’s currently available.
f
sure, one sec
looks like it’s adjustable in the organization collection in mongo, specifically
`settings.confidenceLevel`
defaults to 0.95 if it’s not defined
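e.g. something like this with pymongo (rough sketch; the connection string, db name, and org id are placeholders, so adjust for your deployment):

```python
# Rough sketch: set the Bayesian threshold for one org via settings.confidenceLevel.
# Connection string, database name, and org id below are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["growthbook"]  # assumed database name

db["organizations"].update_one(
    {"id": "org_example123"},                      # placeholder org id
    {"$set": {"settings.confidenceLevel": 0.99}},  # treated as 0.95 when unset
)
```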
h
In general, I'm also concerned about landing sequential testing for the frequentist engine and nothing comparable for the Bayesian engine, given that it can suffer from the same peeking issues. An immediate solution would be to let people set a simple, configurable shrinkage parameter that intuitively maps to prior sample size (or rather, the sample size needed in the future to mostly wash out the prior), which would act as a strong Bayesian analogue to the way we tune our sequential testing implementation in the frequentist framework. And in general, this would be less prone to abuse, because the default now is no shrinkage!
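Roughly what I'm picturing (a toy sketch, not engine code): the one knob is a prior sample size n0, and its ratio to the planned traffic tells you how much the prior still matters at the end of the experiment.

```python
# Toy sketch: a single shrinkage knob expressed as a prior sample size n0.
# Known-variance normal model; all numbers are illustrative.

def posterior_mean(xbar, n, mu0=0.0, n0=0):
    """Posterior mean is a weighted average of the prior mean and the sample mean."""
    return (n0 * mu0 + n * xbar) / (n0 + n)

planned_n = 10_000
for n0 in (0, 500, 5_000):
    weight_on_prior = n0 / (n0 + planned_n)  # how much the prior "survives" at the end
    print(n0, weight_on_prior, posterior_mean(xbar=0.02, n=planned_n, n0=n0))
# n0 = 0 reproduces today's default (no shrinkage); n0 = planned_n would split 50/50.
```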
a
What do you have in mind? The alpha and beta of the beta prior on the binomial can be interpreted as pseudo-observations (prior successes and failures), and the way your normal prior is set up in your white paper with three params (mu_0, sigma^2_0, n_0), I think the n_0 can also be interpreted as pseudo-observations (or a prior n). But it sounds like you’re looking for something other than just letting the user set those params?
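(To spell out the pseudo-observation reading I mean for the binomial case, a toy sketch with made-up numbers:)

```python
# Toy sketch: alpha/beta of a Beta prior read as prior successes/failures.
# With a Beta(a0, b0) prior, the posterior is Beta(a0 + successes, b0 + failures).

a0, b0 = 20, 180                 # a prior "sample" of 200 pseudo-users at a ~10% rate
successes, failures = 130, 870   # observed data from the variation

post_a, post_b = a0 + successes, b0 + failures
print("prior n:", a0 + b0)                            # 200
print("posterior mean:", post_a / (post_a + post_b))  # 0.125, pulled toward the prior 10%
```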
@fresh-football-47124 re: “we’re still thinking of the best way to expose this to users to not encourage bad practices”. Do you mean exposing the Bayesian threshold (and maybe the frequentist alpha too)? If so, what about allowing users to choose from standard values (.95, .99, etc. for the Bayesian threshold; .05, .01, … for alpha)? That would be useful (at least to us) without potential for abuse (as far as I can tell).
f
yes, that’s a good idea
h
But it sounds like you’re looking for something other than just letting the user set those params?
I think there are a lot of users who could set those params directly, so exposing them makes sense. I'm just also considering a way to let people configure one parameter that maps to something like pseudo-observations (and the intended length of the experiment).