#give-feedback


mysterious-iron-16289

07/17/2023, 3:21 PM
Hi **@future-teacher-7046**, **@fresh-football-47124** and **@helpful-application-7107**, we’re using the enterprise feature for multiple-hypothesis corrections. Concretely, we decided to control the false discovery rate (FDR) with your implementation of the Benjamini-Hochberg method. We found that showing the adjusted *p*-values alongside the unadjusted confidence intervals is highly confusing to a user. Would you consider adjusting the confidence intervals, too? As I understand it, you can simultaneously have adjusted *p*-values and confidence intervals at the adjusted critical *p*-value per metric without making a math error. You just scale the individual confidence levels per hypothesis.
FYI **@adventurous-dream-15065**

helpful-application-7107

07/17/2023, 3:33 PM
Hi **@mysterious-iron-16289**, unfortunately, there are no directly analogous CIs for the Benjamini-Hochberg procedure (as far as I know), given the way the procedure works. Our FWER control method (Holm-Bonferroni) also does not have analogous CIs. The simple Bonferroni correction does, but that test is considerably more conservative.
However, I understand that seeing a non-statsig *p*-value next to a CI that is completely on one side of 0 can be confusing. We could consider hiding CIs, an alternative procedure that maybe does have analogous CIs but is less widely adopted, or doing something ad-hoc to the CIs, but for these widely used methods with good statistical properties, there are no directly analogous CIs.

mysterious-iron-16289

07/17/2023, 3:41 PM
Really? I thought that you would just calculate the critical *p*-value instead of an adjusted *p*-value - the formula is the same. I attached the formula I was thinking about. Denote by *p_k* the *p*-value of the *k*-th hypothesis in an ordered list of *m* experiments, and by *alpha* the critical value. The values with the tilde are adjusted. You basically provide \tilde{p_k} right now. And when it is smaller than *alpha*, then you should have a CI with 0 lying outside of the interval at hypothesis confidence level 1 - \tilde{\alpha}/2. Or am I mistaken?

helpful-application-7107

07/17/2023, 3:42 PM
> And when you’re smaller than *alpha*, then you should have a CI with 0 lying outside of the interval with hypothesis confidence level 1 - \tilde{\alpha}/2. Or am I mistaken?
This sounds correct to me. But the question is: what is the general way you can construct that CI?

Also, the BH procedure is not as straightforward as that formula you provided.

Rather, computing the adjusted p-values is not that straightforward.

mysterious-iron-16289

07/17/2023, 3:45 PM
I know, BH has some more steps, so that you can have significance at the *k*-th level even when the inequality is not true, given that the *(k+n)*-th hypothesis becomes significant according to that formula.

👍 1

Assuming a list of *p_k* ordered ascending.
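For readers following along: the step-up behaviour described here can be sketched in a few lines of Python. This is an illustration only, not the vendor's actual implementation; the function name is ours.

```python
def bh_adjusted_pvalues(pvalues):
    """Benjamini-Hochberg step-up adjustment (illustrative sketch).

    Takes raw p-values, returns adjusted p-values in the original
    order. The step-up part is the cumulative minimum taken from the
    largest p-value downwards: hypothesis k can still be significant
    even when m/k * p_k > alpha, as long as a later hypothesis in
    the ordered list passes.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest rank down to 1
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The cumulative minimum is exactly the "significant via a later hypothesis" effect mentioned above: e.g. for raw p-values `[0.01, 0.04, 0.03, 0.005]`, the adjusted values come out approximately `[0.02, 0.04, 0.04, 0.02]`.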

helpful-application-7107

07/17/2023, 3:48 PM
Yeah, and these kinds of stepped procedures, to my understanding, don't have analogous CIs.

mysterious-iron-16289

07/17/2023, 3:51 PM
That’s true, but the *p*-value has a CI equivalent. As I understand it, a confidence interval is related to the *p*-value in the sense that the difference between the measured difference and 0 is given by the attached formula. SEM is the standard error of the mean and *Z* the critical value at confidence level 1 - p/2. Hence, at the critical value *p = alpha* the CI touches 0. When *p < alpha* it’s smaller and 0 is outside the CI; when *p > alpha* it’s inside.
So to me it looks like you could use this to construct the CIs.
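The duality being described here - that the (1 - alpha) CI touches 0 exactly when p == alpha - can be checked numerically under a normal approximation using only the standard library. A sketch (function names are ours):

```python
from statistics import NormalDist

def two_sided_p(diff, sem):
    """Two-sided p-value for H0: true difference == 0 (normal approx.)."""
    z = abs(diff) / sem
    return 2 * (1 - NormalDist().cdf(z))

def confidence_interval(diff, sem, alpha):
    """Symmetric (1 - alpha) CI around the measured difference."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return (diff - z_crit * sem, diff + z_crit * sem)

# Duality check: build the CI at exactly alpha = p; one endpoint
# lands on 0, because |diff| = z_{1 - p/2} * SEM by construction.
diff, sem = 1.2, 0.5
p = two_sided_p(diff, sem)
lo, hi = confidence_interval(diff, sem, alpha=p)
```

For `p < alpha` the CI built at level 1 - alpha is narrower than this boundary case and excludes 0; for `p > alpha` it contains 0, matching the statement above.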

helpful-application-7107

07/17/2023, 4:00 PM
Hmmm, I'm not sure that would work, but I can think about it some more. Benjamini and Yekutieli did a lot of work on corrected CIs that, when reading it, seemed to me to further my assumption that there were no appropriate CIs for the BH procedure itself.

mysterious-iron-16289

07/17/2023, 4:07 PM
The above formula certainly works if you have just one hypothesis. And regarding the interpretation, it should be viable if the hypotheses are uncorrelated. However, BH assumes zero or positive correlation, and the latter might affect the statement. It would be good to find out what the key points of Benjamini and Yekutieli were, to investigate this further. Maybe they even state why this naive approach does not work.
We can get back to it tomorrow, if you like. We would really like to have a correct visualisation, since the current one can be misleading.

helpful-application-7107

07/17/2023, 4:07 PM
> The above formula certainly works, if you have just one hypothesis.
But this is the crux of the issue, no?

👍 1

> We would really like to have a correct visualisation, since they current can be misleading.
Yeah, I definitely agree with this and am sympathetic to wanting a better solution than what we currently have.

👍 1

mysterious-iron-16289

07/17/2023, 4:17 PM
Let us know if you have an idea about this topic. If we come up with something, we can notify you, too.

Ahoy, I read a bit into Benjamini & Yekutieli, especially this paper. They make the point that the equivalent of the FDR for *p*-values is the false coverage-statement rate (FCR) for CIs. They suggest a procedure that basically constructs CIs from adjusted critical *p*-values as the optimal approach for being equivalent to controlling the FDR (see definition 1 and the description before it). They explicitly give Benjamini-Hochberg as example scenario 2. The rest of the paper shows why, but I didn’t go through all the proofs, tbh.
What do you think about it?
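For reference, the FCR-controlling construction from the paper builds, for the R parameters selected (e.g. by the BH procedure) out of m total, each selected CI at level 1 - R*q/m instead of 1 - q. A hedged sketch of that idea under a normal approximation (the function name is ours; the selection step itself is assumed to have happened elsewhere):

```python
from statistics import NormalDist

def fcr_adjusted_cis(estimates, sems, selected, q=0.05):
    """FCR-controlling CIs in the spirit of Benjamini & Yekutieli (2005).

    For the R selected parameters out of m total, build each selected
    CI at level 1 - R*q/m instead of 1 - q. Non-selected parameters
    get no CI here, mirroring the paper's focus on selected CIs.
    """
    m = len(estimates)
    R = len(selected)
    if R == 0:
        return {}
    # Two-sided critical value at the FCR-adjusted level 1 - R*q/m.
    z = NormalDist().inv_cdf(1 - (R * q / m) / 2)
    return {i: (estimates[i] - z * sems[i], estimates[i] + z * sems[i])
            for i in selected}
```

Note that when everything is selected (R == m) this reduces to ordinary 1 - q intervals, and with fewer selections the selected intervals widen.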

helpful-application-7107

07/18/2023, 6:31 PM
Yeah, this is the paper I was referencing earlier. They talk about how this works for BH **selected** CIs, and don't seem to discuss the general case where we need to construct CIs for both rejected and non-rejected hypotheses. Maybe in practice it doesn't matter, and CIs constructed this way for non-rejected hypotheses will be in line with what we'd expect.

mysterious-iron-16289

07/19/2023, 9:09 AM
I just realised that page 74 was absent from my copy of the paper 😱 Now it’s way easier to understand.
I see your point. However, the correction suggested is connected to the FDR, a type I error. Adjusting the FCR for non-selected CIs would be a type II error correction. I would argue that, from the point of view of controlling the FDR, it is not relevant how wide exactly the CIs for the non-selected parameters are. It’s just important that they encompass 0.
What do you think?

Btw, for all those who want to have a look at the paper, here’s a legal, public version from the author’s web page:
http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_yekutieli_JASA2005.pdf

helpful-application-7107

07/19/2023, 4:10 PM
> I would argue that from the point of controlling the FDR it is not relevant how wide exactly the CI for the non-selected parameters are. It’s just important, that they encompass 0.
I agree somewhat. From the perspective of building a reliable machine used by many different companies and analysts, I am a bit concerned about ad-hoc CIs for these non-selected tests. That said, this is probably better than the status quo: unadjusted CIs & a tooltip warning that they aren't adjusted.

👍 1

mysterious-iron-16289

07/20/2023, 7:15 AM
I agree, ad-hoc adjusted with a tooltip is way less misleading than the status quo.
Regarding proper CIs: we could, potentially, write a mail to the authors of the paper and ask for their opinion. I guess they have thought about the non-selected CIs at least a bit, even if it is not necessarily an interesting topic for their research. Worst case, we get no answer 😉

adventurous-dream-15065

10/11/2023, 2:27 PM

helpful-application-7107

10/11/2023, 2:28 PM
No! I'm sorry 😕 I've tried to turn to it a couple of times but we've prioritized other work.

😒 1

Let me see if there's something quick we can implement.

🙌 1

My hangup here still is specifically what to do when the *p*-value == 1. We can easily back out the stddev that **would** have produced an adjusted *p*-value below 1, but not when our adjusted *p*-value is actually 1.
In your use case, would you prefer to hide the CIs when the *p*-value == 1, or do something ad-hoc like "find the max adjusted *p*-value that is < 1 and use that for all *p*-values that are actually 1, so that the CIs for *p*-value == 1 are inflated by a similar factor to the largest *p*-value in the dataset below 1"? Or even allow you to set a max adjusted *p*-value and use that to construct CIs?
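One way to read "back out the stddev" concretely: infer the standard error that would reproduce the adjusted *p*-value for the measured difference, then build the usual CI from it; the interval diverges as the adjusted *p*-value approaches 1. A sketch of that idea under a normal approximation (not necessarily what was shipped; names are ours):

```python
from math import inf
from statistics import NormalDist

def adjusted_ci(diff, p_adj, alpha=0.05):
    """Ad-hoc CI consistent with an adjusted p-value (illustrative).

    Infer the standard error that would have produced p_adj for this
    measured difference, then build the (1 - alpha) CI from it. When
    p_adj == 1 the implied z-score is 0, so the implied standard
    error -- and hence the CI -- is infinite: the hangup above.
    """
    nd = NormalDist()
    if p_adj >= 1:
        return (-inf, inf)
    z_adj = nd.inv_cdf(1 - p_adj / 2)      # implied |diff| / sem
    sem_implied = abs(diff) / z_adj
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return (diff - z_crit * sem_implied, diff + z_crit * sem_implied)
```

By construction the CI endpoint touches 0 exactly when `p_adj == alpha`, matching the duality discussed earlier in the thread.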

adventurous-dream-15065

10/12/2023, 12:07 PM
FYI, **@mysterious-iron-16289**

Regarding
> hide the CIs when p-value == 1
Thinking of a “visuals only” user who does not understand the numbers at all, I believe this can be dangerous, as a missing CI can be interpreted as a super-precise estimate. However, I like the solution if it’s easy to visualize that the CI extends beyond the graph, like in the attached picture, even without a value at the end, or with just a bar fading into the edge of the graph.
> inflated by a similar factor to the largest p-value in the dataset below 1
Sounds nice on first thought, but I am not sure if that is good for all edge cases with extreme differences in sd and p-value.

helpful-application-7107

10/16/2023, 3:52 PM
Hi **@adventurous-dream-15065**, we implemented adjusted CIs as the default for tests with corrected *p*-values. When the *p*-value is 1, we extend the CI range to be infinite, which follows the ad-hoc adjustment we are using. We also return the unadjusted CIs in the tooltip. Feedback is welcome.

❤️ 2

adventurous-dream-15065

10/16/2023, 6:50 PM
That sounds amazing, I will look at the result on Wednesday and will get back to you with feedback!
FYI, **@mysterious-iron-16289**
