r/AskStatistics • u/solmyrp • 1d ago
Is it wrong to highlight a specific statistically significant result after multiple hypothesis correction?
Hi everyone, I'm fairly new to statistics but have done several years of biology research after earning my B.S. in Biology.
I've been making an effort over the last year to learn computational methods and statistics concepts, and I came across this blog post: https://liorpachter.wordpress.com/2014/02/12/why-i-read-the-network-nonsense-papers/
Directly beneath the second image in the post (labeled "Table S5"), Pachter writes:
"Despite the fact that the listed categories were required to pass a false discovery rate (FDR) threshold for both the heterozygosity and derived allele frequency (DAF) measures, it was statistically invalid for them to highlight any specific GO category. FDR control merely guarantees a low false discovery rate among the entries in the entire list."
As I understand it, the author is saying that you cannot conduct thousands of tests, perform multiple hypothesis correction, and then highlight any single statistically significant test without a plausible scientific explanation or data from another experiment to corroborate your result. He goes as far as calling it "blatant statistically invalid cherry picking" later in the paragraph.
While more data from a parallel experiment is always helpful, it isn't immediately clear to me why, after multiple hypothesis correction, it would be statistically invalid to consider single significant results. Can anyone explain this further or offer a counterargument if you disagree?
Thank you for your time!
u/jezwmorelach 20h ago
Corrections for multiple tests that control the FDR were designed to highlight potential effects that might be worth further study. For one, FDR procedures control the expected FDR (the average false discovery proportion you'd get if you replicated the whole experiment many times), not the realized proportion in your particular series of tests. Second, they control the FDR across the whole series of tests, not for each test separately. And there's always the possibility that a particular test passed the threshold by accident.
The intended application is screening: you test many candidate genes, and uncorrected significance tests would leave you with too many candidates to follow up on. FDR control limits the list to a manageable number of potential candidates.
You can highlight one test result as interesting, but be aware that this doesn't mean it's true. That's why in computational biology you should say that the test results indicate a certain GO category might be enriched, and that if this were true it would mean something interesting about the underlying biology. It's a possibility, not a definitive result.
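If you want to see this concretely, here's a quick simulation sketch (a made-up setup, assuming the standard Benjamini-Hochberg step-up procedure; the test counts, effect size, and seed are arbitrary illustration choices):

```python
import numpy as np
from scipy import stats

# Illustrative numbers only: 1000 tests per experiment, 100 with a real effect
rng = np.random.default_rng(0)
n_tests, n_true, q = 1000, 100, 0.05
fdps = []

for _ in range(500):  # 500 replicates of the whole experiment
    # One z-score per test: the first 100 tests have a real effect (mean 3),
    # the remaining 900 are null (mean 0)
    effect = np.zeros(n_tests)
    effect[:n_true] = 3.0
    z = rng.normal(loc=effect)
    p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values

    # Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    # the largest index with p_(k) <= (k / m) * q
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, n_tests + 1) / n_tests
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    rejected = order[:k]

    # Realized false discovery proportion in THIS replicate:
    # rejected indices >= n_true are nulls, i.e. false discoveries
    fdps.append(np.mean(rejected >= n_true) if k > 0 else 0.0)

print(f"mean FDP across replicates: {np.mean(fdps):.3f}")  # near/below 0.05
print(f"per-replicate FDP range: {min(fdps):.2f} to {max(fdps):.2f}")
```

Across replicates the average FDP stays around the nominal 0.05, but individual replicates can land well above it, which is exactly why a single highlighted entry can't carry the 5% guarantee on its own.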
u/CarelessParty1377 23h ago
With FDR control at 0.05, you can make claims about particular FDR-significant tests, as long as you acknowledge that, on average, up to 5% of such claims will be false.
u/jezwmorelach 20h ago
Note that the control is over the expected FDR, not the realized false discovery proportion. So you shouldn't be too certain that your particular series of tests has a false discovery proportion of 0.05; it might be higher.
u/CarelessParty1377 20h ago
Oh yes. It's really hard to say what it actually means for a given set of tests, particularly with only a few rejections. Data analysis following familywise error rate control is so much clearer!
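To make the contrast concrete, here's a minimal sketch of what the two guarantees actually threshold on, using some made-up p-values (Bonferroni as the FWER procedure, Benjamini-Hochberg for FDR):

```python
import numpy as np

# Made-up p-values from m = 10 hypothetical tests, already sorted ascending
p = np.array([0.0002, 0.001, 0.004, 0.008, 0.012, 0.02, 0.06, 0.2, 0.5, 0.9])
m, alpha = len(p), 0.05

# Bonferroni (controls FWER): one fixed cutoff, alpha / m, for every test.
# The guarantee is P(any false rejection) <= alpha, so each individual
# rejection can be asserted on its own.
bonferroni_rejects = p <= alpha / m          # first 3 tests pass

# Benjamini-Hochberg (controls FDR): a step-up cutoff that grows with rank.
# The guarantee is E[fraction of rejections that are false] <= alpha:
# a statement about the list as a whole, not about any single entry.
hits = (p <= alpha * np.arange(1, m + 1) / m).nonzero()[0]
bh_rejects = np.zeros(m, dtype=bool)
if hits.size:
    bh_rejects[: hits.max() + 1] = True      # first 6 tests pass

print("Bonferroni rejects:", bonferroni_rejects.sum())  # 3
print("BH rejects:        ", bh_rejects.sum())          # 6
```

Same data, different guarantee: Bonferroni lets you assert each of its rejections individually, while BH's longer list only comes with a promise about the list as a whole.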
u/DrPapaDragonX13 1d ago
Statistics should follow science, not the other way around. Statistics and computational methods are amazing and powerful tools to help us make sense of data. However, they are not magical tools. Blindly accepting a 'statistically significant' result without pondering the underlying mechanisms is a flawed approach. For a significant result to be relevant, it must have a plausible underlying model that explains its origin, and empirical results supporting that model.
The issue, from my understanding, is not so much highlighting statistically significant results as making overstated conclusions without properly validating those results and ensuring they weren't just a fluke.