by Rong Huang, MS, Sr. Biostatistician
When researchers get a significant result (p < 0.05), how confident can they be that their finding is really true? Can they claim that error is unlikely: "There is only a 5 in 100 chance this result is false, because p = 0.05"? No! This is not right. The type I error rate, the significance threshold commonly set at 0.05 in statistical testing, is not the same thing as the false positive rate.
Are you confused? Let me show you an example. Suppose I am testing 100 potential medications to treat pediatric leukemia. The historical success rate for leukemia drug discovery is 10%, so I expect 10 of these drugs to actually work. I must run experiments to find out which ones are effective. In these experiments, I will set the significance level at p = 0.05, and I assume each experiment has a statistical power of 0.8. So of the 10 effective drugs, I will correctly detect around 8 of them. Because my p value threshold is 0.05, I have a 5% chance of falsely concluding that an ineffective drug works. Since 90 of my tested drugs are ineffective, this means I'll wrongly conclude that about 5 of them have significant effects.
I perform my experiments and conclude there are 13 “working” drugs: 8 good drugs and 5 false positives. The chance of any given “working” drug being truly effective is therefore 8 in 13 – just 62%! In statistical terms, my false positive rate – the fraction of statistically significant results that are really false positives – is 38%.
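If it helps to see the bookkeeping spelled out, here is a minimal Python sketch of the same arithmetic. The numbers (a 10% base rate, a 0.05 significance level, and 0.8 power) are simply the assumptions of the example above; the text rounds the expected 4.5 false positives up to 5, which is why it quotes 62% rather than the 64% printed here.

```python
# Expected outcomes when screening 100 candidate drugs, using the
# assumptions from the example: 10% base rate, alpha = 0.05, power = 0.8.
n_drugs = 100
base_rate = 0.10   # fraction of drugs that truly work
alpha = 0.05       # significance threshold (type I error rate)
power = 0.80       # probability of detecting a truly effective drug

n_effective = base_rate * n_drugs        # 10 drugs that actually work
n_ineffective = n_drugs - n_effective    # 90 drugs that do not

true_positives = power * n_effective     # 8 effective drugs correctly detected
false_positives = alpha * n_ineffective  # 4.5 ineffective drugs flagged anyway

n_significant = true_positives + false_positives
print(f"'Working' drugs found: {n_significant:.1f}")                    # 12.5
print(f"Truly effective share: {true_positives / n_significant:.0%}")   # 64%
print(f"False positive share:  {false_positives / n_significant:.0%}")  # 36%
```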
Because the base rate of effective cancer drugs is so low (only 10%), I have many opportunities for false positives. Take this to the extreme: if I had the bad fortune of getting a truckload of completely ineffective medicines (a base rate of 0%), I would have no chance of getting a true significant result. Nevertheless, I'll still get a p < 0.05 result for about 5% of the drugs in the truck. In this extreme case, the false positive rate is 100%.
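To see how sharply this depends on the base rate, here is a small helper (my own sketch, keeping the same significance level and power assumptions) that reports the expected share of "significant" results that are false positives for a few different base rates:

```python
def false_positive_share(base_rate, alpha=0.05, power=0.80):
    """Expected fraction of statistically significant results that are false
    positives, given the base rate of truly effective drugs, the significance
    threshold, and the statistical power of the experiment."""
    true_positives = power * base_rate
    false_positives = alpha * (1 - base_rate)
    return false_positives / (true_positives + false_positives)

for rate in [0.0, 0.10, 0.50]:
    print(f"base rate {rate:.0%}: "
          f"{false_positive_share(rate):.0%} of significant results are false")
# base rate 0%:  100% of significant results are false
# base rate 10%: 36% of significant results are false
# base rate 50%: 6% of significant results are false
```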
Remember how p values are defined: the p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that shows a difference equal to or more extreme than what you actually observed. A p value is calculated under the assumption that the medication does not work; it tells me the probability of obtaining my data, or data more extreme, if that assumption were true. It does not tell me the chance my medication is effective. A smaller p value is stronger evidence, but to calculate the probability that the medication is effective, you'd need to factor in the base rate of true positives.
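One way to convince yourself of this is with a quick simulation. The sketch below is my own illustration, not part of the original example: the trial size and effect size are assumed values chosen so that the power is roughly 0.8, and each p value comes from a t-test that, by definition, assumes the drug has no effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_drugs = 10_000     # simulated drug-screening experiments
base_rate = 0.10     # 10% of drugs truly work
effect_size = 0.9    # assumed standardized effect; gives ~0.8 power with n = 20 per arm
n_per_group = 20     # assumed patients per arm in each hypothetical trial
alpha = 0.05

truly_effective = rng.random(n_drugs) < base_rate
p_values = np.empty(n_drugs)

for i in range(n_drugs):
    control = rng.normal(0.0, 1.0, n_per_group)
    effect = effect_size if truly_effective[i] else 0.0
    treated = rng.normal(effect, 1.0, n_per_group)
    # The p value is computed as if the drug has no effect (the null hypothesis).
    p_values[i] = stats.ttest_ind(treated, control).pvalue

significant = p_values < alpha
print(f"Ineffective drugs with p < 0.05: "
      f"{significant[~truly_effective].mean():.1%}")      # about 5%, by construction
print(f"Significant results that are false positives: "
      f"{(~truly_effective)[significant].mean():.1%}")    # roughly 35-40%
```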
So when someone cites a low p value to claim their study is probably right, remember that the chance of error is almost certainly higher than the p value suggests. In fields where most tested hypotheses are false, such as early drug trials (most early drug candidates don't make it through trials), it's likely that most statistically significant results with p < 0.05 are actually false positives.
Rong Huang, MS is the Senior Biostatistician for Research Administration with 20 years of professional experience. Rong provides support in statistical methods and power analysis for grant proposals and grant reviews. He is an expert in statistical modeling, data mining, machine learning, prediction modeling, and data management. He has worked for Children's Health℠ for 9 years. Prior to joining Children's Health, he worked at Baylor Health System, the UCLA School of Public Health and the UCLA School of Nursing. He received his MS degree in Applied Biostatistics and MS degree in Computer Science from the University of Southern California.