I just finished reading Nate Silver's excellent book
The Signal and the Noise. This book is all about statistics, and how often we are misguided when we hear statistics thrown out for everything from weather predictions to earthquakes, even baseball. It is very well written and engaging; he has quite the sense of humor, and gives funny descriptions of the people he met while interviewing for the book (funny to the reader at least, I doubt some of these scientists appreciated how he depicted them!).
In the book he talks about his own personal journey. He began as an economist for a big accounting firm, but developed a baseball sabermetric program that he sold to another company and struck it big. From there he turned to political forecasting, and has done quite well: he correctly predicted the results of the presidential election in all 50 states! The guy is what I would call a "genius", so the book is a must read.
However, this blog is about science, so what can a scientist take away from this book? Well, I think there is quite a bit for us, especially since statistics are the basis of our profession (statistically significant, anyone?).
The main thing I would like to discuss here is
this paper by John P. Ioannidis. Nate talks about it in his book, and its title is "Why Most Published Research Findings Are False". Whoa, that's quite the mouthful for any scientist to swallow. How does he come to that conclusion? Well, he uses a lot of statistics that, to be honest with you, mostly went over my head. I was able to understand most of the article when the numbers were absent and he used words, though! What I got out of the article was that most of our published research findings suffer from some of the most basic problems that the scientific method is supposed to sort out.
The first, and probably most common, problem is bias. Ioannidis uses math to show that as bias increases, the probability that a research finding is actually true goes down. I feel like bias is definitely an issue in a lot of journal articles, and we see it all the time. I think many people will remember an article published not long ago that said that
eggs were just as deadly as cigarettes? Well, it turned out those authors had connections to the cholesterol drug market, so obviously they were going to find that high-cholesterol eggs are very bad for you.
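To give a feel for the math, here is a minimal sketch of the relationship Ioannidis derives: the positive predictive value of a claimed finding (the chance it is actually true) drops quickly as bias creeps in. The formula is the one from his paper; the specific numbers I plug in are just illustrative guesses of my own.

```python
# A rough sketch of the relationship Ioannidis describes: positive predictive
# value (the chance a "positive" finding is actually true) as a function of
# bias. The formula is from his paper; the inputs below are illustrative
# guesses, not his.

def ppv(R, alpha=0.05, beta=0.2, u=0.0):
    """Positive predictive value of a claimed finding.

    R     -- prior odds that a tested relationship is real
    alpha -- type I error rate (the usual 0.05)
    beta  -- type II error rate (1 - statistical power)
    u     -- bias: fraction of would-be "null" analyses reported as positive anyway
    """
    true_positives = (1 - beta) * R + u * beta * R
    all_positives = R + alpha - beta * R + u - u * alpha + u * beta * R
    return true_positives / all_positives

# Suppose 1-in-10 odds that the hypothesis is true before the study is run.
for bias in (0.0, 0.1, 0.3, 0.5):
    print(f"bias = {bias:.1f} -> PPV = {ppv(R=0.1, u=bias):.2f}")
```

Even a modest amount of bias pulls the chance that a published "positive" finding is real from better-than-even down toward coin-flip territory or worse.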
This is just one type of bias, though. Another type would be purposely designing an experiment to get the result you want. This can be done in many ways, from how you ask the question to the statistical methods you use to present your data. Ioannidis talks about many studies that massage their statistics until the results reach p < .05 (the standard in most scientific literature) when they previously did not meet that criterion. This is obviously poor science: modifying your data until it looks the way you want.
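To see how easy it is to cross that line, here is a toy simulation of my own (not from the paper): there is no real effect anywhere in the data, but if you allow yourself twenty different outcomes and report whichever one clears the threshold, you "find" something most of the time. The sample sizes and the number of outcomes are made up.

```python
# A toy simulation of the "keep slicing until p < .05" problem. No outcome
# here has any real effect, yet testing 20 of them and reporting the best one
# produces a "significant" result far more than 5% of the time.

import math
import random
import statistics

def t_test_p(a, b):
    """Very rough two-sample test p-value, using a normal approximation."""
    na, nb = len(a), len(b)
    se = math.sqrt(statistics.variance(a) / na + statistics.variance(b) / nb)
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(0)
experiments = 1000
hits = 0
for _ in range(experiments):
    found = False
    for _ in range(20):  # 20 independent "outcomes", none of them real
        group_a = [random.gauss(0, 1) for _ in range(30)]
        group_b = [random.gauss(0, 1) for _ in range(30)]
        if t_test_p(group_a, group_b) < 0.05:
            found = True
            break
    hits += found

print(f"'Significant' result found in {hits / experiments:.0%} of experiments")
# With 20 tries at alpha = 0.05 you expect roughly 1 - 0.95**20, about 64%.
```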
Another way we introduce error into our scientific studies is by treating any statistically significant difference as a meaningful finding, no matter how small it is. I struggled with this principle myself when I was writing the
How to build muscle post. There were many studies that showed certain associations, but the effects were very small. Still, the studies reported statistically significant findings, so I assumed they were true and used them as sources to back my claims. Significant or not, are these small gains really important enough to publish a paper about? Maybe we just need to be more selective about what gets published these days.
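Here is a quick sketch of what I mean, with numbers I invented for illustration: given a big enough sample, a difference far too small to notice in the gym still comes out "statistically significant".

```python
# A sketch of "statistically significant but who cares": with a huge sample,
# a tiny difference clears p < .05 even though the effect size is negligible.
# The effect, spread, and sample size are invented for illustration.

import math
import random
import statistics

random.seed(1)
n = 20000
# "Treatment" adds 0.3 kg to a lift whose person-to-person spread is 10 kg.
control = [random.gauss(100.0, 10.0) for _ in range(n)]
treated = [random.gauss(100.3, 10.0) for _ in range(n)]

diff = statistics.mean(treated) - statistics.mean(control)
se = math.sqrt(statistics.variance(treated) / n + statistics.variance(control) / n)
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
cohens_d = diff / statistics.pstdev(control + treated)

print(f"mean difference = {diff:.2f} kg, p = {p:.4f}, Cohen's d = {cohens_d:.3f}")
# p lands comfortably under .05, yet d is around 0.03 -- an effect nobody
# training in a gym would ever notice.
```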
The funniest problem that Ioannidis points out is the competition between groups, which can lead to exaggerated findings. If another group doing research similar to yours publishes a "statistically significant" finding, and you can make the opposite finding in a "statistically significant" way, you should publish your finding immediately! Why? Well, mainly it's just to be a dick and say "you're wrong!" This is great for you and your career, and can help grab headlines for your field, but it is very damaging to science as a whole, where consensus needs to be found on topics. If you have opposite findings, you need to work together to modify your experiments so that you can reach a consensus answer that benefits science.
Nate talks about this consensus view at length, and recommends "Bayesian" thinking as the solution to this problem. He imagines a world where all your predictions are displayed on a sandwich board, and when you meet someone with a differing prediction you must either make a bet with them about who is right, or come to a consensus. This kind of Bayesian thinking is the core of Nate's book, and it is quite the insight for those of us who might have differing views, especially us scientists, who seem to rarely be trained in Bayesian thinking.
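For anyone who, like me, was never formally taught it, the mechanics are pretty simple: start with a prior belief, look at a piece of evidence, and shift your probability accordingly. Here is a minimal sketch of that update rule; the prior and the likelihoods are numbers I made up for the example.

```python
# A minimal sketch of Bayesian updating: a prior belief gets revised each time
# a new piece of evidence arrives. The prior and likelihood values below are
# illustrative, not from the book.

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability that the claim is true, given one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# I start out fairly skeptical of a claim...
belief = 0.2
# ...then three studies in a row report a positive result. Say each positive
# result is 80% likely if the claim is true but only 30% likely if it is false.
for study in range(1, 4):
    belief = bayes_update(belief, 0.8, 0.3)
    print(f"after study {study}: P(claim is true) = {belief:.2f}")
```

The point is that no single bet settles anything; each new result just nudges everyone's probabilities, and people looking at the same evidence should drift toward the same answer.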
All in all, I think both Nate's book and Ioannidis' paper are very good reads, and I would recommend both of them to anyone. The main thing we get out of them is that we need to be better at critically analyzing data. We need to be sure we are finding signals, and not just noise.