Sometimes it seems surprising that science functions at all. In 2005, medical science was shaken by a paper with the provocative title “Why most published research findings are false.”1 Written by John Ioannidis, a professor of medicine at Stanford University, it didn’t actually show that any particular result was wrong. Instead, it showed that the statistics of reported positive findings were not consistent with how often one should expect to find them. As Ioannidis concluded more recently, “many published research findings are false or exaggerated, and an estimated 85 percent of research resources are wasted.”2
It’s likely that some researchers are consciously cherry-picking data to get their work published. And some of the problems surely lie with journal publication policies. But the problems of false findings often begin with researchers unwittingly fooling themselves: they fall prey to cognitive biases, common modes of thinking that lure us toward wrong but convenient or attractive conclusions. “Seeing the reproducibility rates in psychology and other empirical science, we can safely say that something is not working out the way it should,” says Susann Fiedler, a behavioral economist at the Max Planck Institute for Research on Collective Goods in Bonn, Germany. “Cognitive biases might be one reason for that.”
Psychologist Brian Nosek of the University of Virginia says that the most common and problematic bias in science is “motivated reasoning”: We interpret observations to fit a particular idea. Psychologists have shown that “most of our reasoning is in fact rationalization,” he says. In other words, we have already made the decision about what to do or to think, and our “explanation” of our reasoning is really a justification for doing what we wanted to do—or to believe—anyway. Science is of course meant to be more objective and skeptical than everyday thought—but how much is it, really?
Whereas the falsification model of the scientific method championed by philosopher Karl Popper posits that the scientist looks for ways to test and falsify her theories—to ask “How am I wrong?”—Nosek says that scientists usually ask instead “How am I right?” (or, equally, “How are you wrong?”). When facts come up that suggest we might, in fact, not be right after all, we are inclined to dismiss them as irrelevant, if not indeed mistaken. The now infamous “cold fusion” episode in the late 1980s, instigated by the electrochemists Martin Fleischmann and Stanley Pons, was full of such ad hoc brush-offs. For example, when it was pointed out to Fleischmann and Pons that the energy spectrum of the gamma rays from their claimed fusion reaction had its peak at the wrong energy, they simply moved it, muttering something ambiguous about calibration.
Statistics may seem to offer respite from bias through strength in numbers, but they are just as fraught. Chris Hartgerink of Tilburg University in the Netherlands works on the influence of “human factors” in the collection of statistics. He points out that researchers often attribute false certainty to contingent statistics. “Researchers, like people generally, are bad at thinking about probabilities,” he says. While some results are sure to be false negatives—that is, results that appear incorrectly to rule something out—Hartgerink says he has never read a paper that concludes as much about its findings. His recent research shows that as many as two in three psychology papers reporting non-significant results may be overlooking false negatives.3
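To see how easily a non-significant result can hide a real effect, consider a rough illustration (the power figure here is an assumption chosen for the sake of example, not a number from Hartgerink’s analysis). A study with statistical power of 0.35 against the true effect size will return a non-significant result most of the time even though the effect exists:

\[
P(\text{non-significant result} \mid \text{effect is real}) = 1 - \text{power} = 1 - 0.35 = 0.65.
\]

Reading that non-significant result as evidence that the effect is absent is exactly the false negative Hartgerink describes.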
Given that science has uncovered a dizzying variety of cognitive biases, the relative neglect of their consequences within science itself is peculiar. “I was aware of biases in humans at large,” says Hartgerink, “but when I first ‘learned’ that they also apply to scientists, I was somewhat amazed, even though it is so obvious.”
A common response to this situation is to argue that, even if individual scientists might fool themselves, others have no hesitation in critiquing their ideas or their results, and so it all comes out in the wash: Science as a communal activity is self-correcting. Sometimes this is true—but it doesn’t necessarily happen as quickly or smoothly as we might like to believe.
Nosek thinks that peer review might sometimes actively hinder clear and swift testing of scientific claims. He points out that, when in 2011 a team of physicists in Italy reported evidence of neutrinos that apparently moved faster than light (in violation of Einstein’s theory of special relativity), this astonishing claim was made,4 examined, and refuted5, 6 very quickly thanks to high-energy physicists’ efficient system of distributing preprints of papers through an open-access repository. If that testing had relied on the usual peer-reviewed channels, it could have taken years.
Similarly, when researchers suggested in Science in 2010 that arsenic might substitute for phosphorus in the DNA of some microbe—a claim that would have rewritten the fundamental chemical principles of life—one of the researchers who conducted follow-up studies to try to replicate the findings felt it important to document her ongoing results on an open-source blog. This was in contrast to the original research team, who were criticized for failing to report any subsequent evidence in support of their controversial claim.7
Peer review seems to be a more fallible instrument—especially in areas such as medicine and psychology—than is often appreciated, as the emerging “crisis of replicability” attests. Medical reporter Ivan Oransky and science editor Adam Marcus, who run the service Retraction Watch, put it this way: “When science works as designed, subsequent findings augment, alter or completely undermine earlier research … The problem is that in science—or, more accurately, scientific publishing—this process seldom works as directed … Much, if not most, of what gets published today in a scientific journal is only somewhat likely to hold up if another lab tries the experiment again, and, chances are, maybe not even that.”8
One of the reasons the scientific literature gets skewed is that journals are much more likely to publish positive than negative results: It’s easier to say something is true than to say it’s wrong. Journal referees might be inclined to reject negative results as too boring, and researchers currently get little credit or status from funders or departments for such findings. “If you do 20 experiments, one of them is likely to have a publishable result,” Oransky and Marcus write. “But only publishing that result doesn’t make your findings valid. In fact it’s quite the opposite.”9
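The one-in-20 figure follows from simple arithmetic if we assume the conventional significance threshold of p < 0.05 and independent experiments (an illustrative assumption of mine, not a calculation Oransky and Marcus spell out). Even if every effect being tested is nonexistent, the chance of at least one nominally “significant” result across 20 experiments is

\[
1 - (1 - 0.05)^{20} \approx 0.64,
\]

with one false positive expected on average. Publishing only that one result, as they note, tells us nothing about the 19 that failed.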
Oransky believes that, while all of the incentives in science reinforce confirmation biases, the exigencies of publication are among the most problematic. “To get tenure, grants, and recognition, scientists need to publish frequently in major journals,” he says. “That encourages positive and ‘breakthrough’ findings, since the latter are what earn citations and impact factor. So it’s not terribly surprising that scientists fool themselves into seeing perfect groundbreaking results among their experimental findings.”
Nosek agrees, saying one of the strongest distorting influences is the reward system that confers kudos, tenure, and funding. “To advance my career I need to get published as frequently as possible in the highest-profile publications possible. That means I must produce articles that are more likely to get published.” These, he says, are ones that report positive results (“I have discovered …”, not “I have disproved …”), original results (never “We confirm previous findings that …”), and clean results (“We show that …”, not “It is not clear how to interpret these results”). But “most of what happens in the lab doesn’t look like that”, says Nosek—instead, it’s mush. “How do I get from mush to beautiful results?” he asks. “I could be patient, or get lucky—or I could take the easiest way, making often unconscious decisions about which data I select and how I analyze them, so that a clean story emerges. But in that case, I am sure to be biased in my reasoning.”
Not only can poor data and wrong ideas survive, but good ideas can be suppressed through motivated reasoning and career pressures. The suggestions by geneticist Barbara McClintock in the 1940s and ’50s that some DNA sequences can “jump” around chromosomes, and by biochemist Stanley Prusiner in the 1980s that proteins called prions can fold up into entirely the wrong shape and that the misfolding can be transmitted from one protein to another, went so much against prevailing orthodoxy that both researchers were derided mercilessly—until they were proved right and won Nobel prizes. Skepticism about bold claims is always warranted, but looking back we can see that sometimes it comes more from an inability to escape the biases of the prevailing picture than from genuine doubts about the quality of the evidence. The examples of McClintock and Prusiner illustrate that science does self-correct when the weight of the evidence demands it, says Nosek, but “we don’t know about the examples in which a similar insight was made but was dismissed outright and never pursued.”
Scientists have some awareness of this, to be sure. Many sympathize with philosopher Thomas Kuhn’s theory that science undergoes abrupt paradigm shifts in which the prevailing wisdom in the entire field is undermined and a wholly new picture emerges. Between such shifts, we see only “normal science” that fits the general consensus—until a build-up of anomalies creates enough pressure to burst through the walls into a new paradigm. The classic example was the emergence of quantum physics at the start of the 20th century; the 18th-century notion of phlogiston in chemistry—a supposed “principle of combustion,” overturned by Lavoisier’s oxygen theory—also fits the model. A famous quotation attributed to Max Planck suggests another means by which such preconceptions in science are surmounted: “Science advances one funeral at a time.” New ideas break through only when the old guard dies.
The role of bias in science became clear to Nosek as a graduate student in psychology. “Like many graduate students, my idealism about how science works was shattered when I took research methods”, he says. “In that class, we read lots of papers that were old even then—articles from the 1950s through the 1970s—articles about publication bias, low-powered research designs, lack of replication, underreporting of methodology in published articles, lack of access to original data, and bias against null results.”
Nosek has since devoted himself to making science work better.10 He is convinced that the process and progress of science would be smoothed by bringing these biases to light—which means making research more transparent in its methods, assumptions, and interpretations. “Fighting these issues isn’t easy, because they are cultural challenges—and no one person can change a culture,” he says. “So I started with the issue that I could control: the power of my research designs.”
Surprisingly, Nosek thinks that one of the most effective solutions to cognitive bias in science could come from the discipline that has weathered some of the heaviest criticism recently for its error-prone and self-deluding ways: pharmacology. It is precisely because these problems are so manifest in the pharmaceutical industry that this community is, in Nosek’s view, way ahead of the rest of science in dealing with them. For example, because of the known tendency of drug companies and their collaborators to report positive results of trials and to soft-pedal negative ones, it is now a legal requirement in the United States for all clinical trials to be entered in a registry before they begin. This obliges researchers to report the results, whatever they show.
Nosek has instituted a similar pre-registration scheme for research called the Open Science Framework (OSF). He had planned it for many years, but it really took off when former software developer Jeff Spies joined his lab in 2009-2010 and took it on as a dissertation project. “Lots of people got involved and it became a much bigger thing pretty quickly,” says Nosek. “We started a website for the OSF, and a community—and funders—gathered around it.” In 2013, Nosek and Spies cofounded the Center for Open Science in Charlottesville, which now administers the OSF and offers its services for free.
The idea, says Nosek, is that researchers “write down in advance what their study is for and what they think will happen.” Then when they do their experiments, they agree to be bound to analyzing the results strictly within the confines of that original plan. It sounds utterly elementary, like the kind of thing we teach children about how to do science. And indeed it is—but it is rarely what happens. Instead, as Fiedler testifies, the analysis gets made on the basis of all kinds of unstated and usually unconscious assumptions about what would or wouldn’t be seen. Nosek says that researchers who have used the OSF have often been amazed at how, by the time they come to look at their results, the project has diverged from the original aims they’d stated.
Fiedler has used the service and says that not only does it keep the research honest but it makes it run more smoothly. “Pre-registration at the OSF forces me to think through all the details upfront, and the project, as well as some of the writing, is already done before I even start collecting the data,” she says. “Having this awareness helps me to separate which results I trust and which ones I trust less.” And not just her: Making the whole process transparent “gives every other researcher the chance to judge if this result is worth their valuable research time.”
Stating your aims is also a good way of checking that you know what they are, says Hartgerink, who is also an OSF user. “Once we decided to do this, we noticed that explicating the hypotheses was difficult in itself”—an indication that they hadn’t actually been formulated clearly enough. “Pre-registration is technically a must if you want to test hypotheses,” he concludes. Fiedler says that for the past year she and all of her Ph.D. students have used the OSF scheme. “I have learned so much by doing it that I can only recommend it to everyone in our line of work,” she avers.
The distinction between OSF and business as usual is considerable, says Hartgerink. Since most researchers write their manuscripts only after having conducted the study, hypotheses are not written down explicitly beforehand. “This results in more favorable formulations of the hypothesis once results are known.” Psychologist Ernest O’Boyle of the University of Iowa and his coworkers have dubbed this tendency to beautify results in retrospect the “Chrysalis effect.” One consequence, Hartgerink says, is that it is common to present unexpected results as expected. “Ask anyone in the general public whether it is OK to do that, and they will say it is not. Yet this has been the common thing to do in science for a long time.”
Often, this shift in hypotheses and goals just happens, without intention and even without recognition. “Within the sometimes long process of designing an experiment, collecting the data, analyzing it, and presenting the results to our scientific colleagues, our way of looking at a question and the corresponding results evolves,” says Fiedler. “Along the way we might forget about the original tests that failed, and present our new insights as answering different questions based on the same data.” This approach to science has a lot of value, she says: It’s important to discover unforeseen connections. But not only does this shift the goalposts of the research, it can also lead researchers to “put too much trust in maybe spurious effects.” OSF forces researchers to leave their goalposts where they are.
But if you elect to constrain yourself to a narrow set of objectives before you’ve even done the experiments, don’t you close off potentially fertile avenues that you couldn’t have foreseen? Maybe, says Nosek, but “learning from the data” is not the way to reach reliable conclusions. “At present we mix up exploratory and confirmatory research,” he says. “One basic fact that is always getting forgotten is that you can’t generate hypotheses and test them with the same data.” If you find an interesting new lead, you should follow that up separately, not somehow tell yourself that this is what the experiment was about all along.
Fiedler disputes the accusation that pre-registration will kill creativity and freedom. “It’s not something everybody always has to do,” she says, and exploratory research that collects data without a definite agenda of hypothesis testing still has an important place. But we need to keep the distinctions in view.
The major obstacle, Hartgerink thinks, is education: Researchers are simply not advised to do things this way. But they had better be. “If younger researchers do not start applying these techniques now,” he says, “they might find themselves on the backbenches in 10 years, because it is becoming the norm to do your research in a reproducible, transparent, and open manner.”
Ultimately, Nosek has his eyes on a “scientific utopia,” in which science becomes a much more efficient means of knowledge accumulation. Nobody claims that OSF will be the panacea that gets us there, however. As Oransky says, “One of the larger issues is getting scientists to stop fooling themselves. This requires elimination of motivated reasoning and confirmation bias, and I haven’t seen any good solutions for that.” So along with OSF, Nosek believes the necessary restructuring includes open-access publication, and open and continuous peer review. We can’t get rid of our biases, perhaps, but we can soften their siren call. As Nosek and his colleague, psychologist Yoav Bar-Anan of Ben-Gurion University in Israel, have said, “The critical barriers to change are not technical or financial; they are social. Although scientists guard the status quo, they also have the power to change it.”
Philip Ball is the author of Invisible: The Dangerous Allure of the Unseen and many books on science and art.
References
1. Ioannidis, J.P.A. Why most published research findings are false. PLoS Medicine 2, e124 (2005).
2. Ioannidis, J.P.A. How to make more published research true. PLoS Medicine 11, e1001747 (2014).
3. Hartgerink, C.H.J., van Assen, M.A.L.M., & Wicherts, J. Too good to be false: Non-significant results revisited. Open Science Framework (last updated April 7, 2015). Retrieved from https://osf.io/qpfnw/
4. Antonello, M., et al. Measurement of the neutrino velocity with the ICARUS detector at the CNGS beam. preprint arXiv:1203.3433 (2012).
5. Brumfiel, G. Neutrinos not faster than light. Nature News (2012). Retrieved from doi:10.1038/nature.2012.10249
6. Cho, A. Once Again, Physicists Debunk Faster-Than-Light Neutrinos. news.sciencemag.org (2012).
7. Hayden, E.C. Open research casts doubt on arsenic life. Nature News (2011). Retrieved from doi:10.1038/news.2011.469
8. Oransky, I. Unlike a Rolling Stone: Is Science Really Better Than Journalism at Self-Correction? http://theconversation.com (2015).
9. Oransky, I. Unlike a Rolling Stone: Is Science Really Better Than Journalism at Self-Correction? www.iflscience.com (2015).
10. Ioannidis, J.P.A., Munafò, M.R., Fusar-Poli, P., Nosek, B.A., & David, S.P. Publication and other reporting biases in cognitive sciences: detection, prevalence, and prevention. Trends in Cognitive Sciences 18, 235-241 (2014).