A few years ago, I became aware of a serious problem in science: the irreproducibility crisis. A group of researchers at Amgen, an American pharmaceutical company, attempted to replicate 53 landmark cancer discoveries in close collaboration with the authors. Many of these papers were published in high-impact journals and came from prestigious academic institutions. To the surprise of everyone involved, they were able to replicate only six of those papers, approximately 11 percent.
As expected, this observation had wide reverberations throughout the scientific community. The inability to independently replicate scientific findings threatens to undermine trust in the institution of science.
Yet, as an experimental biologist, my initial reaction to this crisis was dismissive. I reaffirmed to myself that science is self-correcting, and that wrong ideas have a place within scientific discourse. After all, this is the very characteristic that distinguishes science from other human endeavors and gives it its nobility.
But as it turns out, irreproducibility in itself was not the problem—rather, it was its extent, which is becoming more apparent due to the exponential rise in scientific output (over 1.1 million scientific papers were indexed in PubMed in 2015). Widespread irreproducibility is often misconceived as intentional fraud—which does occur, and is documented by websites like Retraction Watch. But the majority of irreproducible research stems from a complex matrix of statistical, technical, and psychological biases that are rampant within the scientific community.
The institutionalization of science in the early decades of the 20th century created a scientific sub-culture, with its own reward systems, behaviors, and social norms. The rest of society sees this sphere a bit differently: Scientists are portrayed as selfless individuals who are solely motivated by curiosity and a hunger for knowledge. However, the existence of the irreproducibility crisis implies that other motives may also exist.
The first question to ask, in addressing the problem of irreproducibility, is: Why do scientists do science? This question itself is the subject of an entire academic discipline. Sociologists of science have consistently identified “public recognition” as scientists’ primary motivating factor. Of course, other drivers do exist, such as puzzle solving, knowledge building, and financial gain. But recognition seems to represent the common, essential driver.
Scientists’ behavior on an individual level is consistent with this view. We are obsessed with discovering things first, affiliating with prestigious institutions, publishing in recognized journals, getting cited by the masses, winning awards, and standing on stages. Scientists, like the rest of humanity, crave attention and respect from their peers and role models. The inability of scientists to admit this fact is understandable: The implication that their motives are self-serving can diminish the nobility of their work.
The well-recognized sociologist Robert Merton has pointed out that scientists’ need for recognition may stem from their need to be assured that what they know is worth knowing, and that they are capable of original thought. In this view, recognition is necessary for intellectual confidence.
The nature of scientific motivation is also evident in scientific reward systems. These rewards often come in some form of validation, such as awards, titles, and press coverage, which are then translated into career advancement and opportunities for greater prestige. Guidelines for promotion in several academic centers where I have worked have listed “Broader reputation than local area” as one of two promotion criteria for associate professors. In other words, the promotion of an assistant professor to an associate professor requires them to be famous within their field.
Currently, publishing in prestigious journals and being extensively cited represent the height of recognition in the scientific community. These two metrics imply quality, but have long been shown to be hollow. Papers in high-impact journals, for example, suffer from irreproducibility at almost the same rate as those in lower-impact journals. And those high-profile papers that are retracted are cited considerably both before and after their retraction.
The inconvenient truth is that scientists can achieve fame and advance their careers through accomplishments that do not prioritize the quality of their work. If recognition is not based on quality, then scientists will not modify their behaviors to select for it. In the culture of modern science, it is better to be wrong than to be second.
This does not mean that quality is completely neglected. The Nobel Prize—the most coveted form of recognition—is associated with scientific discoveries of the highest caliber. But for the tens of thousands of scientists fighting over shrinking research budgets, winning less visible awards becomes an obsession, needed for promotions and grants.
Today, the majority of the assessment metrics for quality in modern science, such as impact factor and h-index, are based on citations. Conceptually, citations represent a good approximation of quality; however, they are greatly influenced by the sociological dynamics of the scientific community and can thus be gamed. For example, peer reviewers can ask authors to cite the reviewers' own papers as an implied condition for a favorable critique. Also, journal editors encourage citation of relevant papers published in the same journal to drive up its impact factor. Interestingly, savvy scientists often add citations to their papers preemptively to appease potential reviewers and editors.
The gaming of these metrics should not be viewed as merely a consequence of a flawed publishing model, but as a reflection of academic motives. So introducing new publishing platforms, or changes in the peer-review process—such as the innovations pioneered by F1000 and PLOS ONE—although very important and timely, may not lead to broad changes in behavior and thus may not improve reproducibility. That will only happen when outcomes become more closely aligned with the most coveted reward: recognition.
To make the desire for recognition compatible with prioritizing good science, we need quality metrics that are independent of sociological norms. Above all, objective quality should be based on the concept of independent replication: A finding would not be accepted as true unless it is independently verified.
Distinguishing between replicated and un-replicated studies would change how science is reported and discussed, increase the visibility of both strong and weak papers, incentivize scientists to only publish findings they have confidence in, and discourage publishing for the sake of publishing. Institutions would want to hire faculty with stellar qualitative records to build trust with industrial and governmental funders. Funding agencies would be inclined to support grants whose hypotheses are built on strong premises, and are submitted by investigators and institutions both known for quality. The public would become more skeptical of un-replicated science, preventing the wide adoption of false scientific ideas.
Of course, the transition to an institutionalized process to assess replication-based quality will require structural changes. First, scientists would need to be incentivized to perform replication studies, through recognition and career advancement. Second, a database of replication studies would need to be curated by the scientific community. Third, mathematical derivations of replication-based metrics would need to be developed and tested. Fourth, the new metrics would need to be integrated into the scientific process without disrupting its flow.
But these changes are all feasible and desirable. It is our responsibility as scientists to create transparency on how academic science is incentivized, produced, and evaluated. As Brian Nosek and colleagues from the Center for Open Science once said, “Openness is not needed because we [scientists] are untrustworthy; it is needed because we are human.”
Ahmed Alkhateeb is a postdoctoral research fellow at Harvard Medical School and Massachusetts General Hospital. His research focuses on stromal-tumor interactions in pancreatic cancer.
The lead photograph is courtesy of Tony Buser via Flickr.