Part 1 of the ‘Reproducibility Crisis’ Series
In a May 2016 survey published in Nature, 1,576 scientific researchers were asked various questions about the so-called ‘Reproducibility Crisis’. Of the participants, 52% said that there is a significant crisis, while another 38% said that there is only a slight one. What once looked overblown and exaggerated now appears to be a legitimate source of worry in the scientific community.
The Reproducibility Problem
Before going through a possible (but definitely not comprehensive) solution to the reproducibility problem, it is first important to define what exactly the reproducibility crisis is. Given the frequency with which the crisis is brought up in major publications, the blogosphere, and the mass media, one would expect the definition to be quite simple. Unfortunately, it’s not. Arturo Casadevall, a microbiologist at the Johns Hopkins Bloomberg School of Public Health, is quoted in the Nature report proposing that “at the current time there is no consensus on what reproducibility is or should be.” Is irreproducibility a result of poor statistical analyses and P-hacking? Is it caused by a lack of scientific rigor in research, rooted in the publish-or-perish academic environment? Is it just bad science?
Psychology – The First Victim?
However you would like to frame the crisis, it has possibly already claimed its first victim—the field of psychology. The results of a massive reproducibility effort in psychology, released in August 2015, suggested that only 39% of the reviewed studies could be reproduced. For a field that already has to fend off claims that it is in some way unscientific (or merely a ‘soft science’), such results dealt a heavy blow (Kahneman warns of a “train wreck looming“). If any given published study cannot be trusted, then the very foundation of a scientific discipline must be put under constant scrutiny (whether fair or not).
The now infamous ‘Amgen Study’ from 2012 tried to replicate fifty-three cutting-edge experiments in the field of cancer biology and, shockingly, found that only six were reproducible. With the NIH investing more than $6 billion annually in cancer research, one would correctly assume that there was a considerable amount of outrage from scientists and non-scientists alike.
The authors of the ‘Amgen Study’ suggested a more rigorous version of preclinical research that implements blinding mechanisms similar to those used in later clinical trials. Additionally, they proposed that the standard in the field must change in a way that would require researchers to publish both positive and negative results.
Selective Reporting and Pressure to Publish
The scientists who participated in the Nature survey (a large plurality of whom were biologists) listed selective reporting as the most likely and most consistent contributor to irreproducibility (roughly 70% said that it always or often contributes, and roughly 25% said that it only sometimes contributes). The second most significant factor, according to the survey, is the pressure to publish, which is often connected to selective reporting. The survey also asked respondents about their success in publishing the results of their reproduction attempts: only 24% were able to publish their successful reproductions, and 13% their unsuccessful ones. In the current academic climate, where ‘publish or perish’ is more than a pithy phrase, the lack of incentives for reproduction and the lack of value attributed to reproduction in scientific journals appear to have held back any significant systemic change.
While this issue is massive and deserves much more than a single blog post to tackle its severity (as well as some overblown responses to its severity), a paper recently published in Nature represents a step in the right direction.
The Dark Reactions Project
Researchers from Haverford College and Purdue University created a database of reaction conditions for metal-organic-framework (MOF) syntheses in what was called the Dark Reactions Project. The true novelty of the database is that it includes not only successful syntheses but also those that failed. Using machine learning, the researchers hoped to “provide an alternative to experimental trial-and-error.” Moreover, the searchable database allowed researchers to examine previously attempted syntheses of a particular (or similar) MOF and formulate a synthetic strategy based on the listed reaction conditions. The machine-learning component of the project predicted the success of proposed reactions and was shown to outperform human intuition, with a prediction success rate of 89%.
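To make the idea concrete, here is a minimal sketch of the approach, not the project’s actual model: a classifier is trained on both successful and failed reaction outcomes and then asked to judge untried conditions. The features (temperature, reaction time, pH), the thresholds, and all the data below are invented for illustration.

```python
# Toy sketch of learning from failed experiments (hypothetical data).
from sklearn.tree import DecisionTreeClassifier

# Each row: [temperature_C, reaction_time_h, pH] -- made-up conditions.
conditions = [
    [90, 24, 2.0], [110, 48, 3.5], [120, 72, 4.0], [95, 24, 2.5],    # failed
    [150, 48, 6.0], [160, 72, 6.5], [155, 60, 7.0], [165, 48, 6.2],  # succeeded
]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = failed synthesis, 1 = success

# The failures are as informative as the successes: without them, the
# model would have nothing to separate promising conditions from dead ends.
model = DecisionTreeClassifier(random_state=0)
model.fit(conditions, outcomes)

# Predict whether an untried set of conditions is likely to succeed.
print(model.predict([[158, 50, 6.3]])[0])  # 1 (predicted success) on this toy data
```

The key point is in the training data, not the algorithm: a database of successes alone cannot teach a model where the boundary of failure lies.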
Share Your Failure
The Dark Reactions Project underscores the importance of sharing failed experiments with the scientific community. Science is never simple, and the oversimplification of the scientific process (by neglecting bad data in publication) leads to increasingly bad science. Though most research groups do not use advanced machine learning to predict the outcomes of their experiments, they do use a Bayesian notion of prior probability in light of the data available on the subject (a similar, but much more mechanized and complex, version of this process is found in machine-learning systems). It seems obvious that people will perform better when presented with more, and more accurate, information—with no clear incentives in place, however, it is understandable why a researcher would not feel the need to share the data from his or her failed experiments.
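The Bayesian point can be shown with a small worked example, invented here for illustration: estimate a synthesis route’s probability of success with a uniform Beta(1, 1) prior, and see how the estimate changes when unpublished failures are counted alongside published successes.

```python
# Toy Bayesian update: why hidden failures distort the prior (illustrative only).
def beta_posterior_mean(successes, failures, prior_a=1, prior_b=1):
    """Posterior mean success probability under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

# Literature reporting only the 3 published successes of a route:
print(beta_posterior_mean(3, 0))   # 0.8 -- the route looks very promising

# The same route with its 9 unpublished failures included:
print(beta_posterior_mean(3, 9))   # ~0.29 -- a very different picture
```

When failures go unpublished, every researcher’s implicit prior is skewed toward success, and effort is wasted rediscovering dead ends.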
More Successful Outcomes
There is a stigma attached to failure in the scientific community. Failure doesn’t bring in grant money. Failure doesn’t earn you tenure. Nobody has won a Nobel Prize for a failed experiment. Failure, nevertheless, is the first step toward success. By understanding how someone previously failed at tackling a problem, a researcher can improve on (or at least alter) the experimental conditions, or the nature of the experiment itself, in the hope of a more successful outcome. To err may be human, but to pretend that you didn’t may be even more so.