Reproducibility in Research

This guide provides resources, tips, and tools for making your research reproducible.

What is Reproducibility?

“the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results…. Reproducibility is a minimum necessary condition for a finding to be believable and informative.”

- U.S. National Science Foundation (NSF) Subcommittee on Replicability in Science 

How Is Reproducibility Different from ...

Replicability: the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected (NSF)

Rigor: the strict application of the scientific method to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation and reporting of results (NIH)

Generalizability: whether the results of a study apply in other contexts or populations that differ from the original one (NSF)


                      Same experimental system    Different experimental system
  Same methods        Reproducibility             Replicability
  Different methods   Robustness                  Generalizability

Schloss, P. D. (2018). Identifying and overcoming threats to reproducibility, replicability, robustness, and generalizability in microbiome research. MBio, 9(3).


Factors Influencing Reproducibility

Bishop's “four horsemen of the irreproducibility apocalypse” are the main factors that may lead to irreproducible or false-positive research:

  1. Publication bias: Researchers are more likely to write up significant results, and journal editors are more likely to accept manuscripts showing statistical significance or an expected result.
  2. Low statistical power: Studies with low statistical power increase the likelihood that a statistically significant finding represents a false positive result.
  3. P-hacking: Running multiple analyses or studies but reporting only those that returned significant results. Also known as data dredging or fishing expeditions.
  4. HARKing (hypothesizing after results are known): When researchers state or change their hypothesis after results are analyzed. Also known as a post hoc hypothesis.
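The p-hacking problem above can be made concrete with a quick simulation (a Python sketch added for illustration, not part of Bishop's article): if a researcher tries 20 analyses on data containing no real effect and reports any that reach p < 0.05, the chance of publishing at least one "significant" false positive is far higher than the nominal 5%.

```python
import random
import statistics
from math import sqrt, erf

random.seed(1)

ALPHA = 0.05    # nominal significance threshold
N = 30          # observations per group
TESTS = 20      # analyses tried per "study"
STUDIES = 2000  # simulated studies

def p_value(a, b):
    """Two-sided z-test on the difference in means (adequate for n >= 30)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = abs(statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def one_study():
    """Try TESTS analyses on pure-noise data; 'report' if any is significant."""
    return any(
        p_value([random.gauss(0, 1) for _ in range(N)],
                [random.gauss(0, 1) for _ in range(N)]) < ALPHA
        for _ in range(TESTS)
    )

false_positive_rate = sum(one_study() for _ in range(STUDIES)) / STUDIES
print(f"Nominal alpha: {ALPHA}")
print(f"Chance of at least one 'significant' result: {false_positive_rate:.2f}")
```

Analytically, the chance of at least one false positive in 20 independent tests is 1 − (1 − 0.05)²⁰ ≈ 0.64, and the simulation lands near that value, which is why reporting only the significant analyses so badly distorts the literature.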


Checklists work to improve science [Editorial]. (2018). Nature.
Bishop, D. (2019). Rein in the four horsemen of irreproducibility. Nature, 568(7753), 435-436.
Dumas-Mallet, E., Button, K. S., Boraud, T., Gonon, F., & Munafò, M. R. (2017). Low statistical power in biomedical science: A review of three human research domains. Royal Society Open Science, 4(2), 160254.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.