Skip to main content

Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015

Colin F. Camerer, Anna Dreber, Felix Holzmeister, Teck Ho, Juergen Huber, Magnus Johannesson, Michael KirchlerGideon Nave, Brian A. Nosek, Thomas Pfeiffer, Adam Altmejd, Nick Buttrick, Taizan Chan, Yiling Chen, Eskil Forsell, Anup Gampa, Emma Heikensten, Lily Hummer, Taisuke Imai, Siri Isaksson, Dylan Manfredi, Julia Rose, Eric-Jan Wagenmakers, and Hang Wu

Published in: Nature Human Behaviour

@article{Camerer_ea_2018_EvaluatingReplicabilitySocial,
  title = {Evaluating the Replicability of Social Science Experiments in Nature and Science between 2010 and 2015},
  author = {Camerer, Colin F. and Dreber, Anna and Holzmeister, Felix and Ho, Teck-Hua and Huber, Jürgen and Johannesson, Magnus and Kirchler, Michael and Nave, Gideon and Nosek, Brian A. and Pfeiffer, Thomas and Altmejd, Adam and Buttrick, Nick and Chan, Taizan and Chen, Yiling and Forsell, Eskil and Gampa, Anup and Heikensten, Emma and Hummer, Lily and Imai, Taisuke and Isaksson, Siri and Manfredi, Dylan and Rose, Julia and Wagenmakers, Eric-Jan and Wu, Hang},
  date = {2018-09},
  journaltitle = {Nature Human Behaviour},
  volume = {2},
  number = {9},
  pages = {637--644},
  issn = {2397-3374},
  doi = {10.1038/s41562-018-0399-z},
  langid = {english}
}

Abstract

Being able to replicate scientific findings is crucial for scientific progress. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.

Media Coverage