What replicates? Non-replicating studies from Camerer et al. (2018) with commentary

Being able to replicate scientific findings is crucial for scientific progress. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
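As a rough consistency check on the abstract's numbers, here is a small Python sketch. It assumes (a simplification of ours, not the paper's Bayesian model) that false positives have zero true effect in replication. Under that assumption, the expected replication effect relative to the original is the true-positive rate times the relative effect size of true positives. The names `percent`, `shares` and `expected_relative_effect` are illustrative only.

```python
# Rough consistency check of the Camerer et al. (2018) headline numbers.
# Simplifying assumption (ours, not the paper's Bayesian model): false
# positives have a true effect of zero in replication.

N_STUDIES = 21

def percent(k: int, n: int = N_STUDIES) -> float:
    """Share of n studies as a percentage."""
    return 100 * k / n

# Replication counts quoted in the abstract: 12, 13 and 14 of 21.
shares = {k: percent(k) for k in (12, 13, 14)}

TRUE_POSITIVE_RATE = 0.67   # estimated share of originals that are true
RELATIVE_EFFECT_TP = 0.71   # replication/original effect for true positives

# True positives replicate at 71% of their original size; the remaining
# 33% (false positives) contribute nothing under our assumption.
expected_relative_effect = (
    TRUE_POSITIVE_RATE * RELATIVE_EFFECT_TP
    + (1 - TRUE_POSITIVE_RATE) * 0.0
)

if __name__ == "__main__":
    for k, pct in shares.items():
        print(f"{k}/{N_STUDIES} = {pct:.0f}%")  # 57%, 62%, 67%
    print(f"expected relative effect ~ {expected_relative_effect:.0%}")  # 48%
```

Since 0.67 × 0.71 ≈ 0.48, this lands close to the reported average of about 50%. That is the sense in which both false positives and shrunken true effects account for the gap.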

The most important take-away is this:

Anonymous surveys and prediction markets strongly agree on which studies are likely to replicate, and they don’t seem to suffer from much elevation bias (a miscalibrated overall level), i.e., researchers’ level of skepticism roughly matches the observed replication rate.

Taking the results at face value, which studies didn’t replicate?

  • Ackerman, J. M., Nocera, C. C., & Bargh, J. A. (2010). Incidental haptic sensations influence social judgments and decisions. Science, 328(5986), 1712-1715.

Touch is both the first sense to develop and a critical means of information acquisition and environmental manipulation. Physical touch experiences may create an ontological scaffold for the development of intrapersonal and interpersonal conceptual and metaphorical knowledge, as well as a springboard for the application of this knowledge. In six experiments, holding heavy or light clipboards, solving rough or smooth puzzles, and touching hard or soft objects nonconsciously influenced impressions and decisions formed about unrelated people and situations. Among other effects, heavy objects made job candidates appear more important, rough objects made social interactions appear more difficult, and hard objects increased rigidity in negotiations. Basic tactile sensations are thus shown to influence higher social cognitive processing in dimension-specific and metaphor-specific ways.

So, some typical social psychology claiming irrational sensory effects.

540 citations.

  • Lee, S. W., & Schwarz, N. (2010). Washing away postdecisional dissonance. Science, 328(5979), 709-709.

After choosing between two alternatives, people perceive the chosen alternative as more attractive and the rejected alternative as less attractive. This postdecisional dissonance effect was eliminated by cleaning one’s hands. Going beyond prior purification effects in the moral domain, physical cleansing seems to more generally remove past concerns, resulting in a metaphorical “clean slate” effect.

Cleaning one’s hands changes beliefs about decisions? Yeah, no.

187 citations.

  • Kidd, D. C., & Castano, E. (2013). Reading literary fiction improves theory of mind. Science, 342(6156), 377-380.

Understanding others’ mental states is a crucial skill that enables the complex social relationships that characterize human societies. Yet little research has investigated what fosters this skill, which is known as Theory of Mind (ToM), in adults. We present five experiments showing that reading literary fiction led to better performance on tests of affective ToM (experiments 1 to 5) and cognitive ToM (experiments 4 and 5) compared with reading nonfiction (experiment 1), popular fiction (experiments 2 to 5), or nothing at all (experiments 2 and 5). Specifically, these results show that reading literary fiction temporarily enhances ToM. More broadly, they suggest that ToM may be influenced by engagement with works of art.

Reading fiction enhances theory of mind? Sounds like a finding that researchers who score high on interest in people (vs. things) would want to show. It has some plausibility, but apparently it is either not true or too weak to detect in an experiment like this.

770 citations.

  • Gervais, W. M., & Norenzayan, A. (2012). Analytic thinking promotes religious disbelief. Science, 336(6080), 493-496.

Scientific interest in the cognitive underpinnings of religious belief has grown in recent years. However, to date, little experimental research has focused on the cognitive processes that may promote religious disbelief. The present studies apply a dual-process model of cognitive processing to this problem, testing the hypothesis that analytic processing promotes religious disbelief. Individual differences in the tendency to analytically override initially flawed intuitions in reasoning were associated with increased religious disbelief. Four additional experiments provided evidence of causation, as subtle manipulations known to trigger analytic processing also encouraged religious disbelief. Combined, these studies indicate that analytic processing is one factor (presumably among several) that promotes religious disbelief. Although these findings do not speak directly to conversations about the inherent rationality, value, or truth of religious beliefs, they illuminate one cognitive factor that may influence such discussions.

Manipulating analytic thinking makes people self-report less religious belief, or even temporarily reduces it? That doesn’t sound real to me. The motive is to show religious conservatives to be irrational, a favorite of atheist liberals in academia.

361 citations.

  • Shah, A. K., Mullainathan, S., & Shafir, E. (2012). Some consequences of having too little. Science, 338(6107), 682-685.

Poor individuals often engage in behaviors, such as excessive borrowing, that reinforce the conditions of poverty. Some explanations for these behaviors focus on personality traits of the poor. Others emphasize environmental factors such as housing or financial access. We instead consider how certain behaviors stem simply from having less. We suggest that scarcity changes how people allocate attention: It leads them to engage more deeply in some problems while neglecting others. Across several experiments, we show that scarcity leads to attentional shifts that can help to explain behaviors such as overborrowing. We discuss how this mechanism might also explain other puzzles of poverty.

In the ‘prove poverty is bad’ category. The replicated experiment tested whether a scarcity/poverty prime would hamper cognitive performance, presumably to show that poverty itself lowers tested IQ. The authors couldn’t replicate it either; in fact, their own replication showed p < .05 for the opposite effect (though with a small effect size). Kudos to the authors for running self-replications with n ≈ 1,000 for all their experiments!

576 citations.

  • Sparrow, B., Liu, J., & Wegner, D. M. (2011). Google effects on memory: Cognitive consequences of having information at our fingertips. Science, 333(6043), 776-778.

The advent of the Internet, with sophisticated algorithmic search engines, has made accessing information as easy as lifting a finger. No longer do we have to make costly efforts to find the things we want. We can “Google” the old classmate, find articles online, or look up the actor who was on the tip of our tongue. The results of four studies suggest that when faced with difficult questions, people are primed to think about computers and that when people expect to have future access to information, they have lower rates of recall of the information itself and enhanced recall instead for where to access it. The Internet has become a primary form of external or transactive memory, where information is stored collectively outside ourselves.

Another priming study. Priming people to think about computers reduces recall? I don’t think priming in general is real, so that’s gotta be a no.

837 citations.

  • Ramirez, G., & Beilock, S. L. (2011). Writing about testing worries boosts exam performance in the classroom. Science, 331(6014), 211-213.

Two laboratory and two randomized field experiments tested a psychological intervention designed to improve students’ scores on high-stakes exams and to increase our understanding of why pressure-filled exam situations undermine some students’ performance. We expected that sitting for an important exam leads to worries about the situation and its consequences that undermine test performance. We tested whether having students write down their thoughts about an upcoming test could improve test performance. The intervention, a brief expressive writing assignment that occurred immediately before taking an important test, significantly improved students’ exam scores, especially for students habitually anxious about test taking. Simply writing about one’s worries before a high-stakes exam can boost test scores.

So, another paper attacking standardized testing. It resembles another study (Cohen et al., 2006) in which having students write down some random self-affirming stuff supposedly reduced the Black-White achievement gap by 40%. A large-n replication by Protzko and Aronson found nothing. Test results are really not that sensitive to minor context differences.

299 citations.

  • Rand, D. G., Greene, J. D., & Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427-430.

Cooperation is central to human social behaviour. However, choosing to cooperate requires individuals to incur a personal cost to benefit others. Here we explore the cognitive basis of cooperative decision-making in humans using a dual-process framework. We ask whether people are predisposed towards selfishness, behaving cooperatively only through active self-control; or whether they are intuitively cooperative, with reflection and prospective reasoning favouring ‘rational’ self-interest. To investigate this issue, we perform ten studies using economic games. We find that across a range of experimental designs, subjects who reach their decisions more quickly are more cooperative. Furthermore, forcing subjects to decide quickly increases contributions, whereas instructing them to reflect and forcing them to decide slowly decreases contributions. Finally, an induction that primes subjects to trust their intuitions increases contributions compared with an induction that promotes greater reflection. To explain these results, we propose that cooperation is intuitive because cooperative heuristics are developed in daily life where cooperation is typically advantageous. We then validate predictions generated by this proposed mechanism. Our results provide convergent evidence that intuition supports cooperation in social dilemmas, and that reflection can undermine these cooperative impulses.

Another ‘rich is bad’ study. Since the paper had 10 substudies, it was a little unclear which one they replicated. Their notes say that:

So apparently, one of these substudies had already failed to replicate. The specific one replicated seems to be this (original paper text):

We recruited 343 subjects on AMT to participate in a one-shot PGG experiment. The first condition promotes intuition relative to reflection: before reading the PGG instructions, subjects were assigned to write a paragraph about a situation in which either their intuition had led them in the right direction, or careful reasoning had led them in the wrong direction. Conversely, the second condition promotes reflection: subjects were asked to write about either a situation in which intuition had led them in the wrong direction, or careful reasoning had led them in the right direction. Consistent with the seven experiments described above, we find that contributions are significantly higher when subjects are primed to promote intuition relative to reflection (Fig. 2c; rank sum, P = 0.011)

So, yet another implausible priming study based on having subjects write some random stuff.

829 citations.


I can’t say I’m very surprised these studies did not replicate, but the survey and prediction market show that other researchers aren’t either. A reasonable position now would be to lower the prior for these kinds of studies, which could be implemented simply by requiring a smaller p value for acceptance. It’s much harder to QRP (use questionable research practices) your way to p < .005 than to p < .05 (per the big proposal to redefine statistical significance).
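Both halves of that argument can be sketched in Python. The first function is the standard positive-predictive-value calculation: the share of significant results that reflect a true effect, given a prior, power and alpha. The prior of 0.1 and power of 0.8 are illustrative assumptions, not estimates from Camerer et al., and holding power fixed at .005 is optimistic since it requires larger samples in practice. The second function simulates one common QRP, optional stopping, under a true null: the hypothetical researcher peeks after every 20 subjects (up to 100) and stops as soon as p < alpha. The peeking schedule, the known-sigma z test and all function names here are assumptions made for illustration.

```python
import math
import random

def ppv(prior: float, power: float, alpha: float) -> float:
    """Share of significant results that reflect a true effect."""
    true_pos = power * prior          # true effects that reach significance
    false_pos = alpha * (1 - prior)   # null effects that reach significance
    return true_pos / (true_pos + false_pos)

def two_sided_p(z: float) -> float:
    """Two-sided p value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def optional_stopping_fpr(alpha, looks=(20, 40, 60, 80, 100),
                          n_sims=5000, seed=1):
    """False-positive rate when peeking at each look and stopping at p < alpha.

    Data are drawn from N(0, 1), so the null is true; the test is a
    one-sample z test with known sigma = 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for look in looks:
            while n < look:
                total += rng.gauss(0.0, 1.0)
                n += 1
            z = total / math.sqrt(n)  # mean / (sigma / sqrt(n)) with sigma = 1
            if two_sided_p(z) < alpha:
                hits += 1
                break  # the researcher stops and reports a "finding"
    return hits / n_sims

if __name__ == "__main__":
    # Illustrative prior and power, not estimates from the replication project.
    for alpha in (0.05, 0.005):
        print(f"alpha={alpha}: PPV={ppv(0.1, 0.8, alpha):.2f}, "
              f"FPR with peeking={optional_stopping_fpr(alpha):.3f}")
```

With these assumed inputs the PPV rises from 0.64 at .05 to about 0.95 at .005. Peeking pushes the realized false-positive rate well above the nominal 5% at .05, while the same amount of peeking at .005 leaves far fewer false findings.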

See also a similar take by Jonatan Pallesen.
