Gelman proposed the time-reversal heuristic for evaluating discussions of failed replications:
One helpful (I think) way to think about this episode is to turn things around. Suppose the Ranehill et al. experiment, with its null finding, had come first. A large study finding no effect. And then Cuddy et al. had run a replication under slightly different conditions with a much smaller sample size and found statistical significance under non-preregistered conditions. Would we be inclined to believe it? I don’t think so. At the very least, we’d have to conclude that any power-pose effect is fragile.
From this point of view, what Cuddy et al.’s research has going for it is that (a) they found statistical significance, (b) their paper was published in a peer-reviewed journal, and (c) their paper came before, rather than after, the Ranehill et al. paper. I don’t find these pieces of evidence very persuasive. (a) Statistical significance doesn’t mean much in the absence of preregistration or something like it, (b) lots of mistakes get published in peer-reviewed journals, to the extent that the phrase “Psychological Science” has become a bit of a punch line, and (c) I don’t see why we should take Cuddy et al. as the starting point in our discussion, just because it was published first.
I came across this:
Students should be alerted to a common fallacy in evaluating evidence. It is what I term the temporal order fallacy— that is, the failure of a later study to replicate the findings of an earlier study. The fallacy consists of according more weight to the second (more recent) study than to the first. This is terribly common in psychology. We often read that Dr. A’s study found such and such and then Dr. B’s study failed to replicate Dr. A’s finding. Dr. A’s finding is dismissed, and often that ends the matter. We can just as logically claim that Dr. A’s study failed to replicate Dr. B’s finding. The temporal order of the studies is irrelevant, other things being equal. If one study is superior in terms of design, statistical power, representativeness of samples, and the like, then of course it should be accorded more weight, regardless of its temporal order in relation to a contradictory study.
[In Arthur Jensen’s chapter in Race, Social Class, and Individual Differences in I.Q. by Sandra Scarr (1981).]
Interestingly, this is the opposite of the supposed present-day bias of giving extra weight to the first study. The recommendation is the same either way: all else equal, one should not assign any evidential weight to the order in which studies came out when evaluating the evidence base for a claim. Standard meta-analytic tools in fact do not take publication order into account, so the proper response to conflicting or ‘conflicting’ findings is to collect more, and especially larger, samples. If heterogeneity remains high AND one has many samples, then look for moderators. Most moderator analyses have woefully low precision and won’t produce anything useful. Given the prominence of power posing, collecting more data seems worth doing, even if it does constitute Captain Obvious science.
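To make the order-invariance point concrete, here is a minimal sketch of fixed-effect, inverse-variance pooling together with Cochran's Q and the I² heterogeneity statistic. The two effect sizes and standard errors are made-up illustrative numbers (a small "significant" study and a large null one), not the actual Cuddy et al. or Ranehill et al. estimates, and the function name `pool_fixed_effect` is my own.

```python
import math

def pool_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling.

    studies: list of (effect_size, standard_error) tuples.
    Returns (pooled_effect, pooled_se, cochran_q, i_squared).
    """
    # Each study is weighted by its precision (1 / variance).
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: precision-weighted squared deviations from the pooled effect.
    q = sum(w * (d - pooled) ** 2 for w, (d, _) in zip(weights, studies))
    df = len(studies) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared

# Hypothetical numbers, for illustration only.
small_significant = (0.60, 0.30)  # small sample, z = 2.0
large_null = (0.00, 0.10)         # large sample, no effect

original_order = pool_fixed_effect([small_significant, large_null])
reversed_order = pool_fixed_effect([large_null, small_significant])

print("small study first:", original_order)
print("large study first:", reversed_order)
```

Both calls give the same answer up to floating-point rounding, because the weights depend only on precision. With these invented inputs the pooled estimate lands near 0.06, dominated by the large study. The I² value computed from just two studies is itself very imprecise, which is one reason to wait until there are many samples before chasing moderators.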