When I review papers, I look for a number of things and attempt to analytically replicate all the analyses. This page serves as a reminder of what to check for myself and (perhaps) as an inspiration for others.

General checklist:

  • Are there alternative predictor/outcome variables that could be used?
    • If there is more than one predictor/outcome variable, consider factor analyzing or averaging them to produce a more reliable composite.
  • Check the reliability of the variables. Correct effect sizes for measurement error if possible (see the reliability sketch after this list).
  • Look for common null hypothesis significance testing (NHST) errors:
    • A finding with p > alpha does not imply the sample effect size was 0, nor does it imply the population effect size is 0. [inverse p fallacy]
    • A finding with p < alpha may not be scientifically important. Minute effects can reach p << alpha if the sample size is large. [significance equivocation fallacy]
    • Ideally, don’t use NHST at all unless it is actually appropriate!
  • If authors insist on using NHST, ask them to also provide confidence intervals for the important effect sizes. If these cannot be calculated analytically, use bootstrapped versions (see the bootstrap sketch after this list). These are also useful for data that violate parametric assumptions.
  • Does the paper mention important numerical facts in the abstract? Sample type, sample size, and effect size(s) are important to report.
  • Does the paper mention effect sizes? Effect sizes should always be reported because they are necessary for cumulative science.
  • Are methods used with data that clearly violate the assumptions of the method? Examples:
    • Non-normally distributed or discrete variables with parametric methods.
    • Fixed-effects meta-analysis used on data with clearly varying population effect sizes.
  • Are non-continuous versions of variables used instead of continuous versions without good reason? Continuous measures are superior; use them if possible.
  • If many models were tested, are they all reported? (reporting bias) If there are too many to report, put them in an appendix or summarize the results. If possible, try all the models, e.g. with OLS and LASSO regression (see the LASSO sketch after this list).
  • If R2 values or other fit measures for models are given much weight, consider calculating cross-validated values to avoid overfitting problems (see the cross-validation sketch after this list).
  • Check for signs of selective citation. Indicators: citing small, old, highly cited studies instead of a meta-analysis (ideally one that checks for publication bias).
  • If the study gathered new data from humans, was the study pre-registered? If so, review the registration to look for reporting bias.
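
Reliability sketch: a minimal example of forming a composite from several outcome items with the psych package and applying the classical correction for attenuation. The data frame df, the item and predictor column names, and the predictor reliability of 0.80 are all hypothetical placeholders.

```r
library(psych)

items <- df[, c("item1", "item2", "item3")]     # hypothetical outcome items

rel_y <- psych::alpha(items)$total$raw_alpha    # Cronbach's alpha of the item set

df$outcome <- rowMeans(scale(items))            # simple composite: mean of standardized items
# df$outcome <- psych::fa(items, nfactors = 1)$scores[, 1]   # alternative: factor score

r_obs  <- cor(df$predictor, df$outcome, use = "pairwise.complete.obs")
rel_x  <- 0.80                                  # assumed predictor reliability (placeholder)
r_true <- r_obs / sqrt(rel_x * rel_y)           # correction for attenuation
```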
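
Bootstrap sketch: a minimal example of a bootstrapped confidence interval for a correlation using the boot package, assuming a hypothetical data frame df with columns x and y.

```r
library(boot)

# Statistic to bootstrap: the correlation computed on a resampled index set
cor_boot <- function(data, idx) cor(data$x[idx], data$y[idx])

set.seed(1)
b <- boot(data = df, statistic = cor_boot, R = 5000)   # 5000 resamples
boot.ci(b, type = c("perc", "bca"))                    # percentile and BCa intervals
```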
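
LASSO sketch: a minimal example of fitting the full predictor set with OLS and with LASSO via glmnet, rather than reporting only a favored subset of models. The predictor_names vector and df are hypothetical.

```r
library(glmnet)

X <- as.matrix(df[, predictor_names])        # hypothetical predictor matrix
y <- df$outcome

fit_ols   <- lm(y ~ X)                       # plain OLS with the full predictor set
fit_lasso <- cv.glmnet(X, y, alpha = 1)      # LASSO with cross-validated penalty

summary(fit_ols)
coef(fit_lasso, s = "lambda.1se")            # sparse coefficients at a conservative lambda
```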
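
Cross-validation sketch: a small base-R function that returns a k-fold cross-validated R2 for a linear model, to be compared with the in-sample R2. It assumes complete cases; the formula and data frame are hypothetical.

```r
# k-fold cross-validated R^2 for a linear model (assumes complete cases)
cv_r2 <- function(formula, data, k = 10) {
  folds <- sample(rep(1:k, length.out = nrow(data)))   # random fold assignment
  yhat  <- numeric(nrow(data))
  for (i in 1:k) {
    fit <- lm(formula, data = data[folds != i, ])      # fit on training folds
    yhat[folds == i] <- predict(fit, newdata = data[folds == i, ])
  }
  y <- model.response(model.frame(formula, data))
  1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

set.seed(1)
cv_r2(outcome ~ pred1 + pred2, data = df)    # compare with summary(lm(...))$r.squared
```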

Meta-analysis / multi-study papers

Meta-analyses / multi-study papers require some additional checks:

  • Check for publication/reporting bias: funnel plots, p-curves, TIVA, etc.
  • Calculate bias-corrected values if possible, e.g. using PET-PEESE (see the sketch after this list).
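
A minimal sketch of these checks with the metafor package, assuming a hypothetical data frame meta_df with effect sizes yi and sampling variances vi.

```r
library(metafor)

re <- rma(yi, vi, data = meta_df)            # random-effects meta-analysis
funnel(re)                                   # funnel plot
regtest(re)                                  # Egger-type funnel asymmetry test

pet   <- rma(yi, vi, mods = ~ sqrt(vi), data = meta_df)   # PET: moderator = standard error
peese <- rma(yi, vi, mods = ~ vi,       data = meta_df)   # PEESE: moderator = sampling variance

# Usual rule: report PEESE's intercept if PET's intercept differs from 0, otherwise PET's
coef(pet); coef(peese)
```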

Analytic replication

I believe in robust science, so I analytically replicate all papers I review if possible. Checklist for this task:

  • Are files uploaded to an OSF repository or similar?
  • Is the analysis code available?
  • Is the data available?
  • Are the variables described in the datafiles?
  • Replicate all analyses if possible and publish the replication R code (a minimal skeleton is sketched below).
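
A minimal sketch of a replication script in base R; the file path, variable names, and model are hypothetical placeholders for whatever the paper actually reports.

```r
d <- read.csv("data/study1.csv")             # hypothetical path to the shared data file

fit <- lm(outcome ~ predictor + covariate, data = d)   # hypothetical model from the paper
summary(fit)                                 # compare estimates with those reported
confint(fit)                                 # and the reported confidence intervals

sessionInfo()                                # record package versions with the published code
```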