Clinical vs. statistical prediction

I learned about this topic many years ago reading Scott Lilienfeld and colleagues' 50 Great Myths of Popular Psychology. In it, the authors write:

In his blockbuster bestselling book Blink: The Power of Thinking Without Thinking, journalist Malcolm Gladwell (2005) argued that experts reach decisions by focusing on the most relevant information and making accurate snap judgments. They can recognize crucial details without being distracted by anything else, and combine this information using skilled intuition honed by years of training and experience. This model of expertise is what most people expect of mental health professionals. But is there a different way of making clinical decisions?

Over a half century ago, the brilliant clinical psychologist Paul Meehl (1954) provided an insightful analysis of clinical decision-making, outlining two approaches to this task. He referred to the traditional approach, which relies on judgment and intuition, as the clinical method. Meehl contrasted this approach with the mechanical method. When using the mechanical method, a formal algorithm (set of decision rules) such as a statistical equation or “actuarial table” is constructed to help make decisions in new cases. Insurance companies have used actuarial tables for decades to evaluate risk and set premiums. For example, they can use knowledge of someone’s age, sex, health-related behaviors, medical history, and the like to predict how many more years he or she will live. Although actuarial predictions of mortality aren’t perfectly accurate for everyone, they provide a decent basis for setting life insurance premiums. Meehl proposed that a mechanical approach would prove just as useful in clinical decision-making. Was he right?

Meehl (1954) reviewed the 20 studies available at the time to compare the accuracy of clinical and mechanical predictions when researchers supplied both the practitioner and the formula with the same information. To the shock of many readers, he found that mechanical predictions were at least as accurate as clinical predictions, sometimes more. Other reviewers have since updated this literature (Dawes, Faust, & Meehl, 1989; Grove et al., 2000), which now includes more than 130 studies that meet stringent criteria for a fair comparison between the two prediction methods. They’ve found that Meehl’s central conclusion remains unchanged and unchallenged: Mechanical predictions are equally or more accurate than clinical predictions. This verdict holds true not only for mental health experts making psychiatric diagnoses, forecasting psychotherapy outcome, or predicting suicide attempts, but also for experts predicting performance in college, graduate school, military training, the workplace, or horse races; detecting lies; predicting criminal behavior; and making medical diagnoses or predicting the length of hospitalization or death. At present, there’s no clear exception to the rule that mechanical methods allow experts to predict at least as accurately as the clinical method, usually more so.
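To make the contrast concrete, here is a minimal sketch in Python (using numpy) of what Meehl's "mechanical method" amounts to in practice: the same predictors a clinician might weigh intuitively are instead combined by a fixed formula estimated from past cases, and every new case is scored by that formula. The variables, weights, and data below are invented purely for illustration.

```python
# A toy "actuarial" predictor: the formula, not the clinician, combines the data.
# All variables and numbers here are made up purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical records of past cases: age, number of prior episodes, test score
X = np.column_stack([
    rng.normal(40, 12, n),     # age in years
    rng.poisson(1.5, n),       # prior episodes
    rng.normal(100, 15, n),    # score on some standardized test
])
# Hypothetical outcome for those cases, e.g. months until relapse
y = 60 - 0.3 * X[:, 0] - 4.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 5, n)

# "Constructing the actuarial table": estimate one set of weights from past cases
X1 = np.column_stack([np.ones(n), X])               # add an intercept column
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)    # ordinary least squares

# A new case is then scored mechanically, with no judgment involved
new_case = np.array([1.0, 52.0, 3.0, 95.0])         # intercept, age, priors, score
print("predicted months until relapse:", round(new_case @ weights, 1))
```

Nothing about the sketch is sophisticated, and that is the point of the literature below: given the same information, even simple formulas like this tend to predict at least as well as expert judgment.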

There are now three meta-analyses/reviews of this evidence, one more having appeared since the book was published.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668-1674.

Professionals are frequently consulted to diagnose and predict human behavior; optimal treatment and planning often hinge on the consultant’s judgmental accuracy. The consultant may rely on one of two contrasting approaches to decision-making — the clinical and actuarial methods. Research comparing these two approaches shows the actuarial method to be superior. Factors underlying the greater accuracy of actuarial methods, sources of resistance to the scientific findings, and the benefits of increased reliance on actuarial approaches are discussed.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19-30.

The process of making judgments and decisions requires a method for combining data. To compare the accuracy of clinical and mechanical (formal, statistical) data-combination techniques, we performed a meta-analysis on studies of human health and behavior. On average, mechanical-prediction techniques were about 10% more accurate than clinical predictions. Depending on the specific analysis, mechanical prediction substantially outperformed clinical prediction in 33%–47% of studies examined. Although clinical predictions were often as accurate as mechanical predictions, in only a few studies (6%–16%) were they substantially more accurate. Superiority for mechanical-prediction techniques was consistent, regardless of the judgment task, type of judges, judges’ amounts of experience, or the types of data being combined. Clinical predictions performed relatively less well when predictors included clinical interview data. These data indicate that mechanical predictions of human behaviors are equal or superior to clinical prediction methods for a wide range of circumstances.

Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., … & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341-382.

Clinical predictions made by mental health practitioners are compared with those using statistical approaches. Sixty-seven studies were identified from a comprehensive search of 56 years of research; 92 effect sizes were derived from these studies. The overall effect of clinical versus statistical prediction showed a somewhat greater accuracy for statistical methods. The most stringent sample of studies, from which 48 effect sizes were extracted, indicated a 13% increase in accuracy using statistical versus clinical methods. Several variables influenced this overall effect. Clinical and statistical prediction accuracy varied by type of prediction, the setting in which predictor data were gathered, the type of statistical formula used, and the amount of information available to the clinicians and the formulas. Recommendations are provided about when and under what conditions counseling psychologists might use statistical formulas as well as when they can rely on clinical methods. Implications for clinical judgment research and training are discussed.

I looked through the papers citing this last meta-analysis, but did not find any newer meta-analyses for the broad topic. There are, however, lots of newer studies, especially for predicting violent behavior, and there are newer meta-analyses restricted to this domain as well.

One more meta-analysis is worth mentioning because of its topic: Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis. Journal of Applied Psychology, 98(6), 1060-1072.

In employee selection and academic admission decisions, holistic (clinical) data combination methods continue to be relied upon and preferred by practitioners in our field. This meta-analysis examined and compared the relative predictive power of mechanical methods versus holistic methods in predicting multiple work (advancement, supervisory ratings of performance, and training performance) and academic (grade point average) criteria. There was consistent and substantial loss of validity when data were combined holistically, even by experts who are knowledgeable about the jobs and organizations in question, across multiple criteria in work and academic settings. In predicting job performance, the difference between the validity of mechanical and holistic data combination methods translated into an improvement in prediction of more than 50%. Implications for evidence-based practice are discussed.

Are more experienced clinicians better?

Narrative reviews have generally come to a negative conclusion on this question, but there is now a meta-analysis.

Spengler, P. M., White, M. J., Ægisdóttir, S., Maugherman, A. S., Anderson, L. A., Cook, R. S., … & Rush, J. D. (2009). The meta-analysis of clinical judgment project: Effects of experience on judgment accuracy. The Counseling Psychologist, 37(3), 350-399.

Clinical and educational experience is one of the most commonly studied variables in clinical judgment research. Contrary to clinicians’ perceptions, clinical judgment researchers have generally concluded that accuracy does not improve with increased education, training, or clinical experience. In this meta-analysis, the authors synthesized results from 75 clinical judgment studies where the experience of 4,607 clinicians was assessed in relation to the accuracy of their judgments about mental health (e.g., diagnosis, prognosis, treatment) and psychological issues (e.g., vocational, personality). The authors found a small but reliable effect, d = .12, showing that experience, whether educational or clinical, is positively associated with judgment accuracy. This small effect was robust across several tested moderator models, indicating experienced counselors and clinicians acquire, in general, almost a 13% increase in their decision-making accuracy, regardless of other factors. Results are discussed in light of their implications for clinical judgment research and for counseling psychology training and practice.

So, the authors claim a small but robust effect of d = .12 (i.e., r ≈ .06). The use of Cohen's d here is somewhat odd, given that experience is a continuous variable. A very small effect indeed.
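For reference, the r ≈ .06 figure follows from the standard conversion between Cohen's d and the point-biserial correlation, which assumes two groups of roughly equal size; a quick check:

```python
# Convert Cohen's d to r via r = d / sqrt(d^2 + 4) (equal group sizes assumed)
import math

d = 0.12
r = d / math.sqrt(d**2 + 4)
print(round(r, 3))   # ~0.06
```

What about publication bias?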

Publication source. Several commentators have discussed the presence of a publication bias in favor of statistically significant results (Rosnow & Rosenthal, 1989). The nature of this bias is that independent of a study’s quality in design and execution, reviewers prefer studies with significant results to those with nonsignificant results. Because of competition for publication in major journals, such as those published by the American Psychological Association (APA), effects may be larger in them. We tested this assumption and found that studies published in non-APA psychology journals (d_i+ = 0.04) had much smaller effects than studies found in APA journals (d_i+ = 0.27), Q_B(3) = 9.75, p < .05. This finding raises the possibility that focusing on only one publication source may present a skewed picture of the relationship between experience and accuracy.

So, I don’t trust this d = .12 finding. It could easily be a publication-bias effect.

What to do with this information?

In my opinion, we could save lots of money and improve society by letting simple statistics do more work for us. Yet we often don’t. Why not? Lilienfeld conveniently has a review paper about this as well.