Getting personality right

This post covers some stuff already covered by others, but more briefly. Two studies of interest:

  1. Riemann, R., Angleitner, A., & Strelau, J. (1997). Genetic and Environmental Influences on Personality: A Study of Twins Reared Together Using the Self- and Peer Report NEO-FFI Scales. Journal of Personality, 65(3), 449–475.
  2. Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122.

Study 1 – the heritability of personality

Heritabilities for personality traits (usually the OCEAN traits) are commonly given as around 50%. A typical citation for this is Bouchard’s 2004 review, which produced this table:

[Image: table of heritability estimates from Bouchard’s 2004 review]

The values range from 42% to 57%, with a mean of exactly 50%. (A big meta-analysis from 2015 found a mean of 40%, but with divergent results for adoption designs (22%) and MZT-DZT designs (47%).) These results come from standard twin studies, i.e. comparisons of monozygotic and dizygotic twins reared together (MZT vs. DZT). This design underestimates heritability when the variables contain measurement error: random error lowers both twin correlations by the same factor (the reliability), so the heritability estimate shrinks by that factor too, and the error variance gets counted as nonshared environment. Despite this, researchers routinely ignore measurement error, and I have no idea why. As usual, Jensen got it right early on: in his 1969 review of the IQ findings, he adjusted for measurement error. Why don’t others follow his example?
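To make the attenuation argument concrete, here is a minimal Python sketch. It assumes classical test theory with the same reliability for both twins’ scores, so each observed twin correlation equals the true correlation times the reliability, and it uses Falconer’s simple formula h2 = 2(r_MZ - r_DZ) rather than the model fitting the cited studies used. The correlations and the reliability are hypothetical.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's heritability estimate: twice the MZ-DZ correlation gap."""
    return 2 * (r_mz - r_dz)


def attenuate(true_r: float, reliability: float) -> float:
    """Observed twin correlation when both twins' scores have the same
    reliability (classical test theory: r_obs = reliability * r_true)."""
    return reliability * true_r


# Hypothetical values chosen for illustration, not taken from any study.
TRUE_R_MZ, TRUE_R_DZ, RELIABILITY = 0.70, 0.35, 0.80

h2_true = falconer_h2(TRUE_R_MZ, TRUE_R_DZ)
h2_observed = falconer_h2(attenuate(TRUE_R_MZ, RELIABILITY),
                          attenuate(TRUE_R_DZ, RELIABILITY))
h2_corrected = h2_observed / RELIABILITY  # undo the attenuation

print(f"true h2      = {h2_true:.2f}")
print(f"observed h2  = {h2_observed:.2f}")
print(f"corrected h2 = {h2_corrected:.2f}")
```

With these inputs the observed estimate is 0.56 instead of 0.70: it is scaled down by exactly the reliability, and the missing 0.14 would be attributed to nonshared environment. Dividing by the reliability restores the true value.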

Self-rating measures of personality suffer not only from regular, random measurement error but also from systematic measurement error (bias): people are not able to rate their own personality as well as other people who know them can. Self-ratings thus introduce method variance into the data, and this variance is not very heritable. One twin study (Riemann et al., 1997, listed above) also collected other-ratings (peer reports) of personality, and when these were used alone or combined with self-ratings, the heritabilities went up:

[Three images of the twin study’s heritability results]

So with self-reports, they found heritabilities of 42–56% (mean 51%); with other-reports, 57–81% (mean 66%); and with the two combined, 66–79% (mean 71%). (I used the AE models’ results when possible.) Moreover, these analyses did not correct for random measurement error either, so according to these data the true heritabilities are higher still, likely in the 80% range. This is the same territory as cognitive ability.
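Why would other-ratings, and especially combined ratings, give higher estimates? A minimal variance-decomposition sketch, under loudly stated assumptions: a rating is the sum of the trait, a rater-specific bias (method variance) and random error; only the trait is heritable; and bias and error are independent across raters, so averaging k ratings divides their variance by k. It treats self- and other-ratings as interchangeable, which is a simplification, and all variance shares are made up.

```python
def composite_h2(h2_trait: float, var_trait: float, var_method: float,
                 var_error: float, n_raters: int = 1) -> float:
    """Heritability of the mean of n_raters ratings of the same person.

    Assumes: rating = trait + rater-specific bias + random error; only the
    trait is heritable; bias and error are independent across raters, so
    averaging n_raters ratings divides their variance by n_raters.
    """
    noise = (var_method + var_error) / n_raters
    return h2_trait * var_trait / (var_trait + noise)


# Hypothetical variance shares for a single rating (they sum to 1).
H2_TRAIT = 0.80    # heritability of the trait itself
VAR_TRAIT = 0.60   # trait variance
VAR_METHOD = 0.15  # rater-specific bias (method variance)
VAR_ERROR = 0.25   # random measurement error

for k in (1, 2, 3):
    h2 = composite_h2(H2_TRAIT, VAR_TRAIT, VAR_METHOD, VAR_ERROR, k)
    print(f"{k} rater(s): h2 = {h2:.2f}")
```

With these made-up numbers a single rating gives 0.48, two averaged ratings 0.60 and three 0.65; removing the random error as well (setting VAR_ERROR to 0) would raise each figure further.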

Main caveat: this is a single, unreplicated study based on n = 964 cases. That sounds like a lot, but it is not much for a twin study. Heritability estimates rely on four measurements (both members of both MZ and DZ pairs), so sampling error adds up quickly: one has to estimate the intraclass correlations for MZ and DZ twins, which are based on pairs rather than individuals, and then the difference between these correlations.
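A minimal Monte Carlo sketch of how much a Falconer-style estimate bounces around at this sample size. Assumptions, all hypothetical: the 964 cases are treated as twin pairs split 600 MZ / 364 DZ, the true correlations are 0.50 (MZ) and 0.25 (DZ), scores are bivariate normal, and the intraclass correlation is approximated by a Pearson correlation on double-entered pairs. The study itself fitted AE/ACE models, so its actual standard errors will differ somewhat.

```python
import random
import statistics


def simulate_pairs(n_pairs: int, r: float, rng: random.Random):
    """Draw standardized twin pairs with true correlation r."""
    pairs = []
    for _ in range(n_pairs):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((z1, r * z1 + (1 - r * r) ** 0.5 * z2))
    return pairs


def double_entry_r(pairs) -> float:
    """Pearson correlation on double-entered pairs, approximating the ICC."""
    x = [a for a, _ in pairs] + [b for _, b in pairs]
    y = [b for _, b in pairs] + [a for a, _ in pairs]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2024)
N_MZ, N_DZ = 600, 364    # hypothetical split of the 964 pairs
R_MZ, R_DZ = 0.50, 0.25  # hypothetical true correlations (true h2 = 0.50)

estimates = []
for _ in range(200):
    r_mz = double_entry_r(simulate_pairs(N_MZ, R_MZ, rng))
    r_dz = double_entry_r(simulate_pairs(N_DZ, R_DZ, rng))
    estimates.append(2 * (r_mz - r_dz))  # Falconer's formula

print(f"mean h2 = {statistics.fmean(estimates):.2f}, "
      f"SD across samples = {statistics.stdev(estimates):.2f}")
```

With these settings the estimates scatter around the true value of 0.50 with a standard deviation on the order of 0.1, so a 95% interval spans roughly ±0.2.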

Jayman pointed me towards a replication of this finding in another, larger sample.

  • Riemann, R., & Kandler, C. (2010). Construct validation using multitrait-multimethod-twin data: The case of a general factor of personality. European Journal of Personality, 24(3), 258–277.

They fit a number of models to their data with higher-order factors: the Big Two and/or a general factor of personality (GFP). Unfortunately, they only report the behavioral genetic model parameters for the best-fitting model, which turned out to be a 5 + 2 model (the five OCEAN factors plus the Big Two) with cross-loadings. The heritabilities from this model were: E 86%, O 92%, ES (-N) 59%, A 85%, C 81%, Plasticity 50% and Stability 40%. If we use just the OCEAN traits as we did before, the mean heritability is 81%, with ES the obvious outlier some 20 percentage points below the others. The heritabilities of the Big Two were similar to the usual estimates for OCEAN, for whatever reason. It is not clear what the heritabilities of the OCEAN traits would be if one used just the five-factor model.
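The mean quoted above can be checked with a few lines of Python, using only the five OCEAN values (Plasticity and Stability excluded).

```python
# Heritabilities (%) from the best-fitting 5 + 2 model, as quoted above.
h2 = {"E": 86, "O": 92, "ES": 59, "A": 85, "C": 81}

mean_all = sum(h2.values()) / len(h2)
others = [v for trait, v in h2.items() if trait != "ES"]
mean_others = sum(others) / len(others)

print(f"mean of OCEAN traits: {mean_all:.1f}%")   # 80.6%
print(f"mean without ES: {mean_others:.1f}%")     # 86.0%
print(f"ES gap to lowest other trait: {min(others) - h2['ES']} points")  # 22
```

The mean is 80.6%, and ES sits 22 points below the lowest of the other four traits (27 below their mean of 86%).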

Study 2 – validity of self- vs. other-reported personality

If we accept that other-rated personality is more heritable, and that the cause is measurement error and bias in self-ratings, then we would also expect the (predictive) validity of other-rated personality to be stronger. That is, unless we think self-rating bias is as valid a predictor as the personality traits themselves. As it happens, there is a large meta-analysis on this topic concluding exactly that: other-ratings have the stronger validity. The authors present their results in three large tables, but I’ve rearranged them into a smaller table for convenience:

Trait                Rater  Stranger impressions   Academic achievement   Job performance
Emotional stability  Other  0.41                   0.46                   0.37
Emotional stability  Self   0.20                   0.25                   0.12
Extraversion         Other  0.46                   0.52                   0.18
Extraversion         Self   0.37                   0.09                   0.12
Openness             Other  0.58                   0.29                   0.45
Openness             Self   0.42                   0.09                   0.05
Agreeableness        Other  0.34                   0.02                   0.31
Agreeableness        Self   0.26                   0.06                   0.13
Conscientiousness    Other  0.42                   0.69                   0.55
Conscientiousness    Self   0.27                   0.22                   0.23

[For academic achievement, I used the self-report value with the largest n. This had the effect of maximizing the correlations for self-report.]

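These correlations are corrected for measurement error in both variables, so they should be comparable as estimates of the true correlations. The other-report correlations are systematically larger: they win in 14 of the 15 trait-outcome pairs, the only exception being agreeableness and academic achievement (0.02 vs. 0.06). It is easy to see if one plots them.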
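The correction referred to here is, presumably, the standard correction for attenuation (Spearman’s): the observed correlation is divided by the square root of the product of the two variables’ reliabilities. A minimal sketch with hypothetical numbers; the meta-analysis’s actual reliability estimates are not reproduced here.

```python
from math import sqrt


def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation due to unreliability in both variables."""
    return r_observed / sqrt(rel_x * rel_y)


# Hypothetical observed validity and reliabilities (illustration only).
r_obs, rel_personality, rel_outcome = 0.20, 0.80, 0.50
print(round(disattenuate(r_obs, rel_personality, rel_outcome), 3))  # 0.316
```

Because the correction only removes random error, it leaves any systematic self-rating bias in place, which is why corrected self- and other-report correlations can still differ.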


If we average validities within outcomes and calculate the other/self ratios, these are 1.5 (stranger impressions), 2.8 (academic achievement) and 2.9 (job performance), mean = 2.4. Other-report is much more valid.
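The ratios can be reproduced from the table with a short script, assuming “average” means an unweighted mean across the five traits. It also counts the cells where other-report validity exceeds self-report validity.

```python
# Corrected validities from the table above: (other, self) per trait.
validities = {
    "stranger impressions": {
        "ES": (0.41, 0.20), "E": (0.46, 0.37), "O": (0.58, 0.42),
        "A": (0.34, 0.26), "C": (0.42, 0.27),
    },
    "academic achievement": {
        "ES": (0.46, 0.25), "E": (0.52, 0.09), "O": (0.29, 0.09),
        "A": (0.02, 0.06), "C": (0.69, 0.22),
    },
    "job performance": {
        "ES": (0.37, 0.12), "E": (0.18, 0.12), "O": (0.45, 0.05),
        "A": (0.31, 0.13), "C": (0.55, 0.23),
    },
}

ratios = {}
for outcome, by_trait in validities.items():
    other_mean = sum(o for o, _ in by_trait.values()) / len(by_trait)
    self_mean = sum(s for _, s in by_trait.values()) / len(by_trait)
    ratios[outcome] = other_mean / self_mean
    print(f"{outcome}: other/self = {ratios[outcome]:.1f}")

print(f"mean ratio = {sum(ratios.values()) / len(ratios):.1f}")

# Count trait-outcome cells where the other-report validity is larger.
n_larger = sum(o > s for by_trait in validities.values()
               for o, s in by_trait.values())
print(f"other > self in {n_larger} of 15 cells")
```

This gives 1.5 for stranger impressions, 2.8 for academic achievement and 2.9 for job performance (mean 2.4), with other-report validity larger in 14 of the 15 cells.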