Does assortative mating work?

In my recent post about differences on the tails, a good comment was posted by Kristine:

Is there any correlation between IQ gap and divorce, I wonder…

The general observation among humans and other species is that mates are more similar than would be expected by chance, a pattern called social assortment, or assortative mating. It’s part of the broader phenomenon of social homophily, or like attracts like (‘birds of a feather flock together’). Numerous studies have shown that friends are more similar on various attributes than expected by chance, including intelligence, but also of course age, social background, interests, politics, and so on. This creates a kind of natural echo chamber which can be further enhanced by the internet, since it enables people to seek out and associate with people even more similar to themselves than they could otherwise find in their face-to-face networks. At least, that’s the theory.

We can perhaps supply some evolutionary speculative theories about why social homophily exists in general. In terms of inclusive fitness theory, associating with and helping people with similar trait levels to yourself would in effect have a slight positive effect on your inclusive fitness, as the specific genetic variants that make you smart, tall etc. are also in general found in other smart, tall etc. people. This effect would be very slight though due to polygenicity (many genetic causes of small effect for most traits). The simpler overall theory of egoism is just that organisms like themselves in general, and thus also in effect like others who are similar to themselves. It’s easy to think of it this way for humans, but whether this model makes sense for rabbits or salamanders is another thing.

Whatever the foundation for this preference, seen in the animal and human worlds alike, we can ask whether it actually works. Do more assortatively mated couples last longer together? The answer is surprisingly difficult to find, for a few reasons. First, it is hard to study empirically since it requires data for dyads (pairs), which are harder to recruit (maybe the wife wants to contribute to the study but the husband doesn’t). Moreover, it requires longitudinal data: you measure the couples at time X and then again at some later point Y, with some years in between, which means tracking people over time.

Second, since it is statistically a kind of interaction effect, it requires high power and precise measurement. In effect, we are trying to model the chance of a break-up of some kind (formal divorce or separation, or just a more general relationship dissolution) as a function of the difference between the mates’ scores on some trait. This is trickier when there are differences between the sexes on the trait in question, say, neuroticism. If the average woman is 0.50 SD higher than the average man, then most couples will tend to have a fairly big gap absent some spectacular assortment. Researchers may thus use within-sex norms for the traits instead. It’s not clear which is best before looking at some data.

Finally, additive effects will tend to dominate any interaction effects. I’ve previously written about how people who are high in openness and neuroticism are more likely to divorce in general. The effect seems to be additive, so that couples with 2 persons with high neuroticism are more likely to divorce than couples with 1 high and 1 low. This goes against a large protective effect of assortment, but could be consistent with a small one. To see what I mean, here are some simulated results:

Here we can see that the effects of neuroticism are additive (leftmost plot), and there is additionally a protective effect of assortment (rightmost plot). A clearer way to understand how this works is to use color as a third dimension:

The rightmost plot shows the main effects only. In this case, the effect of neuroticism is just how much of it you have in a relationship, it doesn’t matter how it is distributed. Because of this, there are lines of equivalent risk going from the top left to bottom right, since e.g. +1 + -1 = 0, and 0 + 0 = 0. In the leftmost plot, we see the effect of the assortment giving rise to the lines horizontally and vertically. The lowest risk of divorce is found for couples that are perfectly matched (on the central line), at any given level of average neuroticism within the couple. These results are purely hypothetical to give us an understanding of how the statistics would work out. Probably the strength of the dissimilarity effect is not as large as I have assumed here, but I wanted it to be easy to discern on a plot.
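For readers who want to play with the idea, the hypothetical model described above can be sketched in a few lines of Python. This is not the code behind the plots; the logistic form, the function names, and every coefficient (`main`, `mismatch`, `intercept`) are assumptions picked purely for illustration.

```python
import math

def divorce_logit(n_a, n_b, main=0.5, mismatch=0.0, intercept=-1.0):
    """Log-odds of divorce for a couple with neuroticism z-scores n_a and n_b.

    main:     additive effect per SD of each partner's neuroticism (made-up value)
    mismatch: penalty per SD of absolute difference between partners (made-up value)
    """
    return intercept + main * (n_a + n_b) + mismatch * abs(n_a - n_b)

def divorce_prob(n_a, n_b, **kwargs):
    """Convert the log-odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-divorce_logit(n_a, n_b, **kwargs)))

# Main effects only: only the couple's total matters, so (0, 0) and (+1, -1) tie.
p_matched = divorce_prob(0.0, 0.0)
p_mismatched = divorce_prob(1.0, -1.0)

# Adding a dissimilarity penalty: the matched couple is now safer at the same total.
p_matched_pen = divorce_prob(0.0, 0.0, mismatch=0.4)
p_mismatched_pen = divorce_prob(1.0, -1.0, mismatch=0.4)

# Two high-neuroticism partners can still be riskier than one high and one low.
p_both_high_pen = divorce_prob(1.0, 1.0, mismatch=0.4)
```

With these assumed values, a perfectly matched couple has the lowest risk at any given average level, yet a couple where both are high in neuroticism is still at higher risk than a high/low couple, which is the pattern of additive effects plus a smaller assortment effect.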

Anyway, are there any studies looking at this kind of thing? There’s a study of mental health diagnoses that I covered before when writing about lesbian divorce rates:

Their dummy variable for “both” in model 2 is a test of this similarity model. If there was a protective effect of being similar in having poor mental health (‘suffering together’) then this estimate should be somewhat lower than 1. It was, but not beyond chance. Maybe the study lacked power, but it suggested that the effects were mostly additive and not interactive. The study is too small to rule out small interaction effects.

Economists have also been interested in this topic for a while, and produced many studies like this one:

Kraft, K., & Neimann, S. (2009). Impact of educational and religious homogamy on marital stability (No. 4491). IZA Discussion Papers.

Using a rich panel data set from the German Socio-Economic Panel, we test whether spouses who are similar to each other in certain respects have a lower probability of divorce than dissimilar spouses. We focus on the effect of homogamy with respect to education and church attendance. Gary Becker’s theory of marriage predicts that usually, positive assortative mating is optimal. Our results, however, suggest that homogamy per se does not increase marital stability but higher education and religiousness.

Their tables are not too easy to read, so I won’t repost them here. But basically they just examine the various crude combinations of level of education (low, medium, high), and church goingness (yes, no). They find that more education predicts less divorce and it doesn’t matter who is educated, and the same is true for going to church. These are just main effects. Their data are crude, so they may have missed some minor effects of homogamy itself. It’s somewhat difficult to spot these hypothetical interaction effects in the presence of large main effects.

With regards to the particular question raised by the commenter, similarity in intelligence itself, I wasn’t able to find any studies. We already know from much research that more intelligent people are less likely to divorce, but is the risk the same for a couple of 100 + 100 IQ vs. 80 + 120 IQ? Such large gaps (40 IQ) are very rare among couples given that the correlation is about 0.50 (adjusted for reliability) and thus the average IQ gap is ~11 IQ or so. It seems difficult to imagine the chance of divorce would not be larger in the second case than in the first. However, this could simply be due to a nonlinear effect of intelligence itself, as in, going from 100 to 80 IQ may starkly increase the risk of divorce while going from 100 to 120 may only reduce the chance somewhat. Disentangling such nonlinear effects from any interactions would be difficult, but maybe not impossible.
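The arithmetic behind the typical gap can be checked with a short calculation. The sketch below assumes that spouses’ IQs are bivariate normal with SD 15 in both sexes and a spousal correlation of `r`; the function name and these distributional assumptions are mine, not taken from any of the studies discussed.

```python
import math
from statistics import NormalDist

def iq_gap_stats(r, sd=15.0, gap=40.0):
    """Mean absolute spousal IQ gap and share of couples with a gap >= `gap`.

    Assumes bivariate normal scores with equal SD and correlation r, so the
    difference between spouses is normal with SD = sd * sqrt(2 * (1 - r)).
    """
    sd_diff = sd * math.sqrt(2.0 * (1.0 - r))
    # Mean of the absolute value of a zero-mean normal variable.
    mean_abs_gap = sd_diff * math.sqrt(2.0 / math.pi)
    # Two-sided tail probability of a difference at least `gap` points wide.
    share_large = 2.0 * (1.0 - NormalDist().cdf(gap / sd_diff))
    return mean_abs_gap, share_large

mean_gap, share_40 = iq_gap_stats(r=0.5)
```

Under these assumptions the mean gap comes out at roughly 12 points, in the same ballpark as the figure above, and gaps of 40 points or more occur in under 1% of couples.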

One study by Nguyen et al 2024 is worth highlighting:

This research examined potential matching patterns between romantic partners in Big Five personality traits and relationship-specific characteristics such as attachment orientations, care giving systems, conflict resolution, partner responsiveness, and trust. We analyzed two existing longitudinal studies that had complementary samples: 184 couples who had dated for less than one year and 168 married or cohabiting couples across the first two years of parenthood. We found evidence for assortative mating across various relationship-specific characteristics both at baseline and longitudinally, which were often stronger in magnitude than assortment based on Big Five traits. However, couples often perceived each other to be more similar than their actual similarity indicated. Further, there was little evidence to support the benefits of between-partner similarity for relationship quality after controlling for actor and partner effects of both partners’ score levels on each construct. We highlighted the importance of including personality assessments beyond one-time, self-reported measures of Big Five traits in investigating assortative mating processes.

In other words, once you control for the main effects of personality traits (self-report scales), then similarity did not predict relationship satisfaction. But again, the sample isn’t large (data like these are hard to collect). As a matter of fact, Big Five self-report data is ill-suited for this purpose since it shows very weak assortment to begin with:

The assortative mating correlation for neuroticism is in fact 0 in this sample, which we suspect to be false given the known assortative mating for mental health diagnoses. I think it’s a self-report data issue, but I haven’t seen a study that looked at this specifically. Very few datasets have both self- and other-reported personality data for couples with which one could test this. It’s curious that verbal aggression showed the strongest AM effect (0.42) while agreeableness was minor (0.12), when these conceptually measure quite related things.

Given the general lack of evidence that assortative mating is important, insofar as we can measure it, it is a wonder why nature goes to all this effort to pair like with like. After all, millions of people are currently single because they can’t find a fitting spouse, and endless numbers of divorces are filed on the grounds of “irreconcilable differences”. Perhaps we need to look at simpler measures and larger samples. I found a study with a very large sample (Torvik et al 2015):

Background
Poor health and health behaviors are associated with divorce. This study investigates the degree to which six health indicators and health behaviors among husbands and wives are prospectively related to divorce, and whether spousal similarities in these factors are related to a reduced risk of marital dissolution. Theoretically, a reduced risk is possible, because spousal similarity can help the couple’s adaptive processes.

Methods
The data come from a general population sample (19,827 couples) and 15 years of follow-up data on marital dissolution. The following characteristics were investigated: Poor subjective health, obesity, heavy drinking, mental distress, lack of exercise, and smoking. Associations between these characteristics among husbands and wives and later divorce were investigated with Cox proportional hazards regression analyses.

Results
All the investigated characteristics except obesity were associated with marital dissolution. Moreover, spousal similarities in four of these characteristics (heavy drinking, mental distress, no exercise, and smoking) reduced the risk of divorce, compared to the combined main effects of husbands and wives. Nevertheless, couples concordant in these health issues still had higher risks of divorce than couples without these characteristics.

Conclusion
Couples with similar health and health behavior are at a lower risk of divorce than are couples who are dissimilar in health. Health differences may thus be seen as vulnerabilities or stressors, supporting a health mismatch hypothesis. This study demonstrates that people who are similar to each other are more likely to stay together. Harmonizing partners’ health behaviors may be a target in divorce prevention.

So they were able to find positive effects of assortment, in line with the models I made up above. Their table:

They again used binary indicators (crude measures), so the interaction term tests whether couples with 2 “yes” are more or less likely to divorce compared to adding up the two main effects. Looking around the table, the interaction terms are all below 0 suggesting that assortment works in general, but many of the p values are bad, even with this large sample. The most fun finding is that exercise does not really predict divorce, but having both spouses not do it predicts more stability (still only with p = 2% in their adjusted model on the right). The second table gives some numerical values based on these models:

Thus, we can see that the least divorce is among couples where neither exercises, and funnily enough, also among those where both are fat.
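To make the logic of such an interaction term concrete, here is a minimal sketch of how a proportional hazards model with two binary indicators combines main effects and an interaction on the log scale. The coefficients below are invented for illustration and are not taken from Torvik et al’s tables; the function name is likewise hypothetical.

```python
import math

def couple_hazard_ratio(husband, wife, b_husband, b_wife, b_interaction):
    """Hazard ratio of divorce relative to couples where neither has the trait.

    husband, wife: 0/1 indicators (e.g. smoker or not).
    b_*: log-hazard coefficients; a negative b_interaction means concordant
    couples fare better than the two main effects alone would predict.
    """
    log_hr = (b_husband * husband
              + b_wife * wife
              + b_interaction * husband * wife)
    return math.exp(log_hr)

# Invented coefficients, NOT estimates from the study.
coefs = dict(b_husband=0.4, b_wife=0.5, b_interaction=-0.3)

hr_husband_only = couple_hazard_ratio(1, 0, **coefs)
hr_wife_only = couple_hazard_ratio(0, 1, **coefs)
hr_both = couple_hazard_ratio(1, 1, **coefs)

# What the model would predict for concordant couples without the interaction.
hr_both_additive = math.exp(coefs["b_husband"] + coefs["b_wife"])
```

With these made-up numbers, `hr_both` is lower than `hr_both_additive` but still above 1, mirroring the abstract’s point that concordant couples do better than the main effects imply while still being at higher risk than couples where neither has the trait.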

There’s another detailed paper using advanced methods of calculating similarity across multiple traits by Luo & Klohnen 2005:

Using a couple-centered approach, the authors examined assortative mating on a broad range of variables in a large (N = 291) sample of newlyweds. Couples showed substantial similarity on attitude-related domains but little on personality-related domains. Similarity was not due to social homogamy or convergence. The authors examined linear and curvilinear effects of spouse similarity on self and observer indicators of marital quality. Results show (a) positive associations between similarity and marital quality for personality-related domains but not for attitude-related domains, (b) that similarity on attachment characteristics were most strongly predictive of satisfaction, (c) robust curvilinear effects for husbands but not for wives, (d) that profile similarity remained a significant predictor of marital quality even when spouses’ self-ratings were controlled, and (e) that profile-based similarity indices were better predictors of marital quality than absolute difference scores.

In general, the effect sizes seemed quite small, but it was a bit hard to interpret their tables. Their study found the strange result that personality similarity predicted relationship satisfaction the most, but according to their self-report measures, it was the least correlated among couples (near 0 as in other studies).

The lack of detectable effects suggests that maybe people should just marry at random. However, this has been tested in various TV shows, and it doesn’t usually work out well (GPT):

MAFS [Married at first sight] Australia: Analyses of the AU seasons (2015–2024) suggest a ~7% long-term success rate (still together well after the show), with only about ~10% together a year after filming. Recent entertainment tallies are in the same ballpark (e.g., 9 still together out of 119 matched across 12 seasons ≈ 7.6%).

MAFS UK: Third-party roundups put it roughly around ~10–11% “success” depending on which seasons/definitions you use.

MAFS US: Some reporting/analysis suggests it does a bit better than AU, with one analysis citing ~16% (definitions vary, and the “still together” list changes over time).

Overall summary:

Humans and other animals select mates similar to themselves, and this is also true for non-mating relationships.
Finding such people is a time-consuming affair, so if this doesn’t have some offsetting benefits, then it seems quite suboptimal.
Most research does not find that similarity is important for predicting divorce or satisfaction. Some very large studies find minor positive effects for some traits.
TV show evidence suggests marrying random people doesn’t work well, but maybe this is due to self-selection into going into such an arrangement. Arranged marriages are also generally less happy.
It’s a bit of a mystery.
