Understanding audit (resumé) studies

John B. Holbein is a mild-mannered economist on X. Two days ago he wrote a long post about Lee Jussim’s new review paper on racial discrimination in hiring. Jussim is the social psychologist known for his work on stereotype accuracy. Holbein says:

Some people argue that discrimination is rampant in our society.

Others argue that few people discriminate.

Who is right?

This article by Lee Jussim takes on this question.

(I found this article to be a fascinating read. It was a little mind-bendy at first, but then it became very intuitive.)

Jussim flags what he calls the “discrimination paradox.”

The apparent paradox is that some high-quality audit studies find large disparities (e.g., in callbacks) against minority applicants, while other equally rigorous studies—lab experiments, field studies of everyday interactions, and platform-based choices—find very few discriminatory acts.

The key point is arithmetic, not a paradox. Outcome gaps in audit studies don’t map cleanly onto the number of discriminatory actors. This is especially true when base rates are low. When positive outcomes are rare (e.g., job callbacks), even a small number of biased decisions can generate large relative disparities in outcomes.

By contrast, many non-audit studies focus directly on individual decisions or acts (e.g. responses in games) and therefore estimate how often discrimination actually occurs at the decision level. Those studies often find discrimination happens infrequently, even though it is systematic.

This helps reconcile findings that otherwise look contradictory: large disparities in outcomes can coexist with discrimination occurring rarely at the level of individual decisions.

So who’s right: those who say discrimination is rampant, or those who say only a few people discriminate?

According to this paper, both are.

It’s this one:

  • Jussim, L. The discrimination paradox. Theor Soc 54, 1083–1102 (2025). https://doi.org/10.1007/s11186-025-09652-0

Rigorous studies published within the past eight years have found diametrically opposed results regarding racial discrimination. Some have found that racial discrimination is very rare; others that racial discrimination is very common. The paradox is that they are all well-conducted studies. In this paper, I show why there is no paradox, and the two sets of findings are completely compatible.
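The arithmetic point in Holbein’s summary (low base rates turn a few biased decisions into large relative disparities) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Jussim’s own calculation: the majority applicant gets a positive outcome at some base rate, and a fixed share of all decisions (hypothetically 2 in 100) turns what would have been a positive outcome into a negative one for the minority applicant. The function name and all numbers are illustrative.

```python
def relative_disparity(base_rate: float, biased_share: float) -> float:
    """Ratio of majority to minority positive-outcome rates in a toy model.

    Hypothetical assumptions: the majority applicant gets a positive outcome
    (e.g. a callback) with probability `base_rate`; a share `biased_share` of
    all decisions turns a would-be positive outcome into a negative one for the
    minority applicant; every other decision treats both applicants alike.
    """
    if not 0 <= biased_share < base_rate <= 1:
        raise ValueError("need 0 <= biased_share < base_rate <= 1")
    minority_rate = base_rate - biased_share
    return base_rate / minority_rate


if __name__ == "__main__":
    biased = 0.02  # hypothetical: 2 discriminatory decisions per 100, held fixed
    for base in (0.05, 0.10, 0.50, 0.80):
        ratio = relative_disparity(base, biased)
        print(f"base rate {base:.0%}: majority gets {ratio - 1:.0%} more positive outcomes")
```

Holding the number of discriminatory decisions fixed at 2 per 100, the relative gap comes out at about 67% when only 5% of applications succeed, but only about 3% when 80% do.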

The two lines of evidence that Jussim is trying to reconcile are:

  1. Audit (resumé submission/job application) studies. Typically, researchers make up fake CVs that are identical except for the demographic characteristics they are interested in. For instance, they might apply for junior programmer roles with a CV consisting of a photo, a name, a university, etc., where the photo shows a Black, White, or other person, male or female, or whichever other ‘protected class’ it is currently fashionable to study with regard to discrimination.
  2. Direct action studies. For example, people take part in mock jury trials, play ultimatum games (player 1 proposes splitting some amount of real money 50-50, 80-20, etc., and player 2 accepts or rejects; if player 2 rejects, no one gets anything), or play some other similar game.
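The ultimatum-game payoff rule described in item 2 is simple enough to write down directly. A minimal sketch; the function name and the pot of 10 units are illustrative assumptions, and actual studies differ in stakes and in how many rounds are played.

```python
def ultimatum_payoffs(pot: float, offer_share: float, accepted: bool) -> tuple[float, float]:
    """Return (proposer, responder) payoffs for one ultimatum-game round.

    `offer_share` is the fraction of the pot offered to the responder:
    0.5 for a 50-50 split, 0.2 for an 80-20 split.
    """
    if not 0 <= offer_share <= 1:
        raise ValueError("offer_share must be between 0 and 1")
    if not accepted:
        return 0.0, 0.0  # a rejection leaves both players with nothing
    responder = pot * offer_share
    return pot - responder, responder


if __name__ == "__main__":
    print(ultimatum_payoffs(10.0, 0.2, accepted=True))   # 80-20 split accepted: (8.0, 2.0)
    print(ultimatum_payoffs(10.0, 0.2, accepted=False))  # same offer rejected: (0.0, 0.0)
```

Rejecting a lopsided offer costs the responder money, which is what makes a rejection a costly, observable act at the level of individual decisions.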

The first kind of study usually finds that members of protected classes receive fewer callbacks than Whites. Jussim summarizes one meta-analysis:

A review and meta-analysis (Quillian et al., 2017) found 21 audit studies of racial discrimination in hiring since 1989 and three additional studies going back to 1972. The studies included over 55,000 applications submitted for over 26,000 jobs. There were two headline findings: (1) On average, White applicants received 36% more callbacks than did Black applicants. (2) This difference did not decline between either 1972 or 1989 and 2015. Indeed, there was weak evidence that it had increased over that time.
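Quillian et al.’s headline figure is a relative one. A short sketch converts it into an absolute gap, i.e. the net share of applications in which the two groups fare differently. The White callback rates used here are hypothetical placeholders, not figures from the meta-analysis, and the function name is illustrative.

```python
def absolute_gap(white_rate: float, relative_advantage: float = 0.36) -> float:
    """Absolute callback gap, as a share of applications, implied by a relative gap.

    `relative_advantage` = 0.36 encodes "White applicants received 36% more
    callbacks"; `white_rate` is a hypothetical White callback rate.
    """
    black_rate = white_rate / (1 + relative_advantage)
    return white_rate - black_rate


if __name__ == "__main__":
    for white_rate in (0.05, 0.10, 0.20):  # hypothetical base rates
        gap = absolute_gap(white_rate)
        print(f"White callback rate {white_rate:.0%}: gap of {gap * 100:.1f} percentage points")
```

At a hypothetical 10% White callback rate, 36% more callbacks corresponds to about 10 versus 7.4 callbacks per 100 applications, a gap of roughly 2.6 percentage points.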

On the other hand, here is Jussim on a direct action study:

In the Peyton and Huber (2021) study, participants played the ultimatum game 25 times with either Black or White partners, so the total number of offers accepted or refused was over 18,000. They operationalized racial discrimination as occurring when White players rejected offers from Black players that would have been accepted had the person offering been White.

The abstract of the paper emphasizes “racial resentment” and “explicit prejudice.” Indeed, the last sentence declares that “explicit prejudice is widespread.” However, to be clear about their main result regarding discrimination, I quote from the paper directly (pp. 30–31):

The first estimate, a 1.3% point decrease (p <.01) in the probability of acceptance, shows that, on average, white responders engaged in anti-Black discrimination by rejecting offers they would otherwise accept if the proposer was white (M1.1).

A 1.3 percentage point decrease sounds rather trivial. Jussim does not mention mock jury trials, which give results like these:

That is, White subjects were very close to race-neutral while Blacks showed a strong pro-Black or anti-White effect.

Anyway, Jussim argues that the two sets of findings are consistent if one does the math right (absolute vs. relative percentages). That’s not the point I want to make here, though. I want to note that audit studies have fundamental limitations that make them hard to interpret. Back in December, when Holbein posted one of these discrimination studies, I replied:

Given that credentials and the other stuff they put in the applications don’t make them equal workers, it’s entirely rational to infuse the priors, which is what people in the studies do.
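One standard way to formalize “infusing the priors” is Kelley’s formula from classical test theory, which shrinks a noisy observed score toward the mean of the group the person belongs to. This is a swapped-in textbook model, not something from Jussim’s paper; it assumes the CV acts as a noisy measure of a latent trait with the same reliability in both groups, and the group means (100 and 90), the observed score, and the reliability (0.5) below are purely hypothetical.

```python
def expected_true_score(observed: float, group_mean: float, reliability: float) -> float:
    """Kelley's formula: best linear estimate of a true score from a noisy observation.

    `reliability` is var(true score) / var(observed score) within the group.
    With reliability 1 the observation is taken at face value; with
    reliability 0 only the group mean matters.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return group_mean + reliability * (observed - group_mean)


if __name__ == "__main__":
    observed = 110.0  # identical (hypothetical) CV signal for both applicants
    for label, mean in (("group A", 100.0), ("group B", 90.0)):
        print(label, expected_true_score(observed, mean, reliability=0.5))
```

Identical CVs yield expected trait levels of 105 and 100. The estimates differ by (1 − reliability) times the group gap, so the less informative the CV, the more the group mean matters.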

The problem here is the same as in my last blogpost (Whose benefit of the doubt?). If groups differ in some hard-to-measure characteristics, and all you know about an applicant is the CV (a noisy measure, or ‘individuating information’ in Jussim’s terminology), then given identical CVs it is rational to prefer the person from the higher-achieving group. We can illustrate this using educational attainment and IQ scores. Pew Research has this result concerning science knowledge:

This is a crude categorization of education. We can do better. Since I already had the GSS data ready, we can look at that:

Recall that this is a 10-item test, so it isn’t great. The overall gap is 10.4 IQ points (White norms), and controlling for education, it is 8.3 IQ points (this is the average distance between the red and turquoise points, weighted by the number of people). Statistically, one year of education is associated with 2.2 IQ points, so for the IQs to match, Blacks would need several more years of education: about 5 if one compares the Black mean for 18-20 years with the White mean for 14 years, both of which are ~102 IQ. Clearly, then, holding e.g. degree constant, as audit studies do, is not sufficient, and employers have reasonable statistical grounds for preferring Whites over Blacks given identical years of education.
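A back-of-envelope version of the conversion from IQ gap to years of education, using the two figures from the GSS analysis (8.3 points at equal education, 2.2 points per year). It assumes a single linear slope shared by both groups across the whole range of education, which is a simplification; the function name is illustrative. Under that assumption the gap corresponds to about 3.8 years, somewhat fewer than the roughly 5 years read off the plotted group means, but either way it is several years.

```python
def extra_years_needed(gap_iq: float, iq_per_year: float) -> float:
    """Years of education whose linear association with IQ equals `gap_iq`.

    Assumes (simplification) one linear slope shared by both groups.
    """
    if iq_per_year <= 0:
        raise ValueError("iq_per_year must be positive")
    return gap_iq / iq_per_year


if __name__ == "__main__":
    # Figures from the GSS analysis: 8.3 IQ points gap at equal education,
    # 2.2 IQ points per year of education.
    print(f"{extra_years_needed(8.3, 2.2):.1f} extra years")
```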

But degrees are granted by different institutions, and Whites on average attend better ones. Audit studies usually hold the institution constant too, but that isn’t enough either, since most universities practice affirmative action (and even if they didn’t, a gap would remain after selection). Here are some Harvard SAT scores for applicants and admitted students:

Thus, among Black and White students admitted to the same institution, the SAT difference is about 40 points (which should correspond to about 7 IQ points). The situation is as Dalliard described in 2014 when commenting on the same research:

In a typical audit study, white and black “auditors” with matching (fictitious) credentials apply to low-skill, entry level positions, with the consequence that the studies have very poor ecological validity with respect to the labor market as a whole. The auditors sometimes exist only on paper, but experiments where actual persons are sent to job interviews are neither randomized (race cannot be assigned to individuals) nor double-blind (the auditors know the purpose of the study), which compromises any attempt to make causal inferences. The auditors can never be matched on all the variables that different employers may find important. It is often quite reasonable to regard white applicants as more qualified than ostensibly similar blacks. For example, the average IQ gap between black and white applicants to low-complexity jobs is 0.86 standard deviations, favoring whites (Roth et al., 2001), something that audit studies do not adjust for. Such racial differences may assume a greater-than-usual importance in the decision-making of the audited employers because many other characteristics that normally show racial differences in the applicant population have been experimentally equalized. In recruitment to cognitively more complex occupations, a rational employer would similarly expect a white graduate from a selective college to be smarter and more diligent than a black graduate from a similarly prestigious school, given the widespread use of racial preferences in college admissions. [4]
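The earlier remark that a gap would remain among admitted students even without affirmative action follows from the shape of the distributions. The sketch below assumes normally distributed scores, hypothetical group means of 100 and 90 (SD 15), and a single hard admission cutoff of 120 applied to everyone on the score itself, then computes each group’s mean among admitted students with the standard truncated-normal formula. Real admissions select on noisy, multi-dimensional criteria, which this idealized case ignores.

```python
from statistics import NormalDist


def mean_above_cutoff(mean: float, sd: float, cutoff: float) -> float:
    """Mean of a normal distribution truncated from below at `cutoff`.

    Uses E[X | X > c] = mu + sigma * phi(a) / (1 - Phi(a)), with a = (c - mu) / sigma.
    """
    std_normal = NormalDist()  # standard normal: mu=0, sigma=1
    a = (cutoff - mean) / sd
    return mean + sd * std_normal.pdf(a) / (1 - std_normal.cdf(a))


if __name__ == "__main__":
    cutoff = 120.0  # hypothetical admission threshold, same for everyone
    admitted_a = mean_above_cutoff(100.0, 15.0, cutoff)
    admitted_b = mean_above_cutoff(90.0, 15.0, cutoff)
    print(f"gap before selection: 10.0, among admitted: {admitted_a - admitted_b:.1f}")
```

The gap shrinks from 10 points to about 1.4 among those admitted, but it does not vanish, because the admitted members of the lower-scoring group cluster closer to the cutoff.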

There is another problem with audit studies, though. I know some people who did recruitment for tech companies in the US. There are typically hundreds of applicants for a given role, and most of these applications are essentially spam: they come from people who apply to just about any job, whether or not it fits their profile, and who send hundreds or thousands of applications a day. Anyone reading applications will run into this. Since some groups send a lot more spam than others, recruiters eventually develop a habit of simply not reading further when they see an applicant from such a group. As a result, they may never read enough of the application to judge all the relevant details. In the experience of some people I know who did this work, Indian and Chinese applicants were particularly spammy (too few Blacks applied for these tech jobs to notice any pattern). These differences in the rate of low-quality applications make audit studies difficult to interpret. The studies implicitly assume that the recruiter sits down, reads the entire application, then mentally (or today, using AI) assigns an overall score that is used to decide whether to move forward with the applicant. In reality, recruiters often just skim quickly and may skip an application entirely if the top of the PDF looks bad or shows a spammy ethnic-sounding name.

This is a kind of discrimination, of course (to discriminate is to notice a difference), but it is a quite rational one. Is it fair? Well, fair to whom? It’s unfair to an individual member of a low-performing or spammy group that other members behave poorly and that it is reasonable for others to be wary as a result. But it is also unfair if everybody has to read through a ton of spam just because some groups behave poorly. There is no perfectly fair solution.
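The skim-and-skip behavior can be added to a simple audit model to show why it muddies interpretation. In this sketch, whose function name, parameters, and values are all hypothetical, a share of recruiters never read past the name when it signals a spam-heavy group, while every recruiter who does read the CV judges it identically. The audit study then records a callback gap even though no recruiter weighed the CV’s content differently.

```python
def audit_callback_rate(base_rate: float, skim_share: float, name_flagged: bool) -> float:
    """Callback rate for an identical fictitious CV under a skim-and-skip heuristic.

    base_rate: hypothetical callback probability when the CV is read in full.
    skim_share: hypothetical share of recruiters who stop reading when the
        name signals a group that sends many spam applications.
    name_flagged: whether the CV's name triggers that heuristic.
    """
    read_probability = 1.0 - skim_share if name_flagged else 1.0
    return base_rate * read_probability


if __name__ == "__main__":
    unflagged = audit_callback_rate(0.10, 0.25, name_flagged=False)
    flagged = audit_callback_rate(0.10, 0.25, name_flagged=True)
    print(f"unflagged name gets {unflagged / flagged - 1:.0%} more callbacks")
```

With a 10% callback rate for fully read CVs and a quarter of recruiters skimming, the unflagged name gets about 33% more callbacks. The audit design sees only that gap, not whether it came from judging the CV or from never reading it.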

As a matter of fact, on the general topic of discrimination, my favorite book is The Discrimination Myth by Frank Karsten:

Egalitarianism is stronger than ever with quotas for female executives, gender neutral toilets, and courses against prejudice. A teasing joke can suffice to be labelled a racist, sexist or fascist. Educators and politicians also agree: prejudices are wrong, diversity is our strength, and we are all equal. But is that really true? Frank Karsten challenges readers to consider a different view. He argues that the fierce fight against discrimination actually causes more exclusion and polarisation.

The thesis of the book is that various anti-discrimination laws strongly violate freedom of association. These laws in fact produce more, not less, discrimination in aggregate, on top of being extremely expensive in compliance and legal costs. It would be better to do nothing at all about private people associating with whom they choose. If Christian bakers don’t want to make a cake for a gay wedding, the couple can go to another bakery: the Christian bakery misses out on a customer and a gay-friendly bakery gains one. The general mechanism by which free markets minimize non-meritocratic discrimination is that more meritocratic companies fare better in the market, so non-meritocratic discrimination is automatically reduced. The main limitation of this mechanism is when the market doesn’t function correctly. If all the existing banks refuse clients of class X, and you can’t open a new X-friendly bank, then members of X will be stuck in a bad situation. This does happen sometimes, in which case the solution is some kind of competition (‘anti-trust’) law to force the market to work properly (yes, I know some even more hard-line libertarians will say that even these laws don’t work well because monopolies or oligopolies don’t usually last long anyway).
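The market mechanism described above resembles Gary Becker’s taste-based discrimination argument, and it can be sketched as toy replicator dynamics: each period, a type of firm gains market share in proportion to its relative profitability. This is an illustrative swapped-in model, not Karsten’s; the function name, profit ratio, and starting share are hypothetical, and a profit ratio of 1 stands in for the market-failure case where discriminators pay no competitive penalty.

```python
def discriminating_share(initial_share: float, profit_ratio: float, periods: int) -> list[float]:
    """Market share of discriminating firms over time under toy replicator dynamics.

    profit_ratio: hypothetical profit of a discriminating firm relative to a
        meritocratic one. Below 1, discrimination is costly and its share shrinks;
        exactly 1 (no competitive pressure) leaves the share unchanged.
    """
    if not 0 < initial_share < 1 or profit_ratio <= 0:
        raise ValueError("need 0 < initial_share < 1 and profit_ratio > 0")
    shares = [initial_share]
    s = initial_share
    for _ in range(periods):
        # Each firm type grows in proportion to its profits; renormalize to shares.
        s = s * profit_ratio / (s * profit_ratio + (1 - s))
        shares.append(s)
    return shares


if __name__ == "__main__":
    shares = discriminating_share(0.5, profit_ratio=0.95, periods=50)
    print(f"start: {shares[0]:.0%}, after 50 periods: {shares[-1]:.0%}")
```

With a 5% profit penalty, discriminating firms fall from half the market to about 7% after 50 periods; with `profit_ratio=1.0` they stay at 50%, mirroring the point that the mechanism needs a functioning market.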