Clear Language, Clear Mind

April 13, 2019

Death by affirmative action: race quotas in medicine

Filed under: intelligence / IQ / cognitive ability, Medicine — Emil O. W. Kirkegaard @ 08:57

See also previous 2017 post.

I have been repeatedly asked about this topic, so here is a post that covers the basics. The argument goes like this:

  • Affirmative action is used in admissions to medical schools, i.e. they practice race quotas in favor of less intelligent races, in particular blacks and Hispanics, and discriminate against whites and especially (East) Asians.
  • Affirmative action results in less intelligent people getting into schools.
  • Less intelligent people tend to drop out more, so we expect, and do see, higher drop-out rates among blacks.
  • Even with differential drop-out rates by race, affirmative action results in less intelligent people graduating and eventually practicing medicine.
  • Less intelligent people have worse job performance in every job type. This is especially the case for highly complex jobs such as being a doctor, where incompetence ultimately results in patient suffering, including untimely death.
  • Thus, bringing it all together, racial affirmative action results in less intelligent blacks and Hispanics being admitted to medical schools; when they don’t drop out, they end up practicing medicine, and in doing so they do a worse job than white and Asian people would have done, thereby killing people through incompetence.

Given what we know about race, intelligence, and job performance, the conclusion is essentially inevitable. However, as far as I know, no one has actually published a study on malpractice rates by race, or some other direct measure of patient harm. You can probably imagine why that is the case. However, we do have a case study of a similar thing happening in the police (Gottfredson, 1996).

But let’s take things one step at a time.

Affirmative action … in action

Ample data exists about this. Mark Perry has documented it well:

The MCAT (Medical College Admission Test) is a cognitive test used for medical schools, basically their version of the SAT/ACT. I don’t know if anyone has published a study relating this one directly to an IQ test, but we have such data for the other achievement/entrance tests: SAT, ACT, and KTEA.

The table shows that someone with about the same score in the highlighted column has a ~22.5% chance of admittance if Asian, and an 81% chance if black. One can also fit a model and calculate the specific race benefits as odds ratios, as was done in this report:
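To make the odds-ratio framing concrete, here is a minimal sketch using the approximate admittance probabilities quoted above (the model in the report controls for more variables, so its exact odds ratios differ):

```python
# Sketch: converting group admittance probabilities into an odds ratio.
# The probabilities are the approximate figures read off the table for
# applicants in the same MCAT/GPA band; they are illustrative, not exact.
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

p_asian = 0.225  # ~22.5% admittance for Asian applicants in this band
p_black = 0.81   # ~81% admittance for black applicants in this band

# Odds ratio: how many times higher the odds of admission are for a
# black applicant than for an Asian applicant with comparable scores.
or_black_vs_asian = odds(p_black) / odds(p_asian)
print(round(or_black_vs_asian, 1))  # ~14.7
```

An odds ratio near 15 for applicants with similar scores is the kind of figure such reports present.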

Medical school acceptance and matriculation rates

The table below shows acceptance and matriculation (people who actually enroll) rates by race (source):

So we see that blacks and Hispanics have lower acceptance/matriculation rates. This is because, while they gain from affirmative action, they are also weaker applicants on average, and the gap in qualifications is larger than the favoritism granted by affirmative action can offset.

Competency measures

Tables below show average competencies by race for applicants and matriculants (same source as above):

Unhelpfully, no standard deviations are supplied for these measures, so one cannot simply read the Cohen’s d values off the table. The standard deviations can be found in the table below (source):

These standard deviations are for the total population of applicants, not whites alone, so Cohen’s d values computed from them will be slightly underestimated (by perhaps 10 to 15%). We can then calculate the d gaps of each race relative to whites (I used matriculants, as most relevant), which are:

d gaps relative to Whites (negative = below the white mean):

Measure           Black    Hispanic   Asian
MCAT CPBS         -0.63    -0.52       0.30
MCAT CARS         -0.74    -0.74      -0.04
MCAT BBLS         -0.63    -0.44       0.15
MCAT PSBB         -0.52    -0.52       0.19
Total MCAT        -0.73    -0.63       0.16
GPA Science       -0.74    -0.44       0.02
GPA Non-Science   -0.54    -0.36       0.04
GPA Total         -0.71    -0.44       0.03


Thus, we see that black and Hispanic enrolled students are quite far below whites in academic talent, and Asians somewhat above.
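For concreteness, each d gap is just (group mean − white mean) / SD; a minimal sketch, with made-up means and SD chosen only to illustrate the arithmetic (they are not the published figures):

```python
# Sketch of how the d gaps in the table above are computed: the group mean
# minus the white mean, divided by the standard deviation. Here the SD is
# the total-applicant SD, which (as noted) slightly understates d relative
# to a white-only SD. All numbers below are hypothetical stand-ins.
def cohens_d(mean_group, mean_white, sd_total):
    """Standardized gap of a group's mean relative to the white mean."""
    return (mean_group - mean_white) / sd_total

# Hypothetical total-MCAT means and SD, chosen to reproduce a black-white
# gap of about -0.73 like the one in the table:
d = cohens_d(mean_group=498.7, mean_white=503.8, sd_total=7.0)
print(round(d, 2))  # -0.73
```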

Differential drop-out rates and competence among graduates

Less capable students drop out more, so this tends to even out group gaps among admitted students. However, the process is not 100% effective, so we still end up with gaps among the graduates. The figure and table below show the differential drop-out rates (source):

Finding competence data for graduates was trickier, but I was able to find this figure (source):

Thus, we see that the rank order of scores among the races is preserved among graduates as well. “Placed/unplaced” refers to whether the graduates were able to find a residency.

Medical competency

Instead of relying on the more academically broad MCAT, we can look at various medical school tests and exams (which are, of course, well predicted by the MCAT). The first table is from here and concerns a competency test taken in medical school.

The adjusted score refers to the remaining gaps after controlling for prior MCAT score and GPA. Women do worse in these old data, as they generally do on high-stakes testing.

From the same Michigan report as above, we get a similar result:

The 75th centile black student in medical school does about as well as the 25th centile white or Asian student.

There’s also a similar report for Maryland. See also UK results in this meta-analysis.

MCAT et al predict actual medical performance

There are a variety of studies on this question, and a few reviews (e.g. here, here). Here’s an example of a primary study:


To establish whether successful certifying examination performances of doctors are associated with their patients’ mortality and length of stay following acute myocardial infarction.


Risk adjusted mortality and survivors’ length of stay were compared for doctors who had satisfactorily completed training in internal medicine or cardiology and attempted the relevant examination. Specifically, the study investigated the joint effects of hospital location, availability of advanced cardiac care, doctors’ specializations, certifying examination performances, year certification was first attempted and patient volume.


Data on all acute myocardial infarctions in Pennsylvania for the calendar year 1993 were collected by the Pennsylvania Health Care Cost Containment Council. These data were combined with physician information from the database of the American Board of Internal Medicine.


Holding all variables constant, successful examination performance (i.e. certification in internal medicine or cardiology) was associated with a 19% reduction in mortality. Decreased mortality was also correlated with treatment in hospitals located outwith either rural or urban settings and with management by a cardiologist. Shorter stays were not related to examination performance but were associated with treatment by high volume cardiologists who had recently finished training and who cared for their patients in hospitals located outwith rural or urban settings.


The results of the study add to the evidence supporting the validity of the certifying examination and lend support to the concept that fund of knowledge is related to quality of practice.

There are well-known race differences in job performance in general

Even if we didn’t have the other data, we would be well within reason to conclude that there are race differences in job performance among doctors, because we have large meta-analyses of race differences in work performance in general:

Race differences in medical performance

Thus, finally, we get to the last part. Unfortunately, I have not been able to find a study that looked at individual doctors’ race and malpractice or other patient-harm measures. One can, however, use the race composition of medical schools as a proxy. There is a study of medical schools showing that they have consistently different malpractice rates, but it didn’t investigate the link to school-level metrics such as mean MCAT/GPA among students.

  • Waters, T. M., Lefevre, F. V., & Budetti, P. P. (2003). Medical school attended as a predictor of medical malpractice claims. BMJ Quality & Safety, 12(5), 330-336.

Objectives: Following earlier research which showed that certain types of physicians are more likely to be sued for malpractice, this study explored (1) whether graduates of certain medical schools have consistently higher rates of lawsuits against them, (2) if the rates of lawsuits against physicians are associated with their school of graduation, and (3) whether the characteristics of the medical school explain any differences found.

Design: Retrospective analysis of malpractice claims data from three states merged with physician data from the AMA Masterfile (n=30 288).

Study subjects: All US medical schools with at least 5% of graduates practising in three study states (n=89).

Main outcome measures: Proportion of graduates from a medical school for a particular decade sued for medical malpractice between 1990 and 1997 and odds ratio for lawsuits against physicians from high and low outlier schools; correlations between the lawsuit rates of successive cohorts of graduates of specific medical schools.

Results: Medical schools that are outliers for malpractice lawsuits against their graduates in one decade are likely to retain their outlier status in the subsequent decade. In addition, outlier status of a physician’s medical school in the decade before his or her graduation is predictive of that physician’s malpractice claims experience (p<0.01). All correlations of cohorts were relatively high and all were statistically significant at p<0.001. Comparison of outlier and non-outlier schools showed that some differences exist in school ownership (p<0.05), years since established (p<0.05), and mean number of residents and fellows (p<0.01).

Conclusions: Consistent differences in malpractice experience exist among medical schools. Further research exploring alternative explanations for these differences needs to be conducted.

Of particular interest here is the mentioned database, AMA Masterfile. Perhaps one can obtain access to this.

Worse, one can find studies that investigate the effect of race on patient outcomes — but only the patient’s race, and sometimes the patient-physician interaction, without reporting the main effect of the doctor’s race! Here’s an example:

Context Many studies have documented race and gender differences in health care received by patients. However, few studies have related differences in the quality of interpersonal care to patient and physician race and gender.

Objective To describe how the race/ethnicity and gender of patients and physicians are associated with physicians’ participatory decision-making (PDM) styles.

Design, Setting, and Participants Telephone survey conducted between November 1996 and June 1998 of 1816 adults aged 18 to 65 years (mean age, 41 years) who had recently attended 1 of 32 primary care practices associated with a large mixed-model managed care organization in an urban setting. Sixty-six percent of patients surveyed were female, 43% were white, and 45% were African American. The physician sample (n=64) was 63% male, with 56% white, and 25% African American.

Main Outcome Measure Patients’ ratings of their physicians’ PDM style on a 100-point scale.

Results African American patients rated their visits as significantly less participatory than whites in models adjusting for patient age, gender, education, marital status, health status, and length of the patient-physician relationship (mean [SE] PDM score, 58.0 [1.2] vs 60.6 [3.3]; P=.03). Ratings of minority and white physicians did not differ with respect to PDM style (adjusted mean [SE] PDM score for African Americans, 59.2 [1.7] vs whites, 61.7 [3.1]; P=.13). Patients in race-concordant relationships with their physicians rated their visits as significantly more participatory than patients in race-discordant relationships (difference [SE], 2.6 [1.1]; P=.02). Patients of female physicians had more participatory visits (adjusted mean [SE] PDM score for female, 62.4 [1.3] vs male, 59.5 [3.1]; P=.03), but gender concordance between physicians and patients was not significantly related to PDM score (unadjusted mean [SE] PDM score, 76.0 [1.0] for concordant vs 74.5 [0.9] for discordant; P=.12). Patient satisfaction was highly associated with PDM score within all race/ethnicity groups.

Conclusions Our data suggest that African American patients rate their visits with physicians as less participatory than whites. However, patients seeing physicians of their own race rate their physicians’ decision-making styles as more participatory. Improving cross-cultural communication between primary care physicians and patients and providing patients with access to a diverse group of physicians may lead to more patient involvement in care, higher levels of patient satisfaction, and better health outcomes.

Note the use of “race-concordant”, which means patient and doctor had the same race. This is in fact a simple coding of the interaction effect between patient and physician race.

Bonus: really nice p-values in those findings. No worries, the study only has ~1900 citations.

Oh, and a final point. From a public health perspective, affirmative action probably mostly kills blacks and Hispanics. People prefer to befriend and date others of the same race, and this ethnocentrism also applies to patients’ choice of physicians. As such, the incompetent black and Hispanic physicians are mostly treating, and thus harming, black and Hispanic patients, who would have been better off with a white or Asian doctor.

April 27, 2018

Review: The Censor’s Hand (Carl E. Schneider)

Filed under: Book review, Ethics, Medicine — Emil O. W. Kirkegaard @ 06:36

Medical and social progress depend on research with human subjects. When that research is done in institutions getting federal money, it is regulated (often minutely) by federally required and supervised bureaucracies called “institutional review boards” (IRBs). Do–can–these IRBs do more harm than good? In The Censor’s Hand, Schneider addresses this crucial but long-unasked question.
Schneider answers the question by consulting a critical but ignored experience–the law’s learning about regulation–and by amassing empirical evidence that is scattered around many literatures. He concludes that IRBs were fundamentally misconceived. Their usefulness to human subjects is doubtful, but they clearly delay, distort, and deter research that can save people’s lives, soothe their suffering, and enhance their welfare. IRBs demonstrably make decisions poorly. They cannot be expected to make decisions well, for they lack the expertise, ethical principles, legal rules, effective procedures, and accountability essential to good regulation. And IRBs are censors in the place censorship is most damaging–universities.
In sum, Schneider argues that IRBs are bad regulation that inescapably do more harm than good. They were an irreparable mistake that should be abandoned so that research can be conducted properly and regulated sensibly.

Did you read Scott Alexander’s blog post about his IRB horror story and wonder whether there’s more of that kind? Well, there is: Schneider has written an entire book about it. Schneider is a rare breed of polymath, being a professor of both medicine and law (University of Michigan). The book proceeds in fairly simple steps:

  1. Contrary to a few sensationalized stories about Nazi camp experiments and the Tuskegee experiment, harm to patients in research was actually very rare before the implementation of IRBs. In fact, it’s safer to be in research than in regular medical care. So, for IRBs to make sense, they must further reduce the already low levels of harm while not creating additional harm by wasting researchers’ time and money, delaying useful treatments, etc.
  2. Based on the available evidence, IRBs as currently practiced completely fail the above. They are expensive, slow, and arbitrary. He cites experiments where the same protocols were sent to different IRBs only to get different judgments, sometimes even with contradictory revision requirements (one demanded children’s parents be told, one demanded they not be told).
  3. He diagnoses the problems of IRBs as being due (paradoxically) to a lack of clear regulations, relying instead on vague principles like those in the Belmont Report: respect for persons, beneficence, and justice. Furthermore, members of IRBs don’t have the expertise needed to deal with the studies they are supposed to regulate, since they are rarely subject matter experts; indeed, some are complete laymen.
  4. He further argues that the very nature of IRBs’ event licensing — requiring permission before doing anything, instead of the usual punishment after doing something wrong, i.e. a reversed burden of proof — results in the ever-creeping scope of IRBs, and that the system is thus fundamentally broken and cannot be fixed with reforms. IRBs were originally meant for medical research, but now try to regulate pretty much everything in social science.

He illustrates the above with a number of disturbing case studies, similar to that from Alexander’s post. Let’s start with a not extremely egregious one:

Intensive care units try to keep desperately ill people alive long enough for their systems to recover. Crucial to an ICU’s technology is the plastic tube threaded through a major vein into the central circulatory system. This “central line” lets doctors give drugs and fluids more quickly and precisely and track the patient’s fluid status better.

Every tool has drawbacks. An infected IV in your arm is a nuisance, but the tip of a central line floats near your heart and can spread bacteria throughout your body. When antibiotics fail to stop these infections, patients die. Because there is one central-line infection for every 100 or 200 patient-days, a hospital like St. Luke’s in Houston, with about 100 ICU beds, will have a central-line infection every day or two. There are perhaps 10,000 or 20,000 central-line fatalities annually, and a 2004 study estimated 28,000.
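The infection-rate arithmetic in the excerpt can be checked directly; a quick sketch:

```python
# Back-of-envelope check of the central-line infection arithmetic above:
# with ~100 ICU beds, about 100 patient-days accrue each calendar day.
icu_beds = 100
patient_days_per_day = icu_beds

# One infection per 100 to 200 patient-days implies an infection every
# one to two days at a hospital of this size.
for rate in (1 / 100, 1 / 200):
    infections_per_day = patient_days_per_day * rate
    days_between_infections = 1 / infections_per_day
    print(days_between_infections)  # 1.0, then 2.0
```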

There is a well-known sequence of steps to follow to reduce these infections: (1) wash your hands, (2) don cap, mask, and gown, (3) swab the site with antibiotic, (4) use a sterile full-length drape, and (5) dab on antibiotic ointment when the line is in. Simple enough. But doctors in one study took all five steps only 62% of the time. No surprise. Doctors might forget to wash their hands. Or use an inferior alternative if the right drape or ointment is missing.

Peter Pronovost is a Johns Hopkins anesthesiologist and intensivist who proposed three changes. First, have a nurse with a checklist watching. If the doctor forgets to wash his hands, the nurse says, “Excuse me, Doctor McCoy, did you remember to wash your hands?” Second, tell the doctor to accept the nurse’s reminder—to swallow hard and say, “I know I’m not perfect. I’ll do it right.” Third, have ICUs stock carts with everything needed for central lines.

It worked. Central-line infections at Johns Hopkins fell from 11 to about zero per thousand patient-days. This probably prevented 43 infections and 8 deaths and saved $2 million. In medical research, reducing a problem by 10% is ordinarily a triumph. Pronovost almost eliminated central-line infections. But would it work in other kinds of hospitals? Pronovost enlisted the Michigan Hospital Association in the Keystone Project. They tried the checklist in hospitals big and small, rich and poor. It worked again and probably saved 1,900 lives.

Then somebody complained to OHRP that Keystone was human-subject research conducted without informed consent. OHRP sent a harsh letter ordering Pronovost and the MHA to stop collecting data. OHRP did not say they had to stop trying to reduce infections with checklists; hospitals could use checklists to improve quality. But tracking and reporting the data was research and required the patients’, doctors’, and nurses’ consent. And what research risks did OHRP identify? Ivor Pritchard, OHRP’s Acting Director, argued that

“the quality of care could go down,” and that an IRB review makes sure such risks are minimized. For instance, in the case of Pronovost’s study, using the checklist could slow down care, or having nurses challenge physicians who were not following the checklist could stir animosity that interferes with care. “That’s not likely, but it’s possible,” he said.

Basically, experimenting is okay, as long as you don’t collect any data to know whether something worked or not. Obvious example of stifling of useful research. Another one:

In adult respiratory distress syndrome (ARDS), lungs fail. Patients not ventilated die, as do a third to a half of those who are. Those who survive do well, so ventilating properly means life or death.

Respirators have multiple settings (for frequency of breathing, depth of breathing, oxygen percentage, and more). The optimal combination depends on factors like the patient’s age, sex, size, and sickness. More breathing might seem better but isn’t, since excessive ventilation can tax bodies without additional benefit. Respirator settings also affect fluid balance. Too much fluid floods the lungs and the patient drowns; too little means inadequate fluid for circulatory functions, so blood pressure drops, then disappears.

In 1999, a National Heart, Lung, and Blood Institute study was stopped early when lower ventilator settings led to about 25% fewer deaths. But that study did not show how low settings should be or how patients’ fluid status should be handled. So the NHLBI got eminent ARDS specialists to conduct a multisite randomized trial of ventilator settings and fluid management.

In November 2001, two pulmonologists and two statisticians at the NIH Clinical Center sent OHRP a letter criticizing the study design. Pressed by OHRP, the NHLBI suspended enrollment in July 2002: the federal institute with expertise in lung disease bowed to an agency with no such expertise. NHLBI convened a panel approved by OHRP. It found the study well-designed and vital. OHRP announced its “serious unresolved concerns” and demanded that the trials remain suspended. Meanwhile, clinicians had to struggle.

Eight months later, OHRP loosed its hold, without comment on the costs, misery, and death it had caused. Rather, it berated IRBs for approving the study without adequately evaluating its methodology, risks and benefits, and consent practices. It did not explain how an IRB could do better when OHRP and NHLBI had bitterly disagreed.

And the ridiculous:

Helene Cummins, a Canadian sociologist, knew that many farmers did not want their children to be farmers because the life was hard and the income poor. She wondered about “the meaning of farm life for farm children.” She wanted to interview seven- to twelve-year-olds about their parents’ farms, its importance to them, pleasant and unpleasant experiences, their use of farm machinery, whether they wanted to be farmers, and so on.

Cummins’ REB [same as IRB] first told her she needed consent from both parents. She eventually dissuaded them. They then wanted a neutral party at her interviews. A “family/child therapist” told the REB that “there would be an inability of young children to reply to some of the questions in a meaningful way,” that it was unlikely that children would be able to avoid answering a question, and that the neutral party was needed to ensure [again] an ethical level of comfort for the child, and to act as a witness.” Cummins had no money for an observer, thought one might discomfit the children, and worried about the observers’ commitment to confidentiality. Nor could she find any basis for requiring an observer in regulations or practice. She gathered evidence and arguments and sought review by an institutional Appeal REB, which took a year. The Appeal REB eventually reversed the observer requirement.

Farm families were “overwhelmingly positive.” Many children were eager and excited; siblings asked to be interviewed too. Children showed Cummins “some of their favorite places on the farm. I toured barns, petted cows, walked to ponds, sat on porches, and watched the children play with newborn kittens.” Cummins concluded that perhaps “a humble researcher who respects the kids who host her as smart, sensible, and desirous of a good life” will treat them ethically.

There are some links to the growing snowflake craze, namely that IRBs are tasked with protecting so-called vulnerable groups. But what exactly are those? Well, because IRBs want more power, they continuously expand the category to include pretty much everybody. When dealing with vulnerable groups, extra rules apply, so basically this is a power grab to extend the extra rules to just about all cases.

Regulationists’ most common questions about vulnerability are about expanding IRB authority, “with the answer usually being ‘yes.’” The regulations already say that subjects “likely to be vulnerable to coercion or undue influence, such as children, prisoners, pregnant women, mentally disabled persons, or economically or educationally disadvantaged persons,” require “additional safeguards” to protect their “rights and welfare.” IRBs can broaden their authority in two ways. First, “additional safeguards” and “rights and welfare” are undefined. Second, the list of vulnerable groups is open-ended and its criteria invitingly unspecified.

Who might not be vulnerable? Children are a quarter of the population. Most women become pregnant. Millions of people are mentally ill or disabled. “Economically and educationally disadvantaged” may comprise the half of the population below the mean, the three quarters of the adults who did not complete college, the quarter that is functionally illiterate, the other quarter that struggles with reading, or the huge majority who manage numbers badly. And it is easily argued, for example, that the sick and the dying are vulnerable.

Basically, you can expect to be persuaded somewhat towards libertarianism by reading this book. It’s a prime example of inept, slow, counter-productive, expensive regulation slowing down progress for everybody.

August 9, 2017

Health dysgenics: a very brief review

Filed under: Genetics / behavioral genetics, Medicine, Reproductive genetics — Emil O. W. Kirkegaard @ 08:42

Woodley reminded me of dysgenics for health outcomes by linking me to a study about the increasing rates of cancer. I had first reached this conclusion back in 2005, when I realized what it means for evolution that we keep almost everyone alive despite their genetic defects.

The problem is quite simple: mutations accumulate, and mutations have a net negative impact on the functioning of the body. Most of the genome appears not to be relevant for anything (‘junk’ / non-coding), so mutations in these areas don’t do anything. Of the mutations that hit other areas, many are synonymous and thus usually have no effect. Mutations in areas that matter generally have negative effects. Why? The human body is an intricate machine, and it’s easier to fuck it up by making random changes to the blueprint/recipe than to improve upon it.

So, basically, one has to get rid of the harmful mutations that happen, and this is done via death and mate choice (preference for healthy partners), collectively: purifying selection. Humans still have mate choice and some natural selection, but natural selection has been starkly reduced in strength since the advent of medicine that actually works (i.e. not bloodletting etc.). Thus, by mutation-selection balance, the rates of genetic disorders and genetic dispositions for disease should increase. In other words, mutation load for disease in general should increase. Does it?
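The mutation-selection balance argument can be made concrete with the textbook haploid approximation, where the equilibrium frequency of a deleterious allele is roughly μ/s (mutation rate over selection coefficient); the numbers below are purely illustrative:

```python
# Sketch of mutation-selection balance (haploid approximation): each
# generation, mutation adds deleterious alleles at rate mu, and selection
# removes a fraction s of carriers. At equilibrium the allele frequency
# is roughly mu/s, so weakening selection raises the equilibrium load.
def equilibrium_frequency(mu, s):
    """Approximate equilibrium allele frequency under mutation-selection balance."""
    return mu / s

mu = 1e-5  # per-generation mutation rate (illustrative)
print(equilibrium_frequency(mu, s=0.5))   # strong selection: ~2e-05
print(equilibrium_frequency(mu, s=0.05))  # relaxed selection: ~2e-04, ten times higher
```

The point is qualitative: reduce s by a factor of ten (e.g. because medicine keeps carriers alive and reproducing) and the equilibrium frequency of the harmful allele rises by roughly the same factor.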

It’s not quite so simple to answer because of various confounders. The most important ones are improved diagnosis (we have better equipment to spot disorders now) and population aging (older people are sicker). Population aging can be dealt with by comparing same-aged samples measured at different times. Diagnostic changes are much harder to deal with, and one has to look for data where the diagnostic criteria either did not change over the period in question or changed in a way we can adjust for.
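The same-age comparison mentioned above is essentially direct age standardization; a minimal sketch with made-up rates:

```python
# Sketch of direct age standardization: re-weight each age band's disease
# rate by a fixed reference age structure, so two time points can be
# compared without population aging confounding the trend. All numbers
# here are made up for illustration.
def age_standardized_rate(rates_by_age, reference_weights):
    """Weighted average of age-specific rates using fixed reference weights."""
    assert abs(sum(reference_weights) - 1.0) < 1e-9
    return sum(r * w for r, w in zip(rates_by_age, reference_weights))

# Hypothetical age-specific rates (per 1000) for bands 0-19, 20-39, 40-59, 60+
rates_1980 = [1.0, 2.0, 5.0, 12.0]
rates_2010 = [1.0, 2.0, 5.0, 12.0]   # identical age-specific rates...
weights = [0.30, 0.30, 0.25, 0.15]   # fixed reference age structure

# ...so the standardized rates agree, even if the 2010 *crude* rate is
# higher merely because the 2010 population is older.
print(round(age_standardized_rate(rates_1980, weights), 2))  # 3.95
print(round(age_standardized_rate(rates_2010, weights), 2))  # 3.95
```

If the standardized rate still rises over time, aging is ruled out as the explanation, and one is left with diagnosis changes, environment, or genuine dysgenic change.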

There’s also another issue. For a number of decades, we have been using a clever form of selection: prenatal screening (and preconception screening in some groups), which obviously selects against mutational load for the screened diseases. However, most of this testing is for aneuploidies (mostly Down’s), which usually result in sterile offspring and are thus irrelevant for mutational load for disease (because they are not contributed to the gene pool). However, some of the testing is for specific diseases, usually ones that happen to be quite prevalent in some racial group: Tay-Sachs etc. in Ashkenazi Jews, Charlevoix-Saguenay etc. in Québécois, aspartylglucosaminuria etc. in Finns. One obviously cannot look for evidence of dysgenics for these diseases, as the selection against them distorts the picture.

The studies

I didn’t do a thorough search. In fact, these were the first two studies I found plus the one Michael found. The point of this review is to bring the idea to your mind, not prove it conclusively with an exhaustive review.


Cancer incidence increasing globally: The role of relaxed natural selection

Cancer incidence increase has multiple aetiologies. Mutant allele accumulation in populations may be one of them due to the strong heritability of many cancers. The opportunity for the operation of natural selection has decreased in the past ~150 years because of reduction of mortality and fertility. Mutation-selection balance may have been disturbed in this process and genes providing background for some cancers may have been accumulating in human gene pools. Worldwide, based on the WHO statistics for 173 countries, the index of the opportunity for selection is strongly inversely correlated with cancer incidence in people aged 0-49 and in people of all ages. This relationship remains significant when GDP, life expectancy of older people (e50), obesity, physical inactivity, smoking and urbanization are kept statistically constant for fifteen (15) out of twenty-seven (27) individual cancer incidence rates. Twelve (12) cancers which are not correlated to relaxed natural selection after considering the six potential confounders are largely attributable to external causes like viruses and toxins. Ratios of the average cancer incidence rates of the 10 countries with highest opportunities for selection to the average cancer incidence rates of the 10 countries with lowest opportunities for selection are 2.3 (all cancers at all ages), 2.4 (all cancers in the 0-49 years age group), 5.7 (average ratio of strongly genetically based cancers) and 2.1 (average ratio of cancers with less genetic background).

Coeliac disease

Increasing prevalence of coeliac disease over time

Background  The number of coeliac disease diagnoses has increased in the recent past and according to screening studies, the total prevalence of the disorder is around 1%.
Aim  To establish whether the increased number of coeliac disease cases reflects a true rise in disease frequency.
Methods  The total prevalence of coeliac disease was determined in two population-based samples representing the Finnish adult population in 1978–80 and 2000–01 and comprising 8000 and 8028 individuals, respectively. Both clinically–diagnosed coeliac disease patients and previously unrecognized cases identified by serum endomysial antibodies were taken into account.
Results  Only two (clinical prevalence of 0.03%) patients had been diagnosed on clinical grounds in 1978–80, in contrast to 32 (0.52%) in 2000–01. The prevalence of earlier unrecognized cases increased statistically significantly from 1.03% to 1.47% during the same period. This yields a total prevalence of coeliac disease of 1.05% in 1978–80 and 1.99% in 2000–01.
Conclusions  The total prevalence of coeliac disease seems to have doubled in Finland during the last two decades, and the increase cannot be attributed to the better detection rate. The environmental factors responsible for the increasing prevalence of the disorder are issues for further studies.

Arthritis and other rheumatic conditions

Estimates of the prevalence of arthritis and other rheumatic conditions in the United States: Part II

To provide a single source for the best available estimates of the US prevalence of and number of individuals affected by osteoarthritis, polymyalgia rheumatica and giant cell arteritis, gout, fibromyalgia, and carpal tunnel syndrome, as well as the symptoms of neck and back pain. A companion article (part I) addresses additional conditions.
The National Arthritis Data Workgroup reviewed published analyses from available national surveys, such as the National Health and Nutrition Examination Survey and the National Health Interview Survey. Because data based on national population samples are unavailable for most specific rheumatic conditions, we derived estimates from published studies of smaller, defined populations. For specific conditions, the best available prevalence estimates were applied to the corresponding 2005 US population estimates from the Census Bureau, to estimate the number affected with each condition.
We estimated that among US adults, nearly 27 million have clinical osteoarthritis (up from the estimate of 21 million for 1995), 711,000 have polymyalgia rheumatica, 228,000 have giant cell arteritis, up to 3.0 million have had self-reported gout in the past year (up from the estimate of 2.1 million for 1995), 5.0 million have fibromyalgia, 4–10 million have carpal tunnel syndrome, 59 million have had low back pain in the past 3 months, and 30.1 million have had neck pain in the past 3 months.
Estimates for many specific rheumatic conditions rely on a few, small studies of uncertain generalizability to the US population. This report provides the best available prevalence estimates for the US, but for most specific conditions more studies generalizable to the US or addressing understudied populations are needed.

Does it matter?

Yes. Treating diseases, especially rare diseases, is extremely expensive. As such, for countries with public health care, there’s a very strong economic argument in favor of health eugenics via gene editing or embryo/gamete selection.

Socio-economic burden of rare diseases: A systematic review of cost of illness evidence

Cost-of-illness studies, the systematic quantification of the economic burden of diseases on the individual and on society, help illustrate direct budgetary consequences of diseases in the health system and indirect costs associated with patient or carer productivity losses. In the context of the BURQOL-RD project (“Social Economic Burden and Health-Related Quality of Life in patients with Rare Diseases in Europe”) we studied the evidence on direct and indirect costs for 10 rare diseases (Cystic Fibrosis [CF], Duchenne Muscular Dystrophy [DMD], Fragile X Syndrome [FXS], Haemophilia, Juvenile Idiopathic Arthritis [JIA], Mucopolysaccharidosis [MPS], Scleroderma [SCL], Prader-Willi Syndrome [PWS], Histiocytosis [HIS] and Epidermolysis Bullosa [EB]). A systematic literature review of cost of illness studies was conducted using a keyword strategy in combination with the names of the 10 selected rare diseases. Available disease prevalence in Europe was found to range between 1 and 2 per 100,000 population (PWS, a sub-type of Histiocytosis, and EB) up to 42 per 100,000 population (Scleroderma). Overall, cost evidence on rare diseases appears to be very scarce (a total of 77 studies were identified across all diseases), with CF (n=29) and Haemophilia (n=22) being relatively well studied, compared to the other conditions, where very limited cost of illness information was available. In terms of data availability, total lifetime cost figures were found only across four diseases, and total annual costs (including indirect costs) across five diseases. Overall, data availability was found to correlate with the existence of a pharmaceutical treatment and indirect costs tended to account for a significant proportion of total costs. Although methodological variations prevent any detailed comparison between conditions and based on the evidence available, most of the rare diseases examined are associated with significant economic burden, both direct and indirect.

Economic burden of common variable immunodeficiency: annual cost of disease

Objectives: In the context of the unknown economic burden imposed by primary immunodeficiency diseases, in this study, we sought to calculate the costs associated with the most prevalent symptomatic disease, common variable immunodeficiency (CVID). Methods: Direct, indirect and intangible costs were recorded for diagnosed CVID patients. Hidden Markov model was used to evaluate different disease-related factors and Monte Carlo method for estimation of uncertainty intervals. Results: The total estimated cost of diagnosed CVID is US$274,200/patient annually and early diagnosis of the disease can save US$6500. Hospital admission cost (US$25,000/patient) accounts for the most important expenditure parameter before diagnosis, but medication cost (US$40,600/patients) was the main factor after diagnosis primarily due to monthly administration of immunoglobulin. Conclusion: The greatest cost-determining factor in our study was the cost of treatment, spent mostly on immunoglobulin replacement therapy of the patients. It was also observed that CVID patients’ costs are reduced after diagnosis due to appropriate management.

There are lots of these kinds of studies; the second paper summarizes a number of them for this cluster of diseases:

A Spanish study reported that mean annual treatment costs for children and adult PID patients were €6,520 and €17,427, respectively. Total treatment costs spent on IVIg therapy procedures in Spain were approximately €91.8 million annually, of which 94% consisted of drug cost [27]. Another study conducted in Belgium estimated the annual costs for IVIg therapy on average to be €12,550 [28].

Galli et al. [29] assessed the economic impact associated with the method of treatment of PID patients in Italy. Regarding the monthly treatment costs associated with the treatment of a typical 20 kg child, the study reported an antibiotic therapy cost of €58,000, an Ig cost of €468,000 and a patients’ hospitalization cost of €300,000 for the IVIg method.

Haddad et al. [26] conducted a cost analysis study in the French setting and reported the total monthly treatment cost for a patient using hospital-based 20 g IVIg to be €1,192.19, in which approximately 57% of the total treatment cost was spent on Ig preparation and 39% on hospital admission charges. Another investigation on French PID patients demonstrated the yearly cost of hospital-based IVIg to be €26,880 per patient [30].

Other cost analysis studies comparing the direct cost impacts of Ig replacement methods reported annual per patient costs for hospital-based IVIg of US$14,124 in Sweden [31], €31,027 and €17,329 for adults and children in Germany, respectively [32], and €18,600 in the UK [33]. On the basis of one Canadian study, we found that the total annual base case expenditures for hospital-based IVIg therapy of children and adults were $14,721 and $23,037 (in Canadian dollars), respectively. The annual per patient cost of Ig was 75%, the cost of physician and nurse care and hospital admission was 16% and the cost of time lost because of treatment was 8% [34].

The Genomic Health Of Ancient Hominins?

Davide Piffer reminded me that there is a study of the genomic health of ancient individuals, which finds that:

The genomes of ancient humans, Neandertals, and Denisovans contain many alleles that influence disease risks. Using genotypes at 3180 disease-associated loci, we estimated the disease burden of 147 ancient genomes. After correcting for missing data, genetic risk scores were generated for nine disease categories and the set of all combined diseases. These genetic risk scores were used to examine the effects of different types of subsistence, geography, and sample age on the number of risk alleles in each ancient genome. On a broad scale, hereditary disease risks are similar for ancient hominins and modern-day humans, and the GRS percentiles of ancient individuals span the full range of what is observed in present day individuals. In addition, there is evidence that ancient pastoralists may have had healthier genomes than hunter-gatherers and agriculturalists. We also observed a temporal trend whereby genomes from the recent past are more likely to be healthier than genomes from the deep past. This calls into question the idea that modern lifestyles have caused genetic load to increase over time. Focusing on individual genomes, we find that the overall genomic health of the Altai Neandertal is worse than 97% of present day humans and that Otzi the Tyrolean Iceman had a genetic predisposition to gastrointestinal and cardiovascular diseases. As demonstrated by this work, ancient genomes afford us new opportunities to diagnose past human health, which has previously been limited by the quality and completeness of remains.

The authors themselves note the connection to the proposed recent dysgenic selection:

The genomic health of ancient individuals appears to have improved over time (Figure 3B). This calls into question the idea that genetic load has been increasing in human populations (Lynch 2016). However, there exists a perplexing pattern: ancient individuals who lived within the last few thousand years have healthier genomes, on average, than present day humans. This deviation from the observed temporal trend of improved genomic health opens up the possibility that deleterious mutations have accumulated in human genomes in the recent past. The data presented here do not provide adequate information to address this hypothesis, which we leave for future follow-up studies.

In other words, we expect the recent pattern to look something like this:

August 8, 2017

WHO on genomics and health, 2002

Filed under: Medicine,Politics,Sociology — Tags: , — Emil O. W. Kirkegaard @ 20:48

I have been tweeting annotated snippets from a WHO report I’m reading. Like this:

Basically, the report does a decent job of summarizing the state of the art in 2002, and has some interesting notes for the future. It also contains a shit ton of socialist politics about reducing inequalities in health, both within and between countries, especially between. Perhaps this is a reflection of their ‘medical’ approach to health (fix problems so that everybody attains healthy status) instead of an ‘optimizing’ approach, where the goal is just to generally improve health without any particular focus on reducing inequality (which might even increase it).

I was asked to upload my annotated copy. Here goes:

October 5, 2016

Causal effects of head injuries revisited

Filed under: Genetics / behavioral genetics,Medicine — Tags: , — Emil O. W. Kirkegaard @ 17:04

Comment on:

The real abstract is too damn long; this is my TL;DR version:

Sibling control study of TBI as a causal effect for various outcomes. Data = Swedish register data, n = very large. Findings: sibling comparisons consistent with causal effects of TBI, but there is some familial confounding.

Before going further, I’d like to say that I believe in the causal effects of TBI for some outcomes.

With that said: sibling comparisons are a natural experiment-type design that lets us control for two things by design alone: shared ‘environmental’ effects — any non-genetic effects shared by siblings that make them more similar — and (partially) genetic effects. Let’s put on the JayMan hat: our prior is that similarity between family members is 100% due to genetics and 0% due to non-genetics. We can then examine the results to see if they are consistent with a genetics + noise model, i.e. no causal effects of TBI. It goes like this: the raw association between TBI and outcomes is 100% due to genetics. When we use a sibling comparison, we adjust for 50% of the genetic confound, so we expect the excess RR (RR − 1) to be reduced by 50%. If we had MZ data, we would expect the RR to be 1.00. Are the numbers consistent with this?
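The arithmetic of this model takes only a few lines. A minimal Python sketch (my own; I assume, as the post's numbers suggest, that attenuation applies linearly to the excess RR):

```python
def predicted_sibling_rr(raw_rr):
    """Expected sibling-comparison RR if the raw association is 100% genetic:
    siblings share ~50% of segregating genes, so half of (RR - 1) remains."""
    return 1 + 0.5 * (raw_rr - 1)

def extrapolated_mz_rr(raw_rr, sibling_rr):
    """Extrapolate the raw -> sibling attenuation one step further, to MZ twins
    (100% genetic sharing): RR_mz = sibling_rr - (raw_rr - sibling_rr)."""
    return 2 * sibling_rr - raw_rr

# e.g. a raw RR of 2.0 should fall to 1.5 in sibling comparisons, and the
# hypothetical MZ comparison extrapolates to 1.0 (no causal effect left)
```

Under a pure genetic confound, the extrapolated MZ RR should scatter around 1.00; values clearly above 1.00 (like the 1.22 for disability pension below) are the candidates for real causal effects.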

Here are the results from the paper:


(Model 3 is adjusted for some kind of educational variable. This seems fishy, especially for the educational outcome. Unfortunately, non-adjusted results are not reported.)

Then we calculate the predictions of a genetic + noise model:


We observe:

  • A mean reduction of 54% in RR going from raw to sibling comparisons; the genetics + noise model predicts 50%.
  • A mean RR for hypothetical MZs of 1.07; the genetics + noise model predicts 1.00.

These data are either perfectly or very close to perfectly consistent with a pure genetic confound model. That said, there is some variation across outcomes, which may be real variance or just sampling variation (n is large, but head injuries are rare). In any case, the most plausible (to me) causal effect is disability pension, which also has the largest RR for our hypothetical MZs, 1.22. That is, head injury causes a 22% increase in the chance that one will end up on a disability pension. Not a large effect, but one I’d like to avoid.

By writing this post, I feel a little bit like Fisher.

July 5, 2016

Organ donation consent vs. actual rates

Filed under: Medicine,Politics — Tags: , , , , , , , — Emil O. W. Kirkegaard @ 05:43

There is a famous paper arguing the case for libertarian paternalism by using organ donation consent rates.

Johnson, E. J., & Goldstein, D. (2003). Do defaults save lives?. Science, 302(5649), 1338-1339.

The main result is this:


So having opt-out drastically increases consent rates compared with opt-in. These countries have various other differences between them, but the effect size is huge.

But what about actual donation rates? As it happens, there is a pan-Nordic organization for this, Scandiatransplant, and they publish their data. So I downloaded the last 10 years’ worth of data and calculated the actual donation rates per 100k persons. They look like this:


The line varies a lot for Iceland because the population is fairly small (about 300k). We see that the donation rates for Denmark and Sweden are quite comparable despite the huge difference in consent rates. So, apparently, the bottom line did not change much despite the difference in consent rates. Since people still die on waiting lists (Danish data), there must be some other limiting factor.


organ_donation.csv data file

R code:

library(pacman) # provides p_load(); install.packages("pacman") if missing
p_load(readr, dplyr, magrittr, ggplot2) # dplyr for mutate()

d_organ = read_csv("organ_donation.csv")

# donations per capita, scaled to per 100k
d_organ %<>% mutate(Donations_per_100k = (Donations / Population) * 100000)

ggplot(d_organ, aes(Year, Donations_per_100k, color = Country)) +
  geom_line(size = 1) +
  scale_x_continuous(breaks = min(d_organ$Year):max(d_organ$Year)) # integer year breaks


February 28, 2016

Admixture mapping assisted GWAS

Medical researchers have noticed that some diseases differ by SIRE (self-identified race/ethnicity) groups, which differ in genomic (racial) ancestry. Hence, when genomic measures became available (in the last 15 years or so), they measured people’s relative proportions of ancestry in mixed populations to see if the diseases would be predictable by ancestry. They were. This establishes with high certainty that the group difference is genetic. Of course, this could be used on other traits such as cognitive ability, personality and socioeconomic outcomes (and it has been for the latter; we are doing a meta-analysis).

None of this is new. However, here comes the trick. If we know that two ancestries differ for some trait, we can use this to pinpoint where on the genome the genes for this trait are. How? Consider the simple case of a disease trait where persons can be clearly scored as either/or (e.g. ever had prostate cancer). We gather a lot of persons with mixed ancestry, cases and controls. Then we look at differences in their ancestry at small intervals of their chromosomes. Because the trait is genetically linked to one ancestry, the cases will show more of that ancestry in some parts of their genome than the controls. These are the places to look for causal variants. Consider the two figures from Winkler et al 2010.

Winkler, C. A., Nelson, G. W., & Smith, M. W. (2010). Admixture mapping comes of age. Annual review of genomics and human genetics, 11, 65-89.

admix map0

admix map1

Winkler reports that the sample sizes needed for this are much smaller than for standard within-ancestry GWAS (as usually done on Europeans).
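To make the logic concrete, here is a toy simulation of an admixture mapping scan in Python (all numbers, i.e. the admixture proportion, sample sizes, and the planted risk locus, are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_loci, n_cases, n_controls = 200, 500, 500

# local-ancestry dosage: copies (0, 1, 2) of ancestry A at each locus,
# drawn under genome-wide admixture of 80% ancestry A
controls = rng.binomial(2, 0.80, size=(n_controls, n_loci))
cases = rng.binomial(2, 0.80, size=(n_cases, n_loci))

# plant a risk locus at index 100: cases carry excess ancestry A there
cases[:, 100] = rng.binomial(2, 0.95, size=n_cases)

# admixture mapping statistic: case-control difference in mean local ancestry
delta = cases.mean(axis=0) - controls.mean(axis=0)
peak = int(np.argmax(delta))  # the implicated genomic segment
```

With these (generous) parameters the planted locus stands well clear of the noise, which is the intuition behind Winkler's point about modest sample sizes.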

This method can be combined with the standard GWAS method. I envision that one ‘simply’ does this by first establishing that a trait differs for genetic reasons across two ancestries in an admixed group (e.g. African Americans, Hispanics/Mexicans/Mestizos). Then one uses admixture mapping to establish plausible segments of the genome where causal variants are. This may not be enough to identify specific variants, but one can then use these values as prior probabilities for a standard GWAS. This should drastically decrease the sample size necessary to find the variants and thus accelerate the search. The faster we can find the variants, the faster we can either edit them directly with CRISPR-like methods or use the information for selection with embryo selection.

This may make it more tractable to find variants when we start doing GWAS on full genomes instead of SNP genomes. SNP genomes have 1e5 to 1e6 variants to examine, but whole genomes have about 3e9 sites, i.e. roughly 1e3 to 1e4 times more. This means that false positives are an even larger problem. The added statistical power from admixture mapping-assisted GWAS should help nicely here.
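One concrete way to turn admixture-mapping segments into GWAS priors is weighted hypothesis testing. A toy Python sketch (the p-values and weights are made up, and weighted Bonferroni is my illustrative choice of method, not something taken from the studies discussed):

```python
import numpy as np

# hypothetical GWAS p-values for five variants, and prior weights derived
# from admixture mapping (higher weight = inside an implicated segment)
pvals = np.array([0.04, 0.5, 0.012, 0.3, 0.009])
weights = np.array([1.0, 1.0, 5.0, 5.0, 1.0])

# weighted Bonferroni: variant i is significant when
# p_i <= alpha * w_i / sum(w), shifting power toward implicated regions
alpha = 0.05
thresholds = alpha * weights / weights.sum()
significant = pvals <= thresholds
# only the p = 0.012 variant passes, because it sits in an up-weighted
# segment; the smaller p = 0.009 outside such a segment does not
```

The total type-I error budget is unchanged (the thresholds still sum to alpha); the prior only redistributes where the power goes.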

There is one significant problem with this method, but it is not a scientific one: to use it for a given trait, one must come to terms with the existence of a genetic group difference. It is not surprising that it has primarily been used on disease traits, where these claims meet less resistance (because it is about helping). However, a recent mainstream study has examined height and BMI group differences in European populations, so there is some hope that the times are a-changin’.

December 27, 2014

Reanalysis of Josefsson et al (2014)’s meta-analysis of exercise as treatment for depression

Filed under: Medicine — Tags: , , — Emil O. W. Kirkegaard @ 06:30

A link on Reddit claimed that exercise could be an effective treatment for depression. I felt it necessary to comment that:

Exercise does not have a causal effect on depression or happiness according to twin-control studies (1, 2).

Another person replied with a meta-analysis, which I then took a look at. It showed a large effect, apparently inconsistent with the twin studies.

My reply:

I skimmed the meta-analysis you found. It’s here for readers without academic access.

As you can see, it included all kinds of studies, but they were generally small (Tables 2+3). There was no analysis of publication bias, which is very strange given the ubiquity of this problem. As a reviewer I would not approve a meta-analysis of this sort without an analysis of publication bias.

Luckily, a simple method of checking for publication bias (the funnel plot) can be applied to the studies given in Figure 1. Here’s the data and the plot.

As you can see, the smaller studies tended to report larger effect sizes, in line with the publication-bias hypothesis. The effect was very strong, r = −.758.
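For readers who want to run this kind of check themselves, the small-study correlation takes only a few lines on any per-study table of sample sizes and effect sizes (the numbers below are hypothetical, not the Figure 1 data):

```python
import numpy as np

# hypothetical per-study sample sizes and standardized effects (Cohen's d)
n = np.array([20, 35, 50, 80, 120, 200])
d = np.array([1.1, 0.9, 0.7, 0.5, 0.4, 0.3])

# small-study effect: a strong negative n-d correlation is the numerical
# counterpart of funnel-plot asymmetry and suggests publication bias
r = np.corrcoef(n, d)[0, 1]
```

A scatterplot of d against n (or against the standard error) gives the familiar funnel; the correlation is just a one-number summary of its asymmetry.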

Perhaps this inconsistency can be solely explained by publication bias.

Update 2015-09-28:

A new study saw the light of day on Reddit. Since I had already examined this issue, I wrote a reply stating that the evidence indicates that exercise does not work as a treatment for depression. In line with the usual tactics, stating that something doesn’t work is a sure way to get down-voted.

More strangely, a user tried to counter my post (this one) with the citation of another meta-analysis, which turned out to be older and also doesn’t have any analysis of publication bias!

The two cited twin studies:

November 29, 2014

Criticism of BMI and simple IQ tests – a conceptual link?

Filed under: Critical thinking / meta-thinking,Medicine — Tags: , , — Emil O. W. Kirkegaard @ 00:57

BMI (body mass index) is often used as a proxy for fat percentage or similar measures. This is for a good reason:

BMI et al

So, the mean correlation across age groups and gender is very high, around .77 (unweighted mean across genders). There is a clear age gradient such that the correlation is higher at younger ages, which is the opposite of what the body-builder confound would predict (few people over 80 are body builders). But it does work slightly better for women (.78 vs. .75), perhaps because there are more male body builders.

BMI has a proven track record of predicting many health conditions, yet it still receives lots of criticism because it gives misleading results for some groups, notably body builders. There is a conceptual link here with the criticism of simple IQ tests, such as Raven’s, which ‘only measure the ability to spot figures’. Nonverbal matrix tests such as Raven’s or Cattell’s indeed do not measure g as well as more diverse batteries do (Johnson et al). These visual tests could similarly be criticized for not working well on those with bad eyesight. However, they are still useful for a broad sample of the population.

Criticisms like this strike me as an incarnation of the perfect solution/Nirvana fallacy:

The perfect solution fallacy (aka the nirvana fallacy) is a fallacy of assumption: if an action is not a perfect solution to a problem, it is not worth taking. Stated baldly, the assumption is obviously false. The fallacy is usually stated more subtly, however. For example, arguers against specific vaccines, such as the flu vaccine, or vaccines in general often emphasize the imperfect nature of vaccines as a good reason for not getting vaccinated: vaccines aren’t 100% effective or 100% safe. Vaccines are safe and effective; however, they are not 100% safe and effective. It is true that getting vaccinated is not a 100% guarantee against a disease, but it is not valid to infer from that fact that nobody should get vaccinated until every vaccine everywhere prevents anybody anywhere from getting any disease the vaccines are designed to protect us from without harming anyone anywhere.

Any measure that has more than 0 validity can be useful in the right circumstances. If a measure has some validity and is easy to administer (BMI, or non-verbal pen-and-paper group tests), it can be very useful even if it has less validity than better measures (fat% tests or full-battery IQ tests).

Anyway, BMI should perhaps be retired now that we have found a more effective measure (Ashwell et al):

Our aim was to differentiate the screening potential of waist-to-height ratio (WHtR) and waist circumference (WC) for adult cardiometabolic risk in people of different nationalities and to compare both with body mass index (BMI). We undertook a systematic review and meta-analysis of studies that used receiver operating characteristics (ROC) curves for assessing the discriminatory power of anthropometric indices in distinguishing adults with hypertension, type-2 diabetes, dyslipidaemia, metabolic syndrome and general cardiovascular outcomes (CVD). Thirty one papers met the inclusion criteria. Using data on all outcomes, averaged within study group, WHtR had significantly greater discriminatory power compared with BMI. Compared with BMI, WC improved discrimination of adverse outcomes by 3% (P < 0.05) and WHtR improved discrimination by 4–5% over BMI (P < 0.01). Most importantly, statistical analysis of the within-study difference in AUC showed WHtR to be significantly better than WC for diabetes, hypertension, CVD and all outcomes (P < 0.005) in men and women.
For the first time, robust statistical evidence from studies involving more than 300 000 adults in several ethnic groups, shows the superiority of WHtR over WC and BMI for detecting cardiometabolic risk factors in both sexes. Waist-to-height ratio should therefore be considered as a screening tool. (Ashwell et al, 2012)

It may even be that some of these measures are better predictors than body fat%. I didn’t find such a study.
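For reference, the indices being compared are trivial to compute. A minimal Python sketch (the 0.5 WHtR cutoff is a commonly cited rule of thumb, not a value taken from the quoted paper):

```python
def bmi(weight_kg, height_m):
    # body mass index: weight over height squared
    return weight_kg / height_m ** 2

def whtr(waist_cm, height_cm):
    # waist-to-height ratio: unit-free, so any length unit works if consistent
    return waist_cm / height_cm

def whtr_flag(waist_cm, height_cm, cutoff=0.5):
    # rule-of-thumb screen: WHtR at or above the cutoff flags elevated risk
    return whtr(waist_cm, height_cm) >= cutoff
```

The ease-of-administration point above applies here too: WHtR needs only a tape measure, which is part of why it is attractive as a screening tool.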


Ashwell, M., Gunn, P., & Gibson, S. (2012). Waist-to-height ratio is a better screening tool than waist circumference and BMI for adult cardiometabolic risk factors: systematic review and meta-analysis. Obesity Reviews, 13(3), 275–286.

Johnson, W., Nijenhuis, J. T., & Bouchard Jr, T. J. (2008). Still just 1 g: Consistent results from five test batteries. Intelligence, 36(1), 81-95.

October 14, 2014

Cancer rates: Part 2, does alcohol consumption have incremental predictive power?

Filed under: Medicine — Tags: , — Emil O. W. Kirkegaard @ 18:50

The same guy proposed another idea. Wikipedia has data here. However, since I had previously seen people fudge data in Wikipedia articles (e.g. this one), maybe it was not a good idea to rely on Wikipedia alone. So I did the sensible thing: fetched both the data from Wikipedia and the data from the primary source (WHO), and compared them for accuracy. They were 100% identical for the “total rates”. I did not compare the other variables. But at least this dataset was not fudged. :)

I then loaded the data in R and plotted alcohol consumption per capita (age >= 15) against cancer rates per capita.

library(car) # provides scatterplot()
source("merger.R") # load custom functions

DF.mega = read.mega("Megadataset_v1.7b.csv") # load megadataset

# load alcohol data
alcohol = read.mega("alcohol_consumption.csv")
short.names = as.abbrev(rownames(alcohol)) # abbreviated names, for merging with the megadataset
rownames(alcohol) = short.names # insert the abbreviated names

DF.mega2 = merge.datasets(alcohol, DF.mega) # merge datasets

scatterplot(CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO, DF.mega2, # plot it
            smoother = FALSE, # no moving average
            labels = rownames(DF.mega2), id.n = nrow(DF.mega2)) # label the data points


There is no relationship there. However, it may work in multiple regression:

lm1 = lm(CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO + X2012LifeExpectancyatBirth,
         data = DF.mega2)
summary(lm1)

Call:
lm(formula = CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO + 
    X2012LifeExpectancyatBirth, data = DF.mega2)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.677 -26.569   0.717  28.486  61.631 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -91.6149    79.2375  -1.156    0.254    
AlcoholConsumptionPerCapitaWHO   1.7712     1.6978   1.043    0.303    
X2012LifeExpectancyatBirth       4.2518     0.9571   4.442 6.13e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 31.56 on 43 degrees of freedom
  (227 observations deleted due to missingness)
Multiple R-squared:  0.3158,	Adjusted R-squared:  0.284 
F-statistic: 9.923 on 2 and 43 DF,  p-value: 0.0002861

There is seemingly no predictive power of alcohol consumption! But it does cause cancer, right? According to my skim of Wikipedia, yes, but it accounts for only about 3.5% of cancer cases, so the effect is too small to be seen here.

The data is in megadataset 1.7c.
