Comments on Gwern’s “Embryo selection for intelligence”


Embryo selection is an add-on to IVF [his summary]:

  1. harvest x eggs
  2. fertilize them and create x embryos
  3. culture the embryos to either cleavage (2-4 days) or blastocyst (5-6 days) stage; of them, y will still be alive & not grossly abnormal
  4. freeze the embryos
  5. optional: embryo selection using quality and PGS
  6. unfreeze & implant 1 embryo; if no embryos left, return to #1 or give up
  7. if no live birth, go to #6

Gwern asks: supposing the technology is ready for this, would the procedure be cost-efficient given estimates of the value of cognitive ability? Estimating this is surprisingly difficult because information about the cost of IVF, of freezing/thawing embryos, and so on is not easily available. Still, one can make some estimates. Gwern finds that it is currently probably not worth it in purely economic cost-benefit terms.

GCTA/GREML results for cognitive ability

To estimate the likely IQ gains from embryo selection, one must know the narrow heritability (h2n) of IQ and the predictive validity of genomic data for it. Estimates of h2n come from GCTA/GREML papers (GCTA is the software). Gwern has done the first meta-analysis of such papers, as far as I know. Disturbingly, he finds that papers using older subjects found lower, not higher, h2n. This is of course in contrast to results from familial studies (e.g. this paper).
Worse, eyeballing his forest plot suggests that the larger studies found smaller h2ns, indicating a small-study effect. Often this means publication bias: perhaps there are more GREML studies out there that found negligible h2n for cognitive ability but were never published.
Because Gwern shared his data and code, I quickly checked whether there was some evidence for publication bias. Here’s the forest plot sorted by effect size. (The plot is ugly because it uses base graphics. Someone did try to make a ggplot2 version, but it’s not too good yet.)
We can see that the less precise studies tend to find smaller effects. The correlation between effect size and standard error is .35 [CI95: -.24 to .76]. There is some between-study variance (I2 = 31%), so there are likely some moderators. In the moderator analysis, I included standard error, publication year, mean age, and twin status (did the study use twins or not?). Unfortunately, the output is given in natural units, not standardized units:
Mixed-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0 (SE = 0.0026)
tau (square root of estimated tau^2 value):             0
I^2 (residual heterogeneity / unaccounted variability): 0.00%
H^2 (unaccounted variability / sampling variability):   1.00
R^2 (amount of heterogeneity accounted for):            100.00%

Test for Residual Heterogeneity: 
QE(df = 8) = 3.5513, p-val = 0.8952

Test of Moderators (coefficient(s) 2,3,4,5): 
QM(df = 4) = 10.4646, p-val = 0.0333

Model Results:
           estimate       se     zval    pval      ci.lb    ci.ub
intrcpt    -50.2260  53.6941  -0.9354  0.3496  -155.4644  55.0124    
SE           0.6149   0.8020   0.7666  0.4433    -0.9571   2.1868    
Age.mean    -0.0036   0.0013  -2.8209  0.0048    -0.0061  -0.0011  **
Twin TRUE   -0.1524   0.0893  -1.7060  0.0880    -0.3274   0.0227   .
pub_year     0.0252   0.0267   0.9432  0.3456    -0.0271   0.0774    

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Of the attempted moderators, age was the most useful. The zero-order correlation between age and h2n is in fact -.54 [CI95: -.84 to .02], so there may be a real age effect. I included publication year to check for decline effects. Still, with age in the model, there was little evidence that standard error had any effect, i.e. no good evidence of publication bias. The funnel plot looks like this:


It seems slightly asymmetric, but it could be a fluke. We will have to wait for more studies to see.

Genomic predictive validity in the near future for cognitive ability

Gwern digs up studies that report R2 (variance explained) values for predicting case-level outcomes. Studies usually fail in this regard: they report only the R2 of the hits (that is, SNPs with p < alpha), when they should instead report the R2 of polygenic scores built from all the SNPs (or some large fraction of them). The hits alone do carry considerable validity. Note, however, that one must be careful about overfitting, so preferably a validation sample should be used (or standard within-sample cross-validation methods).
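To make the overfitting point concrete, here is a minimal sketch in R with simulated data (all sizes, effect counts, and variable names are invented for illustration): SNP weights are fit in a training half, and the score's R2 is then computed in a holdout half, where it is typically much lower than the in-sample figure.

```r
# Sketch: in-sample vs. out-of-sample R2 for a polygenic-score-like predictor.
# All data are simulated; nothing here comes from the studies discussed.
set.seed(1)
n <- 2000; p <- 200                              # individuals, SNPs
G <- matrix(rbinom(n * p, 2, 0.3), n, p)         # genotypes coded 0/1/2
beta <- c(rnorm(20, sd = 0.15), rep(0, p - 20))  # only 20 causal SNPs
y <- as.vector(G %*% beta + rnorm(n))            # phenotype

train <- 1:1000; test <- 1001:2000
fit <- lm(y[train] ~ G[train, ])                 # fit SNP weights on training half
score <- cbind(1, G[test, ]) %*% coef(fit)       # polygenic score in the holdout
r2_in  <- summary(fit)$r.squared                 # in-sample R2 (inflated)
r2_oos <- cor(score, y[test])^2                  # honest out-of-sample R2
```

With these settings, r2_in comes out noticeably larger than r2_oos; the out-of-sample value is the one that matters for embryo selection.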

Gwern finds that the values in the published studies are quite low, e.g. 2.5% in Rietveld et al (2013). I am more optimistic about the expected R2 values in the near future, because standard GWASes do not use imputed data, and using it drastically increases the h2n estimates; see the recent paper for height. I expect similar findings for cognitive ability. As far as I know, there is nothing keeping researchers from using imputed data.

Also note that it is not R2 itself that matters in practice, but the R value (beta). To get half the predictive validity, one needs only a fourth of the variance explained: sqrt(.25) = .5.
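The square-root relation can be checked directly in R; the numbers below are just the ones mentioned in the text.

```r
# Predictive validity scales with R = sqrt(R2), not with R2 itself.
r2 <- c(0.025, 0.0625, 0.25)  # variance explained, e.g. the 2.5% from Rietveld et al
r  <- sqrt(r2)                # the corresponding correlations (predictive validities)
# A fourth of the variance explained already yields half the validity:
half_validity <- sqrt(0.25)   # 0.5
```

So seemingly small R2 gains translate into much larger gains in usable predictive validity.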

Another problem is that standard GWASes use a poor method of finding hits. Genomic data is sparse (most predictors have betas of 0), so one should use sparse methods, i.e. lasso regression. Steve Hsu has written about this. The reason lasso regression is not currently used is that it requires case-level data, and researchers currently share only summary data, not case-level data. Once again, science is held back by scientists’ (and funders’) unscientific behavior (data hiding).
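For illustration, here is a sketch of a sparse fit using glmnet, a standard R lasso implementation (the data and settings are invented and far smaller than any real GWAS; this is not Gwern's or Hsu's analysis):

```r
# Lasso (alpha = 1) on simulated case-level genotype data via glmnet.
# Everything here is made up; real GWASes have vastly more SNPs and samples.
library(glmnet)
set.seed(1)
n <- 500; p <- 1000
G <- matrix(rbinom(n * p, 2, 0.3), n, p)                      # genotype matrix
y <- as.vector(G[, 1:10] %*% rnorm(10, sd = 0.4) + rnorm(n))  # sparse true signal
cvfit <- cv.glmnet(G, y, alpha = 1)            # penalty strength chosen by CV
b <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]  # fitted SNP betas
nonzero <- which(b != 0)                       # SNPs the lasso selected
```

The lasso drives most betas to exactly zero, leaving a short list of selected SNPs, which matches the assumption that most true effects are zero.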

Future prospects?

In general, though, I agree with Gwern: embryo selection for cognitive ability is currently not worth it. However, I expect (90%) that it will be within a few years (<5 years).

R code

The rewritten analysis code for the meta-analysis of GCTA/GREML studies.

# libs --------------------------------------------------------------------
library(pacman)  # provides p_load; the kirkegaard package is from GitHub, not CRAN
p_load(metafor, stringr, plyr, psych, magrittr, kirkegaard)

# data --------------------------------------------------------------------
d = read.csv("data/GCTA_CA.csv")

# publication year, extracted from the study label (e.g. "Deary et al 2012")
d$pub_year = str_extract(d$Study, "\\d+") %>% as.numeric()

d_std = std_df(d, exclude = "HSNP")  # standardized copy (kirkegaard package); not used below

# analyses ----------------------------------------------------------------
# sort by age, then by effect size (order() is stable, so age breaks ties)
d <- d[order(d$Age.mean), ]
d <- d[order(d$HSNP), ]

rem <- rma(yi=HSNP, sei=SE, data=d); rem

print(corr.test(cbind(d$HSNP, d$SE)), short = F)
print(corr.test(cbind(d$HSNP, d$Age.mean)), short = F)

remAge <- rma(yi=HSNP, sei=SE, mods = ~ Age.mean, data=d); remAge
remAgeT <- rma(yi=HSNP, sei=SE, mods = ~ Age.mean + Twin, data=d); remAgeT
remES <- rma(yi=HSNP, sei=SE, mods = ~ SE, data=d); remES
remAll <- rma(yi=HSNP, sei=SE, mods = ~ SE + Age.mean + Twin + pub_year, data=d); remAll

forest(rma(yi=HSNP, sei=SE, data=d), slab=d$Study)
# GG_forest(rem) + xlim(c(0, 1))
funnel(rma(yi=HSNP, sei=SE, data=d), slab=d$Study)

The datafile:

"Deary et al 2012",0.48,0.18,11," FALSE"
"Deary et al 2012",0.28,0.18,71.3," FALSE"
"Plomin et al 2013",0.35,0.117,12," TRUE"
"Benyamin et al 2013",0.22,0.1,12," TRUE"
"Benyamin et al 2013",0.4,0.21,14," TRUE"
"Benyamin et al 2013",0.46,0.06,9," FALSE"
"Rietveld et al 2013",0.224,0.042,57.47," FALSE"
"Marioni et al 2014",0.29,0.05,57," FALSE"
"Kirkpatrick et al 2014",0.35,0.11,14.63," FALSE"
"Trzaskowski et al 2014",0.26,0.17,7," TRUE"
"Trzaskowski et al 2014",0.45,0.14,12," TRUE"
"Davies et al 2015",0.29,0.05,57.2," FALSE"
"Davies et al 2015",0.28,0.07,70," FALSE"