In a not so well written paper (that we should update to be better presented), John Fuerst and I previously tested two hypotheses using the National Longitudinal Survey of Freshmen dataset:

  • Spatial transferability: when people move, they tend to keep their psychological traits and culture.
  • Generational transferability: when people have children, their children tend to resemble them in psychological traits and culture. In behavioral genetic terms, this is the sum of genetic and shared environmental effects.

In general, these hypotheses were both supported:
  • Scholastic and language ability tests showed large correlations between migrant performance and country of origin cognitive ability (a summary of previous findings).
  • Country of origin cognitive ability predicted migrant scholastic scores (SAT/ACT) and GPA across generations at about r=.350 to .400. These associations increased somewhat when the data were weighted by sqrt(N) to deal with the problem of small samples and the resulting sampling error.
  • Controlling for migrant selectivity by calculating the difference between parental educational attainment and country of origin mean educational attainment increased correlations to about .500 to .540.
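
To make the sqrt(N) weighting concrete, here is a minimal sketch of a weighted Pearson correlation in which each origin country counts in proportion to the square root of its migrant sample size. The function name and all numbers are made up for illustration; they are not values from the paper, and the paper's actual analysis code may have differed.

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Hypothetical country-level data: one entry per origin country.
origin_iq = [85.0, 90.0, 95.0, 100.0, 105.0]   # made-up origin cognitive ability
migrant_score = [-0.4, 0.1, -0.1, 0.3, 0.5]    # made-up mean migrant score (z units)
n = [12, 150, 40, 300, 25]                     # made-up migrant sample sizes

weights = [math.sqrt(ni) for ni in n]
r_unweighted = weighted_corr(origin_iq, migrant_score, [1.0] * len(n))
r_weighted = weighted_corr(origin_iq, migrant_score, weights)
print(round(r_unweighted, 3), round(r_weighted, 3))
```

Using sqrt(N) rather than N as the weight is a compromise: countries with tiny samples are down-weighted, but a few very large groups do not dominate the result entirely.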

So, these are fairly important findings presented in a horrible way. I’m also not happy with the mediation analyses, which used NHST and multiple regression instead of path modeling.

Now, due to some Twitter discussion, I recalled this paper and thought that perhaps one can find some more useful datasets. Useful datasets come mostly from large countries because these generally have more migrants, which means more origin countries in the analysis and larger sample sizes, which in turn means less sampling error.

I found a dataset that seems pretty useful: Children of Immigrants Longitudinal Study (CILS), 1991-2006. N=5200ish. The documentation is public, and so I downloaded the codebook. It is very long (>500 pages), but skimming the categories and doing some light searching revealed the following variables:

Origin data:
  • Father’s birth country
  • Mother’s birth country
  • Respondent national origin
  • Respondent birth country
  • Respondent US stay length (control variable)

Outcome data:

  • Stanford math achievement total score
  • Stanford reading achievement total score
  • Grade point average
  • Parent SES index
  • Parental occupation prestige/present
  • First job Treiman prestige score
  • Current job Treiman prestige score
  • Total monthly earnings
  • Total family income/last year
  • Age 30 expected occ. Treiman prestige score
  • Respondent weekly income
  • Parent household monthly earnings
  • Parent family total income/past year

So, the outcome variables can broadly be classified into respondent and parental traits/outcomes. Some of them are functions of others (the SES index is effectively their S factor). There are lots of variables associated with English ability, but they show strong ceiling effects and are probably not very useful for analysis.
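
For a country-level analysis, the respondent-level rows would first have to be aggregated by origin, keeping the per-country N so that small groups can be dropped or down-weighted. A rough sketch of that step, assuming hypothetical field names ("origin", "math") and toy values rather than actual CILS codebook variable names or data:

```python
from collections import defaultdict
from statistics import mean

# Toy respondent-level rows; field names and values are hypothetical.
rows = [
    {"origin": "Cuba", "math": 55.0},
    {"origin": "Cuba", "math": 65.0},
    {"origin": "Mexico", "math": 40.0},
    {"origin": "Mexico", "math": 50.0},
    {"origin": "Mexico", "math": 60.0},
    {"origin": "Vietnam", "math": 70.0},
    {"origin": "Vietnam", "math": None},  # missing score, skipped below
]

def aggregate_by_origin(rows, outcome, min_n=1):
    """Return {origin: {"n": ..., "mean": ...}} for origins with at least min_n valid scores."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(outcome)
        if value is not None:
            groups[row["origin"]].append(value)
    return {
        origin: {"n": len(values), "mean": mean(values)}
        for origin, values in groups.items()
        if len(values) >= min_n
    }

# Keep only origins with at least 2 valid math scores (an arbitrary cutoff).
by_origin = aggregate_by_origin(rows, "math", min_n=2)
for origin, stats in sorted(by_origin.items()):
    print(origin, stats["n"], stats["mean"])
```

Which variable defines "origin" (father's birth country, mother's birth country, or respondent national origin) is a separate decision that this sketch leaves open.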

Has anyone already looked at this?