Cognitive dysgenics in the OKCupid dataset: a few simple analyses

OKCupid dataset (not public right now; contact me if you want the password). Draft paper:

I looked at whether there was evidence for cognitive dysgenics in the OKCupid dataset. The unrepresentativeness of the dataset is not much of a problem here: we are specifically interested in younger people looking to date, since they are the parents of the next generation, and some of them are already parents.

The cognitive test is an ad hoc collection of 14 items. It seems to work fairly well. The items were scored using item response theory.
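
To make the scoring step concrete, here is a minimal sketch of item response theory scoring in Python. It assumes a two-parameter logistic (2PL) model and an expected a posteriori (EAP) estimate under a standard normal prior; the post does not say which IRT model or estimator was used, and the item parameters below are hypothetical, not estimated from the OKCupid items.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer
    given ability theta, discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, items, grid_step=0.1):
    """Expected a posteriori ability estimate under a N(0, 1) prior.

    responses: list of 1 (correct), 0 (incorrect) or None (not answered;
               such items are skipped in the likelihood)
    items:     list of (a, b) tuples, one per item
    """
    n_steps = int(round(8 / grid_step))
    grid = [-4 + i * grid_step for i in range(n_steps + 1)]
    num = 0.0
    den = 0.0
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for x, (a, b) in zip(responses, items):
            if x is None:
                continue
            p = p_correct(theta, a, b)
            weight *= p if x == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den

# Hypothetical parameters for three items (illustration only).
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
all_right = eap_score([1, 1, 1], items)
all_wrong = eap_score([0, 0, 0], items)
print(all_right, all_wrong)
```

A person who answers every item correctly gets a positive score and one who misses every item gets a negative score. With so few items, both estimates are pulled toward the prior mean of zero.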

There are a couple of questions related to fertility, either desired or actual. Their relationships to cognitive ability look like this:

[Plots for questions q105, q979 and q80041]

These are the raw relationships. One may wonder whether they are confounded with, e.g., race or age, so one could try a multiple regression. Unfortunately, since the outcomes are ordinal, standard multiple regression yields coefficients with a large downward bias. Still, the results look like this:

                        Beta   SE CI.lower CI.upper
CA                     -0.07 0.01    -0.08    -0.06
d_age                  -0.10 0.01    -0.11    -0.09
gender: Other          -0.69 0.09    -0.86    -0.52
gender: Woman           0.01 0.01    -0.02     0.03
race: Mixed             0.13 0.02     0.09     0.17
race: Asian             0.16 0.03     0.11     0.22
race: Hispanic / Latin  0.12 0.03     0.07     0.18
race: Black             0.29 0.03     0.23     0.34
race: Other             0.06 0.03    -0.01     0.12
race: Indian            0.07 0.06    -0.05     0.18
race: Middle Eastern    0.11 0.08    -0.06     0.27
race: Native American   0.21 0.12    -0.03     0.45
race: Pacific Islander  0.40 0.12     0.16     0.64

       N       R2  R2 adj. 
32537.00     0.03     0.03
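
The downward bias from treating an ordinal outcome as a number can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis in adhoc.R: it assumes a normally distributed latent outcome, a single standardized predictor, and hypothetical skewed cutpoints that collapse the latent outcome into three ordered categories. With one predictor, the standardized regression slope equals the Pearson correlation, so correlations are compared directly.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation, which equals the standardized OLS slope
    when there is a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(beta=-0.3, n=20000, cutpoints=(0.8, 1.6), seed=1):
    """Return (r with the latent outcome, r with the ordinal codes).

    beta, n, cutpoints and seed are hypothetical illustration values.
    """
    rng = random.Random(seed)
    resid_sd = math.sqrt(1 - beta ** 2)  # keeps the latent outcome at unit variance
    x = [rng.gauss(0, 1) for _ in range(n)]
    latent = [beta * xi + rng.gauss(0, resid_sd) for xi in x]
    # Ordinal code = number of cutpoints the latent value exceeds (0, 1 or 2).
    ordinal = [sum(y > c for c in cutpoints) for y in latent]
    return pearson(x, latent), pearson(x, ordinal)

r_latent, r_ordinal = simulate()
print("latent outcome: ", round(r_latent, 3))
print("ordinal outcome:", round(r_ordinal, 3))
```

The coefficient for the coarsened outcome comes out closer to zero than the one for the latent outcome, which is the direction of bias described above. How much depends on the number of categories and on how skewed they are.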

The dependent variable was the question above about the number of desired children. Despite the downward bias, we see a small negative coefficient for cognitive ability (CA), as expected. Its size is similar to that found in other studies. Since the cognitive measure was not optimal, the true relationship is probably somewhat stronger.
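
The point about the non-optimal cognitive measure can be made concrete with Spearman's correction for attenuation, sketched below. The reliability of the 14-item test is not reported here, so the values tried are purely hypothetical. Applying a correlation-based correction to a regression coefficient estimated alongside covariates is only a rough approximation.

```python
import math

def disattenuate(r_observed, rel_x, rel_y=1.0):
    """Spearman's correction: r_true = r_observed / sqrt(rel_x * rel_y).

    rel_x and rel_y are the reliabilities of the two measures (0 < rel <= 1).
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical reliabilities for the cognitive measure (not from the dataset).
for rel in (0.6, 0.7, 0.8):
    print(rel, round(disattenuate(-0.07, rel), 3))
```

Under these assumed reliabilities, the observed -0.07 would correspond to roughly -0.08 to -0.09, which is somewhat stronger but still small.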

Analysis code is in the OSF repository in the adhoc.R file.