Normally, testing colorism or other causal models of why human racial traits have nonzero relationships to socioeconomic outcomes requires the following data: (1) a measure of racial ancestry, (2) measures of racial appearance, and (3) measures of socioeconomic outcomes such as income or educational attainment. In path-model terms, one can think of it this way: Discrimination models […]

In a recent paper, Beaver et al. looked at the relationships between crime, gender and sexual orientation: This study examined the association between sexual orientation and nonviolent and violent delinquency across the life course. We analyzed self-reported nonviolent and violent delinquency in a sample of heterosexual males (N=5220–7023) and females (N=5984–7875), bisexuals (N=34–73), gay males (N=145–189), […]

Data from the OKCupid project. In light of a recent paper examining who prefers to date within their own religion, I recalled that there was a question about this in the OKCupid dataset, except that it is for race: “Would you strongly prefer to go out with someone of your own skin color / racial background?” […]

OKCupid dataset (not public right now, contact me if you want the password). Draft paper: osf.io/p9ixw/ I looked at whether there was evidence for cognitive dysgenics in the OKCupid dataset. The unrepresentativeness of the dataset is not much of a problem here: indeed we are very much interested in younger people looking to date since […]

In the interest of publishing null findings: I tried estimating US state IQs from the mean cognitive ability of users in the OKCupid dataset. However, this did not work out. It was a long shot to begin with due to massive self-selection and somewhat non-random sampling. Actually, what I really wanted was another way to […]

I am doing an S factor study of US counties in the usual way. For that reason, I need some kind of county-level cognitive ability estimate. I know that it is possible to create one using the Add Health database, but the data are not shareable. However, it may be possible to do some tricks, […]

I forgot to post this blog post at the time of publication as I usually do. However, here it is. As explored in some previous posts, John Fuerst and I have spent about 1.25 years (!) producing a massive article: published version runs 119 pages; 25k words without the references; 159k characters incl. spaces. We […]

Non-Shared Environment Doesn’t Just Mean Schools And Peers. Scott Alexander has a new post out summarizing the interpretations of what constitutes non-shared environment in the ACE model estimates from standard (MZ-DZ) twin studies. I’m happy that he wrote that up because I had been thinking of writing up the same points, but he is a […]
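
For readers unfamiliar with where the non-shared environment estimate comes from, here is a minimal sketch of the classic Falconer approximation to the ACE decomposition from MZ and DZ twin correlations. The function name and the example correlations are made up for illustration and are not taken from either post. Real twin studies usually fit structural equation models rather than using these closed-form formulas.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Approximate ACE variance components from twin correlations.

    r_mz: trait correlation between identical (MZ) twins
    r_dz: trait correlation between fraternal (DZ) twins
    Assumes additive genetic effects only and equal shared environments
    for MZ and DZ pairs.
    """
    a2 = 2 * (r_mz - r_dz)  # A: additive genetic variance (heritability)
    c2 = 2 * r_dz - r_mz    # C: shared (family) environment
    e2 = 1 - r_mz           # E: everything that makes MZ twins differ
    return a2, c2, e2


# Hypothetical correlations, chosen only to show the arithmetic.
a2, c2, e2 = falconer_ace(r_mz=0.8, r_dz=0.5)
print(f"A={a2:.2f}, C={c2:.2f}, E={e2:.2f}")
```

Note that E is defined as a residual (1 minus the MZ correlation), so it absorbs measurement error and any other source of within-pair differences, not only systematic environmental influences.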

Medical researchers have noticed that some diseases differ by SIRE (self-identified race/ethnicity) groups, which differ by genomic (racial) ancestry. Hence, when genomic measures became available (in the last 15 years or so), they measured people's relative proportions of ancestry in mixed populations to see if the diseases would be predictable by ancestry. They were. This establishes with […]

Comment on: infoproc.blogspot.dk/2016/02/missing-heritability-and-gcta-update-on.html First, skim this paper: journals.plos.org/ploso… Genomic prediction works fairly well. This recent paper does a cross-dataset, cross-method analysis of genomic prediction methods using 10-fold cross-validation to account for overfitting. In general, the compressed sensing/lasso/regularization methods perform well, but surprisingly an even simpler method comes out on top (mRMR): Maximum Relevancy Minimum […]