Some sites have scraped the average movie ratings from sites such as IMDB, Rotten Tomatoes and Allmovies. Here is one particularly good (and still online) job. Still, this is all aggregate data. Perhaps we are more interested in how people rate movies. One does have the option to make one’s profile […]

Noah Carl has been investigating the relationship between cognitive ability and political opinions, going beyond the usual confused 1-axis left-right model. Specifically, he has been looking at the economic freedom and personal freedom axes (à la this test). He has done this in two datasets so far, covering the UK and the US: Verbal intelligence is correlated with socially […]

Normally, testing colorism or other causal models of why human racial traits have nonzero relationships to socioeconomic outcomes requires that one has the following data: a measure of racial ancestry; measures of racial appearance; and measures of socioeconomic outcomes such as income or educational attainment. Path model wise, one can think of it this way: Discrimination models […]

OSF has now suspended the entire repository, not just deleted the user datafile. Not sure why this is the case. The paper (PDF) is available here. Edited to add: The repository is closed due to a DMCA request sent by OKCupid which is currently being investigated. Edited to add2: For those wondering what information was in […]

In a recent paper, Beaver et al. looked at the relationships between crime, gender and sexual orientation: This study examined the association between sexual orientation and nonviolent and violent delinquency across the life course. We analyzed self-reported nonviolent and violent delinquency in a sample of heterosexual males (N=5220–7023) and females (N=5984–7875), bisexuals (N=34–73), gay males (N=145–189), […]

Data from the OKCupid project. In light of a recent paper examining who prefers to date within their own religion, I recalled that there was a question about this in the OKCupid dataset, except that it is for race: “Would you strongly prefer to go out with someone of your own skin color / racial background?” […]

In the spirit of reproducible science, this is a post about an error I fixed in a function, one that affects all prior analyses that used it. When factor analyzing data, the goal is to reveal a latent structure in the dataset. Given various assumptions, factor analysis will find a structure if there is one. It is […]

OKCupid dataset (not public right now, contact me if you want the password). Draft paper: osf.io/p9ixw/. I looked at whether there was evidence for cognitive dysgenics in the OKCupid dataset. The unrepresentativeness of the dataset is not much of a problem here: indeed we are very much interested in younger people looking to date since […]

I often read statistics textbooks. In textbooks, they often use example datasets, some of which are interesting in themselves (e.g. the Boston dataset). In this case, I am reading An Introduction to Applied Multivariate Analysis with R. It features a dataset of Egyptian skulls spanning about 4000 years. Given the scholarly interest in dysgenics and […]

I’m reading Missing Data: A Gentle Introduction and it mentions various methods to understand how data are missing in a given dataset. The book, however, is light on actual tools. So, since I have already implemented a few functions in my package for handling missing data, I decided to implement a few more. These have […]