Cognitive dysgenics in the OKCupid dataset: a few simple analyses

OKCupid dataset (not public right now; contact me if you want the password). Draft paper: https://osf.io/p9ixw/ I looked at whether there was evidence for cognitive dysgenics in the OKCupid dataset. The unrepresentativeness of the dataset is not much of a problem here: indeed, we are very much interested in younger people looking to date since…

Change in Egyptian skull sizes: -4000 to 150

I often read statistics textbooks. They often use example datasets, some of which are interesting in themselves (e.g. the Boston dataset). In this case, I am reading An Introduction to Applied Multivariate Analysis with R. It features a dataset of Egyptian skulls spanning about 4000 years. Given the scholarly interest in dysgenics and…

R functions for analyzing missing data

I’m reading Missing Data: A Gentle Introduction and it mentions various methods to understand how data are missing in a given dataset. The book, however, is light on actual tools. So, since I have already implemented a few functions in my package for handling missing data, I decided to implement a few more. These have…

Estimation of piecewise linear functions

Suppose you have a dataset where you know or suspect that the real generating function is a piecewise function with k pieces, each of which is a standard linear model. How do you find these? This is the problem a friend presented to me. I came up with this method: Find all the…

A more intuitive explanation of the correlation

Someone asks on Reddit: Can someone intuitively explain the correlation formula? I know what the Cov(X,Y) means. It tells you if the relationship between the variables X and Y is positive or negative (although I must admit I don't really know what the actual number means, I only look at the sign). I know what…