In the spirit of reproducible science, this is a post about an error I fixed in a function, which affects all prior analyses that used it. When factor analyzing data, the goal is to reveal a latent structure in the dataset. Given various assumptions, factor analysis will find a structure if there is one. It is […]

OKCupid dataset (not public right now; contact me if you want the password). Draft paper: osf.io/p9ixw/. I looked at whether there was evidence for cognitive dysgenics in the OKCupid dataset. The unrepresentativeness of the dataset is not much of a problem here: indeed we are very much interested in younger people looking to date since […]

I often read statistics textbooks. Textbooks often use example datasets, some of which are interesting in themselves (e.g. the Boston dataset). In this case, I am reading An Introduction to Applied Multivariate Analysis with R. It features a dataset of Egyptian skulls spanning about 4000 years. Given the scholarly interest in dysgenics and […]

I’m reading Missing Data: A Gentle Introduction and it mentions various methods to understand how data are missing in a given dataset. The book, however, is light on actual tools. So, since I have already implemented a few functions in my package for handling missing data, I decided to implement a few more. These have […]

Suppose you have some dataset where you know or suspect that the real generating function is actually a piecewise function with k pieces, each of which is a standard linear model. How do you find these? This is the problem a friend presented to me. I came up with this method: Find all the […]

Someone asks on Reddit: Can someone intuitively explain the correlation formula? I know what the Cov(X,Y) means. It tells you if the relationship between the variables X and Y is positive or negative (although I must admit I don't really know what the actual number means, I only look at the sign). I know what […]
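For readers who want the gist of that question in code: the correlation is the covariance divided by the product of the two standard deviations, which removes the units. Below is a minimal sketch in plain Python. The function names and the toy numbers are my own illustration and not taken from the Reddit thread or the full post.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    sd_x = math.sqrt(covariance(x, x))  # the variance is Cov(X, X)
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


# Hypothetical toy data: y measured in metres, then the same values in centimetres.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_m = [2.1, 3.9, 6.2, 8.1, 9.8]
y_cm = [100 * v for v in y_m]

# The covariance changes with the unit of measurement...
print(covariance(x, y_m), covariance(x, y_cm))
# ...while the correlation stays the same.
print(correlation(x, y_m), correlation(x, y_cm))
```

Changing the unit of y multiplies the covariance by 100, which is why its raw size is hard to read, while the correlation is unchanged and always falls between -1 and 1.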

Some people claim that the climate has direct causal influences on income, cognitive ability and so on. Usually, these academics just regress IQ on climate variables at the country or US state level. However, it is possible to do it at the US county level too. Unfortunately, it is difficult to find climate data by county. […]

In the interest of publishing null findings: I tried estimating US state IQs from the mean cognitive ability for users in the OKCupid dataset. However, this did not work out. This was a long shot to begin with due to massive self-selection and somewhat non-random sampling. Actually, what I really wanted was another way to […]

I am doing an S factor study of US counties in the usual way. For that reason, I need some kind of county-level cognitive ability estimate. I know that it is possible to create one using the Add Health database, but the data are not sharable. However, it may be possible to do some tricks, […]

This is a post in the ongoing series about stuff in my package: kirkegaard [I’m not egocentric, but since there is no central theme to the functions in the package other than that I made and use them, there is nothing else to call it.] I figure it should be easy to find someone who wrote […]