In the spirit of reproducible science, this is a post about an error I fixed in a function that affects all prior analyses with that. When factor analyzing data, the goal is to reveal a latent structure in the dataset. Given various assumptions, factor analysis will find a structure if there is one. It is […]

I’m reading Missing Data: A Gentle Introduction and it mentions various methods to understand how data are missing in a given dataset. The book, however, is light on actual tools. So, since I have already implemented a few functions in my package for handling missing data, I decided to implement a few more. These have […]

Suppose you have some dataset where you know or suspect that the real generating function is actually a piecewise function with k pieces each of which is a standard linear model. How does you find these? This is the problem presented to me from a friend. I came up with this method: Find all the […]

Someone asks on Reddit: Can someone intuitively explain the correlation formula? I know what the Cov(X,Y) means. It tells you if the relationship between the variables X and Y is positive or negative (although I must admit I dont really know what the actual number means, I only look the the sign). I know what […]

When one has a continuous variable and then cuts it into bins (discretization) and correlates it with some other variable, the observed correlation is biased downwards to some extent. The extent depends on the cut-off value(s) used and the number of bins. Normally, one would use polychoric-type methods (better called latent correlation estimation methods) to […]

Disclaimer: Some not too structured thoughts. It’s commonly said that correlation does not imply causation. That is true (see Gwern’s analysis), but does causation imply correlation? Specifically, if “→” means causes and “~~” means correlates with, does X→Y imply X~~Y? It may seem obvious that the answer is yes, but it is not so clear. […]

Title says it all. Apparently, some think this is not the case, but it is a straightforward application of Bayes’ theorem. When I first learned of Bayes’ theorem years ago, I thought of this point. Back then I also believed that stereotypes are irrelevant when one has individualized information. Alas, it is incorrect. Neven Sesardic […]

In reviewing our upcoming target article in Mankind Quarterly (edit: now published), Gerhard Meisenberg wrote: “One possibility that you don’t seem to discuss is that there are true and large correlations of the Euro% variable and the geographic variables, but that the geographic variables are measured much more precisely than the Euro% variable. In that […]

This may have some interest. Basically, typologists cannot into statistics and it shows. On the other hand, it means there is a large number of low hanging fruit for someone with skills in statistical programming. PDF on OSF: This was handed in as a paper for typology class. Quite likely the last class I […] There are few good things to say about this book. The best thing is on the very first page: Some things never change: the sun rises in the morning. The sun sets in the evening. A lot of linguists don’t like statistics. I have spent a decade teaching English language and linguistics at […]