I wrote about this before, but since this is a frequent problem and my last post wasn’t brief, here’s a shorter version. The primary way to install software in Linux is to rely on apt-get (apt in Mint) or some other package manager. The way this works is that there is a central server which […]

Installing R packages on Windows is easy: you run the install code and it always works. Not so on Linux! Here one sometimes has to install them through apt-get (or whatever package manager) or install some missing system-level dependencies. Finding out what to do can take a lot of time googling and trial and error. So, […]

I’m reading Missing Data: A Gentle Introduction and it mentions various methods to understand how data are missing in a given dataset. The book, however, is light on actual tools. So, since I have already implemented a few functions in my package for handling missing data, I decided to implement a few more. These have […]

This is a post in the ongoing series about stuff in my package: kirkegaard [I’m not egocentric, but since there is no central theme to the functions in the package other than that I made and use them, there is nothing else to call it.] I figure it should be easy to find someone who wrote […]

Quick recap of the main object types in R from Advanced R:
      homogeneous    heterogeneous
1-d   atomic vector  list
2-d   matrix         data.frame
n-d   array          ???
So, objects can either store only data of the same type or of any type, and they can have 1, 2 or any number of dimensions. Note that there is a […]

Ongoing series of posts about functions in my R package (github.com/Deleetdk/kirkegaard). Suppose you have a list or a simple vector (lists are vectors) with some data. However, some of it is missing or bad in various ways: NA, NULL, NaN, Inf (or -Inf). Usually, we want to get rid of these data points, but it […]

There is a question on SO about this: stackoverflow.com/questions/4752275/test-for-equality-among-all-elements-of-a-single-vector But I was a bit more curious, so!
#test data, large vectors
v1 = rep(1234, 1e6)
v2 = runif(1e6)
#functions to try
all_the_same1 = function(x) {
  #range() returns c(min, max), so compare their difference to 0
  diff(range(x)) == 0
}
all_the_same2 = function(x) {
  max(x) == min(x)
}
all_the_same3 = function(x) {
  sd(x) […]

Recently, I wrote a function called copy_names(). It does what you think and a little more: it copies names from one object to another. But it can also attempt to do so even when the sizes of the objects’ dimensions do not match up perfectly. For instance: > t = matrix(1:9, nrow=3) > t2 = […]

Working with large public datasets usually requires recoding variables, which can be quite repetitive. When variables only have a few possible values, one can use something like plyr’s mapvalues() to great benefit (see my answer at SO). However, when there is an indefinite number of different values, it is not useful. What one […]

Chisala has his 3rd installment up: www.unz.com/article/closing-the-black-white-iq-gap-debate-part-3/ One idea I had while reading it was that tail effects interact with population ethnic/racial heterogeneity. To show this, I did a simulation experiment. Population 1 is a regular population with a mean of 0 and sd of 1. Population 2 is a composite population of three sub-populations: […]