Installing R on Ubuntu 20.04: still too damn difficult

Here’s a whiny post on annoying R features. I love R and all, but it’s obnoxious to install. Not only does it take a long time to compile so many packages, but this step typically has to be repeated a few times due to unforeseeable errors. These errors are typically also hard to find; one has…

Trying out tidymodels package

If you want to train machine learning models, R offers thousands of packages to try out. However, most of these are written by random students and academics, so they are not properly standardized. It can look like this: So, it’s a nightmare trying out a lot of models or (learning) algorithms to predict a specific…

Regression Modeling Strategies (2nd ed.) – Frank Harrell (review)

https://www.goodreads.com/book/show/10753824-regression-modeling-strategies

I heard some good things about this book, and some of it is good. Surely, the general approach outlined in the introduction is pretty sound. He sets up the following principles: Satisfaction of model assumptions improves precision and increases statistical power. It is more productive to make a model fit step by step (e.g.,…

Do music genres exist? An outline of an empirical approach

I don’t have time to do this project right now. However, since I gathered a bunch of relevant resources that I don’t want to have to re-find, I will write them down somewhere — here. My blog is my extended memory. Music genres are another fuzzy concept, like races. Lots of genres (and classification systems…

Renaming functions in R packages using roxygen2

Naming things in programming is hard. I can see 2 reasons for this. 1) It is hard to pick a name you won’t want to change later. With large projects, where things are not designed in detail before beginning to write the code, I often find that the names I gave things initially were not…

SQL server for population frequencies from 1000 genomes

Note (2018 June 26): Server down right now, investigating.

Note (August 16, 2017): Server IP changed to 67.207.92.10.

Original post

We need dplyr for this: library(dplyr)

First, use the anon user to log into the SQL server (user = "anon", pass = "", ip = "67.207.92.10", port = 3306): sql = src_mysql("population_freqs", host = "67.207.92.10",…
