Self-teaching stats with R in 2020

See 2019 post on introductions to psychology etc. for broader coverage

Anon asks me today:

What would be the best way to quickly self-teach enough statistics to be able to evaluate academic papers? I have the basic concepts down but I’ve never taken an actual stats class.
The glib reply is: l2code.

For reals though. But first, read some of the very basics:

Once you get that down, you will want to start playing around with data. To do that properly, you need to learn to code. You can do this with Python or Julia too, but R is better. Why do you need coding? Can't you just use point-and-click software? You can, but you can't get good that way: not enough control, too inflexible, and you get locked into bad analytic patterns. So you want to learn R, and you want to use this book:

Once you know the basics of R, you can start doing stuff like these:
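To give a flavor of what this looks like in practice, here's a minimal tidyverse-style sketch using R's built-in mtcars data (the variable names and the particular summary are just an illustration, not from any of the linked materials):

```r
library(dplyr)

# Group the built-in mtcars data by cylinder count and
# summarize: mean fuel economy and number of cars per group
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))
```

A few lines like this already cover a large share of everyday data work: grouping, aggregating, and sorting.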

Alright, so after learning the basics of tidyverse-style R, you are ready to learn more stats:

Most statistics books at this point are going to be very unapplied and equation-heavy. You don't care much about these. Many of them waste time computing things like t-tests and chi-square tests by hand. You don't care too much about these either. (In point of fact, these legacy tests can all be expressed as regression models.) The only thing one needs to know is that these are NHST tests which, based on some assumptions, produce a p value. The p value is just the probability of the data given that nothing is going on/there is no pattern, only noise (the so-called null model). If you have enough data, the p value of any pattern in your data will always be very small, and it is of no other particular interest. What you really care about most are the effect sizes. This point is hammered home in the Cumming book above.
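To see the "legacy tests are just regressions" point concretely, here's a sketch (simulated data, made up for illustration) showing that a classic two-sample t-test and a linear regression with a group dummy give the same t statistic and p value:

```r
# Simulate two groups with a mean difference of 0.5
set.seed(42)
d <- data.frame(
  group = rep(c("a", "b"), each = 50),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

# Classic Student's t-test (equal-variance version)
t.test(y ~ group, data = d, var.equal = TRUE)

# The same test as a regression model: the t statistic and
# p value for the group coefficient match the t-test above
summary(lm(y ~ group, data = d))
```

And unlike the t-test, the regression framing extends naturally to more predictors, covariates, and interactions.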

Going further from here really depends on which way you want to go. If you want meta-analysis skills, you can read any of the excellent introductions to meta-analysis in R. If you fancy Schmidt and Hunter style, there’s:

If you favor regular style:

In general, reading these papers is the wrong way to learn the code in some area. What you want is to find a good R 'vignette' (code and natural-language explanation mixed together, so it's easy to see what happens). These can generally be found on the package website, and are sometimes listed on the CRAN page for each package.
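Finding vignettes doesn't require leaving R at all; base R ships functions for this (dplyr here is just an example package):

```r
# List the vignettes bundled with an installed package
vignette(package = "dplyr")

# Open a specific vignette by name
vignette("dplyr", package = "dplyr")

# Browse all vignettes of all installed packages in your web browser
browseVignettes()
```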

You can also start playing around with stuff others have done. Something you care about. One excellent option is browsing Rpubs.com, which has 1000s of public R analysis notebooks, including 100s of my own:

Many of these contain public data, so you can simply download the same data and rerun their code (expect bugs!).

Eventually, when you have done some of the simpler stuff, you will need to read this book. It may take you a month because it isn't that easy, but it is great and well worth the time.

They provide boomer-tier R code you can copy and run (mostly for glmnet). However, you don't really want to stick with their way; you want to migrate to the tidymodels framework for applied machine learning. For spatial statistics, you want to learn tidy spatial statistics with the sf package. For regression modeling, you will want to read through this odd but informative book:

Which has its own package too, rms. The psych package provides a lot of nice functions for psychology-related stuff. For latent variable modeling, you want the lavaan package. For item response theory, you can begin with psych and migrate to mirt afterwards. Learning how to do stuff in R is mainly a question of finding the right package. Ask someone who has worked on the kind of problem you have which package is good for it.
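As a taste of latent variable modeling, here's a minimal lavaan sketch: a one-factor confirmatory factor analysis on the HolzingerSwineford1939 dataset that ships with lavaan (the choice of a single "visual" factor with three indicators is just an illustrative model, not a recommendation):

```r
library(lavaan)

# One-factor CFA: three test-score indicators (x1-x3) loading
# on a single latent "visual ability" factor
model <- "visual =~ x1 + x2 + x3"

fit <- cfa(model, data = HolzingerSwineford1939)

# Loadings, fit indices, and standardized estimates
summary(fit, fit.measures = TRUE, standardized = TRUE)
```

The same `=~` model syntax scales up to multi-factor models and full structural equation models.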