R for machine learning and other stuff

One of the best things about R is the never-ending stream of new short textbooks. Most of these are now written with the neat bookdown approach, meaning that the entire book is generated from Markdown source. In fact, one can go to the bookdown project’s website and get a listing of all (?) such books:

OK, granted, a lot of these are random lecture notes. But still, there’s a lot of good stuff. Too bad they don’t have a proper search interface or some way to rank the listing by quality. Few people are interested in some random Chinese lecture notes.

What prompted me to write this advertisement blogpost was that I just finished reading one of these books, namely the new Tidymodels introduction:

Tidy Modeling with R by Max Kuhn and Julia Silge, Version 1.0.0 (2022-07-22)

It’s a really nice hands-on book. Suppose you don’t have a particular background in statistics, but you know how to code a bit. What do you need to get a nice machine learning model or 10 going? This book is for you. The chapters are:

Introduction
- 1 Software for modeling
- 2 A Tidyverse Primer
- 3 A Review of R Modeling Fundamentals
Modeling Basics
- 4 The Ames Housing Data
- 5 Spending our Data
- 6 Fitting Models with parsnip
- 7 A Model Workflow
- 8 Feature Engineering with recipes
- 9 Judging Model Effectiveness
Tools for Creating Effective Models
- 10 Resampling for Evaluating Performance
- 11 Comparing Models with Resampling
- 12 Model Tuning and the Dangers of Overfitting
- 13 Grid Search
- 14 Iterative Search
- 15 Screening Many Models
Beyond the Basics
- 16 Dimensionality Reduction
- 17 Encoding Categorical Data
- 18 Explaining Models and Predictions
- 19 When Should You Trust Your Predictions?
- 20 Ensembles of Models
- 21 Inferential Analysis

I like everything except that there is no proper conclusion chapter (odd!) and that the last chapter, on inferential statistics, is weak. Fortunately, inferential statistics is pretty much everything else people learn in statistics, so a book on applied machine learning doesn’t need to spend much time on it.

For those looking for more, here are some other books in the same format that I enjoyed:

And here are some that look promising but that I haven’t read yet:
