One of the best things about R is that there is a never-ending stream of new short textbooks being written. Most of these are now written using the neat bookdown approach, meaning that the entire book is generated from R Markdown source. In fact, one can go to the website of this project and get a listing of all (?) such books:
Granted, a lot of these are random lecture notes. But still, there’s a lot of good stuff. Too bad they don’t have a proper search interface, or some way to rank the listing by quality. Few people are interested in some random Chinese lecture notes.
What prompted me to write this advertisement blogpost was that I just finished reading one of these books, namely the new Tidymodels introduction:
It’s a really nice hands-on book. Suppose you don’t have a particular background in statistics, but you know how to code a bit. What do you need to get a nice machine learning model or 10 going? This book is for you. The chapters are:
- Introduction
  - 1 Software for modeling
  - 2 A Tidyverse Primer
  - 3 A Review of R Modeling Fundamentals
- Modeling Basics
  - 4 The Ames Housing Data
  - 5 Spending our Data
  - 6 Fitting Models with parsnip
  - 7 A Model Workflow
  - 8 Feature Engineering with recipes
  - 9 Judging Model Effectiveness
- Tools for Creating Effective Models
  - 10 Resampling for Evaluating Performance
  - 11 Comparing Models with Resampling
  - 12 Model Tuning and the Dangers of Overfitting
  - 13 Grid Search
  - 14 Iterative Search
  - 15 Screening Many Models
- Beyond the Basics
  - 16 Dimensionality Reduction
  - 17 Encoding Categorical Data
  - 18 Explaining Models and Predictions
  - 19 When Should You Trust Your Predictions?
  - 20 Ensembles of Models
  - 21 Inferential Analysis
I like everything except that there is no proper conclusion chapter (odd!) and that the last chapter, on inferential analysis, is weak. Fortunately, inferential statistics is pretty much everything else people learn in statistics, so we don’t need to spend much time on it in a book on applied machine learning.
For those looking for more, here are some other books in the same format that I enjoyed:
- Feature Engineering and Selection: A Practical Approach for Predictive Models, by Max Kuhn and Kjell Johnson, 2019
- Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson, 2017
- Forecasting: Principles & Practice, by Rob J Hyndman and George Athanasopoulos, 2018
- Advanced R, by Hadley Wickham, 2022 (2nd ed.)
- R for Data Science, by Hadley Wickham, 2022 (2nd ed.)
- R Packages: Organize, Test, Document, and Share Your Code, by Hadley Wickham, 2022 (2nd ed.)
And here are some that look promising but that I haven’t read yet:
- Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, by Chester Ismay and Albert Y. Kim, 2022
- Supervised Machine Learning for Text Analysis in R, by Emil Hvitfeldt and Julia Silge, 2022
- Mastering Shiny, by Hadley Wickham, 2020
- Geocomputation with R, by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow, 2019