{"id":8852,"date":"2020-06-26T20:57:22","date_gmt":"2020-06-26T19:57:22","guid":{"rendered":"https:\/\/emilkirkegaard.dk\/en\/?p=8852"},"modified":"2020-06-26T21:17:08","modified_gmt":"2020-06-26T20:17:08","slug":"self-teaching-stats-in-2020","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2020\/06\/self-teaching-stats-in-2020\/","title":{"rendered":"Self-teaching stats with R in 2020"},"content":{"rendered":"<p><a href=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/46g2yr.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-8853 size-medium\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/46g2yr-228x300.jpg\" alt=\"\" width=\"228\" height=\"300\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/46g2yr-228x300.jpg 228w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/46g2yr.jpg 500w\" sizes=\"auto, (max-width: 228px) 100vw, 228px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/emilkirkegaard.dk\/en\/?p=7793\"><em>See the 2019 post on introductions to psychology etc. for broader coverage<\/em><\/a><\/p>\n<p>Anon asks me today:<\/p>\n<blockquote>\n<div class=\"module-message__text module-message__text--incoming\" dir=\"auto\">What would be the best way to quickly self-teach enough statistics to be able to evaluate academic papers? I have the basic concepts down but I&#8217;ve never taken an actual stats\u00a0class.<\/div>\n<\/blockquote>\n<div dir=\"auto\">The glib reply is: learn to code.<\/div>\n<p dir=\"auto\">For real, though. But first, read up on the very basics:<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Spiegelhalter, D. (2019). <a href=\"https:\/\/www.goodreads.com\/book\/show\/42643892-the-art-of-statistics\"><i>The art of statistics: learning from data<\/i><\/a>. Penguin UK.<\/div>\n<\/li>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Chambers, C. (2019). 
<a href=\"https:\/\/www.goodreads.com\/book\/show\/32025411-the-seven-deadly-sins-of-psychology\"><i>The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice<\/i><\/a>. Princeton University Press.<\/div>\n<\/li>\n<\/ul>\n<p>Once you get that down, you will want to start playing around with data. To properly learn that, you need to learn to code. You can do this with Python or Julia too, but R is better. Why do you need coding? Can&#8217;t you just use point-and-click software? You can, but you can&#8217;t get good that way: you get too little control and flexibility, and it is easy to get locked into bad analytic patterns. So you want to learn R, and for that you want this book:<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Wickham, H., &amp; Grolemund, G. (2016). <a href=\"http:\/\/r4ds.had.co.nz\/\"><i>R for data science: import, tidy, transform, visualize, and model data<\/i><\/a>. O&#8217;Reilly Media.<\/div>\n<\/li>\n<\/ul>\n<p>Once you know the basics of R, you can start making things like these:<\/p>\n<ul>\n<li><a href=\"http:\/\/emilkirkegaard.dk\/understanding_statistics\/\">http:\/\/emilkirkegaard.dk\/understanding_statistics\/<\/a><\/li>\n<li>To learn how to make these, read e.g. <a href=\"https:\/\/shiny.rstudio.com\/tutorial\/\">this tutorial<\/a>; the package is called <strong>shiny<\/strong>.<\/li>\n<\/ul>\n<p>Alright, so after learning the basics of <strong>tidyverse<\/strong> R, you are ready to learn more stats:<\/p>\n<ul>\n<li>Cumming, G. (2013). <a href=\"https:\/\/www.goodreads.com\/book\/show\/10765705-understanding-the-new-statistics\"><i>Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis<\/i><\/a>. Routledge.\n<ul>\n<li>You won&#8217;t be using the author&#8217;s Excel tools; instead, re-do the parts you care about in R.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Most statistics books at this point are going to be very theoretical and equation-heavy. You don&#8217;t care much about these. 
Many of them waste time on computing tests such as t-tests and chi-square tests by hand. You don&#8217;t care much about these either. (<a href=\"https:\/\/lindeloev.github.io\/tests-as-linear\/\">In fact, most of these legacy tests can be done as special cases of regression models<\/a>.) The main thing to know is that they are null hypothesis significance tests (NHST) which, given some assumptions, produce a p value. The p value is the probability of getting data at least as extreme as yours if only noise is at work, i.e. if nothing is going on and there is no pattern (the so-called null model). If you have enough data, the p value for any real pattern in your data, however tiny, will be very small, so the p value itself is of no particular further interest. What you really care about are the effect sizes. This point is hammered home in the Cumming book above.<\/p>\n<p>Going further from here really depends on which way you want to go. If you want meta-analysis skills, you can read any of the excellent introductions to meta-analysis in R. If you fancy the Schmidt and Hunter style of psychometric meta-analysis, there&#8217;s:<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Dahlke, J. A., &amp; Wiernik, B. M. (2019). <a href=\"https:\/\/journals.sagepub.com\/doi\/abs\/10.1177\/0146621618795933\">psychmeta: An R package for psychometric meta-analysis<\/a>. <i>Applied Psychological Measurement<\/i>, <i>43<\/i>(5), 415-416.<\/div>\n<\/li>\n<li><a href=\"https:\/\/psychmeta.com\/\">https:\/\/psychmeta.com\/<\/a><\/li>\n<\/ul>\n<p>If you favor the conventional style:<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Viechtbauer, W. (2010). <a href=\"https:\/\/lirias.kuleuven.be\/1059637?limo=0\">Conducting meta-analyses in R with the metafor package<\/a>. 
<i>Journal of Statistical Software<\/i>, <i>36<\/i>(3), 1-48.<\/div>\n<\/li>\n<li><a href=\"http:\/\/www.metafor-project.org\/doku.php\">http:\/\/www.metafor-project.org\/doku.php<\/a><\/li>\n<\/ul>\n<figure id=\"attachment_8857\" aria-describedby=\"caption-attachment-8857\" style=\"width: 300px\" class=\"wp-caption alignright\"><a href=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2020-06-26-22-13-46.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-8857 size-medium\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2020-06-26-22-13-46-300x120.png\" alt=\"\" width=\"300\" height=\"120\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2020-06-26-22-13-46-300x120.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2020-06-26-22-13-46-768x308.png 768w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/Screenshot-from-2020-06-26-22-13-46.png 919w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-8857\" class=\"wp-caption-text\"><a href=\"https:\/\/cran.r-project.org\/web\/packages\/sf\/index.html\"><strong>sf<\/strong> package CRAN page<\/a><\/figcaption><\/figure>\n<p>In general, reading these papers is the wrong way to learn the code for an area. What you want is to find a good R &#8216;vignette&#8217; (a document that mixes code and natural-language explanation, so it is easy to see what happens). These can generally be found on the package website, and are sometimes listed on the CRAN page for each package.<\/p>\n<p>You can also start playing around with analyses others have done on something you care about. 
One excellent option is browsing <a href=\"http:\/\/rpubs.com\">RPubs.com<\/a>, which has thousands of public R analysis notebooks, including hundreds of my own:<\/p>\n<ul>\n<li><a href=\"https:\/\/rpubs.com\/EmilOWK\/\">https:\/\/rpubs.com\/EmilOWK\/<\/a><\/li>\n<\/ul>\n<p>Many of these contain public data, so you can simply download the same data and rerun their code (expect bugs!).<\/p>\n<p>Eventually, when you have done some of the simpler stuff, you will need to read this book. It may take you a month because it isn&#8217;t <em>that<\/em> easy, but it is great and well worth the time.<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">James, G., Witten, D., Hastie, T., &amp; Tibshirani, R. (2013). <a href=\"http:\/\/faculty.marshall.usc.edu\/gareth-james\/ISL\/\"><i>An introduction to statistical learning<\/i><\/a>. New York: Springer.<\/div>\n<\/li>\n<\/ul>\n<p>They provide boomer-tier R code you can copy and run (including for <strong>glmnet<\/strong>). However, you don&#8217;t really want to stick with their way; for applied machine learning, you want to <a href=\"https:\/\/emilkirkegaard.dk\/en\/?p=8679\">migrate to the <strong>tidymodels<\/strong> framework<\/a>. For spatial statistics, you want to learn tidy spatial statistics <a href=\"https:\/\/cran.r-project.org\/web\/packages\/sf\/index.html\">with the <strong>sf<\/strong> package<\/a>. For regression modeling, you will want to read through this odd but informative book:<\/p>\n<ul>\n<li>\n<div class=\"gs_citr\" tabindex=\"0\">Harrell Jr, F. E. (2015). <a href=\"https:\/\/link.springer.com\/book\/10.1007\/978-3-319-19425-7\"><i>Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis<\/i><\/a>. Springer.<\/div>\n<\/li>\n<\/ul>\n<p>It has its own companion package too, <strong>rms<\/strong>. The <strong>psych<\/strong> package provides a lot of nice functions for psychology-related analyses. For latent variable modeling, you want the <strong>lavaan<\/strong> package. 
For item response theory, you can begin with <strong>psych<\/strong> and migrate to <strong>mirt<\/strong> afterwards. <em>Learning how to do stuff in R is mainly a question of finding the right package.<\/em> Ask someone who has worked on your kind of problem which package is good for it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>See 2019 post on introductions to psychology etc. for broader coverage Anon asks me today: What would be the best way to quickly self-teach enough statistics to be able to evaluate academic papers? I have the basic concepts down but I&#8217;ve never taken an actual stats\u00a0class. The glib reply is: l2code. For reals though. But [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":8853,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1766],"tags":[1719],"class_list":["post-8852","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-math-science","tag-introduction","entry","has-media"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/8852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=8852"}],"version-history":[{"count":4,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/8852\/revisions"}],"predecessor-version":[{"id":8858,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/8852\/revisions\/8858"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media\/8853"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=8852"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=8852"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=8852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}