{"id":4871,"date":"2015-03-11T04:28:53","date_gmt":"2015-03-11T03:28:53","guid":{"rendered":"http:\/\/emilkirkegaard.dk\/en\/?p=4871"},"modified":"2015-03-12T03:15:45","modified_gmt":"2015-03-12T02:15:45","slug":"measuring-scientific-knowledge-can-we-use-questions-that-are-denied-by-the-religious","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2015\/03\/measuring-scientific-knowledge-can-we-use-questions-that-are-denied-by-the-religious\/","title":{"rendered":"Measuring scientific knowledge: can we use questions that are denied by the religious?"},"content":{"rendered":"<p>In reply to http:\/\/www.ljzigerell.com\/?p=534 and his working paper here: http:\/\/www.ljzigerell.com\/?p=2376<\/p>\n<p>We are discussing his working paper over email, and I had some reservations about his factor analysis. I decided to run the analyses I wanted myself, but it turned into a longer project which should be placed in a short paper instead of in a private email.<\/p>\n<p>I fetched the data from his source. The raw data did not have variable names, so it was unwieldy to work with. The SPSS file did have variable names, so I opened that and exported a CSV with the desired variables (see the supplementary material). I then had to recode the variables so that correct answers are coded as 1, incorrect answers as 0, and missing values as NA. This took some time. I followed his coding procedure for most cases (see his Stata file and my R code below).<\/p>\n<p><strong>How many factors to extract<\/strong><\/p>\n<p>It seems that he relies on some kind of method for determining the number of factors to extract, presumably eigenvalue &gt; 1. I always use three different methods from the nFactors package. Using all 22 variables (note that he did not use all of them at once), all methods agreed on extracting at most 5 factors. 
Here&#8217;s the factor solutions for extracting 1 thru 5 factors and their intercorrelations:<\/p>\n<p><strong>Factor analyses with 1-5 factors and their correlations<\/strong><\/p>\n<pre id=\"rstudio_console_output\" class=\"GCG52B5BAFB\" tabindex=\"0\">[1] \"Factor analysis, extracting 1 factors using oblimin and MinRes\"\r\n\r\nLoadings:\r\n         MR1   \r\nsmokheal  0.129\r\ncondrift  0.347\r\nrmanmade  0.445\r\nearthhot  0.348\r\noxyplant  0.189\r\nlasers    0.514\r\natomsize  0.441\r\nantibiot  0.401\r\ndinosaur  0.323\r\nlight     0.384\r\nearthsun  0.515\r\nsuntime   0.581\r\ndadgene   0.227\r\ngetdrug   0.290\r\nwhytest   0.423\r\nprobno4   0.396\r\nproblast  0.423\r\nprobreq   0.349\r\nprobif3   0.416\r\nevolved   0.306\r\nbigbang   0.315\r\nonfaith  -0.296\r\n\r\n                 MR1\r\nSS loadings    3.191\r\nProportion Var 0.145\r\n[1] \"Factor analysis, extracting 2 factors using oblimin and MinRes\"\r\n\r\nLoadings:\r\n         MR1    MR2   \r\nsmokheal  0.121       \r\ncondrift  0.345       \r\nrmanmade  0.368  0.136\r\nearthhot  0.363       \r\noxyplant  0.172       \r\nlasers    0.518       \r\natomsize  0.461       \r\nantibiot  0.323  0.133\r\ndinosaur  0.323       \r\nlight     0.375       \r\nearthsun  0.587       \r\nsuntime   0.658       \r\ndadgene   0.145  0.130\r\ngetdrug   0.211  0.130\r\nwhytest   0.386       \r\nprobno4          0.705\r\nproblast         0.789\r\nprobreq   0.162  0.305\r\nprobif3   0.108  0.514\r\nevolved   0.348       \r\nbigbang   0.367       \r\nonfaith  -0.266       \r\n\r\n                 MR1   MR2\r\nSS loadings    2.617 1.569\r\nProportion Var 0.119 0.071\r\nCumulative Var 0.119 0.190\r\n     MR1  MR2\r\nMR1 1.00 0.35\r\nMR2 0.35 1.00\r\n[1] \"Factor analysis, extracting 3 factors using oblimin and MinRes\"\r\n\r\nLoadings:\r\n         MR2    MR1    MR3   \r\nsmokheal                     \r\ncondrift                0.346\r\nrmanmade  0.173  0.170  0.232\r\nearthhot         0.187  0.220\r\noxyplant            
    0.100\r\nlasers           0.256  0.320\r\natomsize         0.208  0.312\r\nantibiot  0.168  0.150  0.198\r\ndinosaur         0.119  0.250\r\nlight            0.240  0.169\r\nearthsun         0.737       \r\nsuntime          0.754       \r\ndadgene   0.147              \r\ngetdrug   0.152         0.149\r\nwhytest   0.108  0.143  0.294\r\nprobno4   0.708              \r\nproblast  0.781              \r\nprobreq   0.324              \r\nprobif3   0.532              \r\nevolved                 0.562\r\nbigbang                 0.525\r\nonfaith                -0.307\r\n\r\n                 MR2   MR1   MR3\r\nSS loadings    1.646 1.444 1.389\r\nProportion Var 0.075 0.066 0.063\r\nCumulative Var 0.075 0.140 0.204\r\n     MR2  MR1  MR3\r\nMR2 1.00 0.29 0.25\r\nMR1 0.29 1.00 0.43\r\nMR3 0.25 0.43 1.00\r\n[1] \"Factor analysis, extracting 4 factors using oblimin and MinRes\"\r\n\r\nLoadings:\r\n         MR4    MR2    MR1    MR3   \r\nsmokheal                            \r\ncondrift  0.180                0.234\r\nrmanmade  0.387                     \r\nearthhot  0.262         0.102       \r\noxyplant  0.116                     \r\nlasers    0.490                     \r\natomsize  0.435                     \r\nantibiot  0.485                     \r\ndinosaur  0.312                     \r\nlight     0.274         0.142       \r\nearthsun                0.797       \r\nsuntime                 0.719       \r\ndadgene   0.234                     \r\ngetdrug   0.273                     \r\nwhytest   0.438                     \r\nprobno4          0.695              \r\nproblast         0.817              \r\nprobreq   0.180  0.275              \r\nprobif3   0.139  0.487              \r\nevolved                        0.685\r\nbigbang                        0.554\r\nonfaith  -0.141               -0.230\r\n\r\n                 MR4   MR2   MR1   MR3\r\nSS loadings    1.511 1.501 1.204 0.915\r\nProportion Var 0.069 0.068 0.055 0.042\r\nCumulative Var 0.069 0.137 0.192 0.233\r\n     
MR4  MR2  MR1  MR3\r\nMR4 1.00 0.39 0.57 0.42\r\nMR2 0.39 1.00 0.23 0.12\r\nMR1 0.57 0.23 1.00 0.27\r\nMR3 0.42 0.12 0.27 1.00\r\n[1] \"Factor analysis, extracting 5 factors using oblimin and MinRes\"\r\n\r\nLoadings:\r\n         MR2    MR1    MR3    MR5    MR4   \r\nsmokheal                                   \r\ncondrift                0.209         0.299\r\nrmanmade  0.104                0.120  0.379\r\nearthhot                              0.367\r\noxyplant                              0.220\r\nlasers                         0.195  0.361\r\natomsize                       0.273  0.207\r\nantibiot                       0.401  0.108\r\ndinosaur                       0.204  0.131\r\nlight                                 0.423\r\nearthsun         0.504                0.186\r\nsuntime          1.007                     \r\ndadgene                        0.277       \r\ngetdrug                        0.373       \r\nwhytest                        0.504       \r\nprobno4   0.701                            \r\nproblast  0.816                            \r\nprobreq   0.272                0.174       \r\nprobif3   0.487                0.107       \r\nevolved                 0.753              \r\nbigbang                 0.483         0.165\r\nonfaith                -0.225 -0.152       \r\n\r\n                 MR2   MR1   MR3   MR5   MR4\r\nSS loadings    1.501 1.291 0.919 0.874 0.871\r\nProportion Var 0.068 0.059 0.042 0.040 0.040\r\nCumulative Var 0.068 0.127 0.169 0.208 0.248\r\n     MR2  MR1  MR3  MR5  MR4\r\nMR2 1.00 0.20 0.11 0.38 0.28\r\nMR1 0.20 1.00 0.21 0.41 0.44\r\nMR3 0.11 0.21 1.00 0.32 0.30\r\nMR5 0.38 0.41 0.32 1.00 0.50\r\nMR4 0.28 0.44 0.30 0.50 1.00<\/pre>\n<p><strong>Interpretation<\/strong><\/p>\n<p>We see that in the 1-factor solution, all variables load in the expected direction, and we can speak of a general scientific knowledge factor. This is the one we want to use for other analyses. We see that faith loads negatively. 
This variable is not a true\/false question, and thus should be excluded from any actual measurement of the general scientific knowledge factor.<\/p>\n<p>Increasing the number of factors to extract simply divides this general factor into correlated parts. E.g. in the 2-factor solution, we see a probability factor that correlates .35 with the remaining semi-general factor. In the 3-factor solution, we see MR2 as the probability factor, MR3 as the factor for knowledge related to religious beliefs, and MR1 as the remaining items. The intercorrelations are .29, .25 and .43. This pattern continues up to the 5-factor solution, which still produces 5 correlated factors: MR2 is the probability factor, MR1 is an astronomy factor, MR3 is the one having to do with religious beliefs, MR5 looks like a medicine\/genetics factor, and MR4 is the rest.<\/p>\n<p>Just because scree tests etc. tell you to extract &gt;1 factor does not mean that there is no general factor. This is an old fallacy from the study of cognitive ability. See the discussion in Jensen (1998, chapter 3). It is sometimes still made, e.g. by Hampshire et al. (2012). Generally, as one increases the number of variables, the suggested number of factors to extract goes up. This does not mean that there is no general factor, just that with an increasing number of variables, one can see a more fine-grained structure in the data than one can with only e.g. 5 variables.<\/p>\n<p><strong>Should we use them or not?<\/strong><\/p>\n<p>Before discussing whether one should, in theory, use the religion-tinged items (evolved and bigbang) or not, one can measure whether it makes much of a difference. One can do this by extracting the general factor with and without the items in question. I did this, also excluding the onfaith item. Then I correlated the scores from these two analyses: r=.992. In other words, it hardly matters whether one includes these religion-tinged items or not. The general factor is measured quite well already without them and they do not substantially change the factor scores. 
However, since adding more indicator items\/variables generally reduces the measurement error of a latent trait\/factor, I would include them in my analyses.<\/p>\n<p><strong>How many factors should we extract and use?<\/strong><\/p>\n<p>There is also the question of how many factors one should extract. The answer is that it depends on what one wants to do. As Zigerell points out in a review comment on this paper at The Winnower:<\/p>\n<blockquote><p>For example, for diagnostic purposes, if we know only that students A, B, and C miss 3 items on a test of general science knowledge, then the only remediation is more science; but we can provide more tailored remediation if we have separate components so that we observe that, say, A did poorly only on the religion-tinged items, B did poorly only on the probability items, and C did poorly only on the astronomy items.<\/p><\/blockquote>\n<p>For remedial education, it is clearly preferable to extract the highest number of interpretable factors, because this gives the most precise information about where knowledge is lacking for a given person. In regression analysis where we want to control for scientific knowledge, one should use the general factor.<\/p>\n<p><strong>References<\/strong><\/p>\n<p id=\"gs_cit1\" class=\"gs_citr\" tabindex=\"0\">Hampshire, A., Highfield, R. R., Parkin, B. L., &amp; Owen, A. M. (2012). Fractionating human intelligence. <i>Neuron<\/i>, <i>76<\/i>(6), 1225-1237.<\/p>\n<p id=\"gs_cit1\" class=\"gs_citr\" tabindex=\"0\">Jensen, A. R. (1998). <i>The g factor: The science of mental ability<\/i>. 
Westport, CT: Praeger.<\/p>\n<p><strong>Supplementary material<\/strong><\/p>\n<p>Datafile: <a href=\"http:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/science_data.csv\">science_data<\/a><\/p>\n<p><strong>R code<\/strong><\/p>\n<pre>library(plyr) #for mapvalues\r\n\r\ndata = read.csv(\"science_data.csv\") #load data\r\n\r\n#Coding so that 1 = true, 0 = false\r\ndata$smokheal = mapvalues(data$smokheal, c(9,7,8,2),c(NA,0,0,0))\r\ndata$condrift = mapvalues(data$condrift, c(9,7,8,2),c(NA,0,0,0))\r\ndata$earthhot = mapvalues(data$earthhot, c(9,7,8,2),c(NA,0,0,0))\r\ndata$rmanmade = mapvalues(data$rmanmade, c(9,7,8,1,2),c(NA,0,0,0,1)) #reverse\r\ndata$oxyplant = mapvalues(data$oxyplant, c(9,7,8,2),c(NA,0,0,0))\r\ndata$lasers = mapvalues(data$lasers, c(9,7,8,2,1),c(NA,0,0,1,0)) #reverse\r\ndata$atomsize = mapvalues(data$atomsize, c(9,7,8,2),c(NA,0,0,0))\r\ndata$antibiot = mapvalues(data$antibiot, c(9,7,8,2,1),c(NA,0,0,1,0)) #reverse\r\ndata$dinosaur = mapvalues(data$dinosaur, c(9,7,8,2,1),c(NA,0,0,1,0)) #reverse\r\ndata$light = mapvalues(data$light, c(9,7,8,2,3),c(NA,0,0,0,0))\r\ndata$earthsun = mapvalues(data$earthsun, c(9,7,8,2),c(NA,0,0,0))\r\ndata$suntime = mapvalues(data$suntime, c(9,7,8,2,3,1,4,99),c(0,0,0,0,1,0,0,NA))\r\ndata$dadgene = mapvalues(data$dadgene, c(9,7,8,2),c(NA,0,0,0))\r\ndata$getdrug = mapvalues(data$getdrug, c(9,7,8,2,1),c(NA,0,0,1,0)) #reverse\r\ndata$whytest = mapvalues(data$whytest, c(1,2,3,4,5,6,7,8,9,99),c(1,0,0,0,0,0,0,0,0,NA))\r\ndata$probno4 = mapvalues(data$probno4, c(9,8,2,1),c(NA,0,1,0)) #reverse\r\ndata$problast = mapvalues(data$problast, c(9,8,2,1),c(NA,0,1,0)) #reverse\r\ndata$probreq = mapvalues(data$probreq, c(9,8,2),c(NA,0,0))\r\ndata$probif3 = mapvalues(data$probif3, c(9,8,2,1),c(NA,0,1,0)) #reverse\r\ndata$evolved = mapvalues(data$evolved, c(9,7,8,2),c(NA,0,0,0))\r\ndata$bigbang = mapvalues(data$bigbang, c(9,7,8,2),c(NA,0,0,0))\r\ndata$onfaith = mapvalues(data$onfaith, c(9,1,2,3,4,7,8),c(NA,1,1,0,0,0,0))\r\n\r\n#How many 
factors to extract?\r\nlibrary(nFactors)\r\nnScree(data[complete.cases(data),]) #use complete cases only\r\n\r\n#extract factors\r\nlibrary(psych) #for factor analysis\r\nfor (num in 1:5) {\r\n  print(paste0(\"Factor analysis, extracting \",num,\" factors using oblimin and MinRes\"))\r\n  fa.fit = fa(data,num) #extract factors; named fa.fit so the psych function fa() is not shadowed\r\n  print(fa.fit$loadings) #print loadings\r\n  if (num&gt;1){ #factor correlations only exist with 2+ factors\r\n    phi = round(fa.fit$Phi,2) #round to 2 decimals\r\n    colnames(phi) = rownames(phi) = colnames(fa.fit$scores) #set names\r\n    print(phi) #print factor correlations\r\n  }\r\n}\r\n\r\n#Does it make a difference?\r\n#columns 20-22 are evolved, bigbang, onfaith\r\nfa.all = fa(data[1:21]) #no onfaith\r\nfa.noreligious = fa(data[1:19]) #no onfaith, bigbang, evolved\r\ncor(fa.all$scores,fa.noreligious$scores, use=\"pairwise.complete.obs\") #correlation, ignore missing cases<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In reply to http:\/\/www.ljzigerell.com\/?p=534 and his working paper here: http:\/\/www.ljzigerell.com\/?p=2376 We are discussing his working paper over email, and I had some reservations about his factor analysis. 
I decided to run the analyses I wanted myself, but it turned into a longer project which should be placed in a short paper instead of in a [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1839,1107,1],"tags":[2106,2075,2273,2105],"class_list":["post-4871","post","type-post","status-publish","format-standard","hentry","category-psychometics","category-science","category-uncategorized","tag-factor-analysis","tag-general-factor","tag-religion-filosofi","tag-scientific-knowledge","entry"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/4871","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=4871"}],"version-history":[{"count":5,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/4871\/revisions"}],"predecessor-version":[{"id":4880,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/4871\/revisions\/4880"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=4871"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=4871"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=4871"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}