National chess skill: European culture, intelligence

How far should the net be cast as regards intellectual achievements? I suggest as far and wide as possible, or it will be assumed that some results are being held back. I favour those achievements which are in a “universal language” like maths, science and chess. There will always be some doubt about whether people in poor countries have access to knowledge and training, though the spread of internet access goes a long way to dealing with this. (In fact, it should level the playing field in terms of access to knowledge). Poker, Bridge, Backgammon, and Mahjong could be added to the list, because there are international competitions and rankings. I am not suggesting anyone should take part in such activities. Live and let live.

Chess a universal language like math? Methinks not. I took a quantitative look at FIDE players. Using data science tricks, I obtained a list of top 5k players and their countries. Then I did some plotting and modeling. Details can be found at rpubs.com/EmilOWK/chess_top5k_fide

Per capita rate ~ IQ

Obviously there are issues with country size, so we can weight cases by sqrt(population) as we usually do.

Did not help. It is obvious that East Asian countries are outliers, and that the large number of countries with no top chess players at all is a problem. If we use the log count approach that Noah used in the terrorism papers, which seems superior to the per capita approach (the reason is unclear to me, but it somehow seems to handle sampling error better), then a simple model finds a strongish effect of IQ:

Linear Regression Model
 rms::ols(formula = log_count ~ log_pop + IQ, data = natdata)
                Model Likelihood     Discrimination    
                   Ratio Test           Indexes        
 Obs     197    LR chi2    140.48    R2       0.510    
 sigma0.5491    d.f.            2    R2 adj   0.505    
 d.f.    194    Pr(> chi2) 0.0000    g        0.636    
 
 Residuals
      Min       1Q   Median       3Q      Max 
 -1.68341 -0.31570 -0.01964  0.39061  1.28864 
 
           Coef    S.E.   t      Pr(>|t|)
 Intercept -4.9833 0.4018 -12.40 <0.0001 
 log_pop    0.3281 0.0416   7.88 <0.0001 
 IQ         0.0405 0.0036  11.21 <0.0001 

Note that these are unstandardized coefficients, and thus not easy to compare. Nor is the effect size easy to understand, since the outcome is a log10 count. The model output says that for each 1-point increase in IQ, the expected log10(count + 1) of top FIDE players increases by 0.04. So, if I’m not mistaken, this translates into a multiplicative factor of 10^0.04 ≈ 1.10 on count + 1 per IQ point. The scaling is not linear. The model-predicted numbers of top FIDE players for countries with IQ 70, 80, …, 110 are 0.15, 1.93, 6.45, 17.93, 47.11, or a factor of ~300 going from Africa-level IQ to the level of good cities.
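The back-transformation can be checked in a few lines. This is a minimal sketch in Python (not the R used for the analysis) that takes only the reported IQ coefficient and the predicted counts quoted above as inputs; the variable names are my own.

```python
# Back-transform the IQ coefficient from the log10(count + 1) model.
iq_coef = 0.0405  # IQ coefficient reported in the model output above

# Multiplicative change in (count + 1) per IQ point and per 40 IQ points.
per_point = 10 ** iq_coef
per_40_points = 10 ** (iq_coef * 40)

# Predicted counts quoted in the text for IQ 70, 80, ..., 110.
predicted_counts = [0.15, 1.93, 6.45, 17.93, 47.11]

# Ratio on the raw count scale (the ~300 quoted in the text) ...
count_ratio = predicted_counts[-1] / predicted_counts[0]
# ... and on the (count + 1) scale, where the model is linear in logs.
count_plus_one_ratio = (predicted_counts[-1] + 1) / (predicted_counts[0] + 1)

print(f"per IQ point: x{per_point:.3f}")
print(f"per 40 IQ points: x{per_40_points:.1f}")
print(f"ratio of counts: {count_ratio:.0f}")
print(f"ratio of (count + 1): {count_plus_one_ratio:.1f}")
```

The ratio of raw counts is much larger than the ratio on the count + 1 scale (roughly 42) because the prediction at IQ 70 is close to zero once the 1 is subtracted, so the ~300 figure is sensitive to that offset.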

However, most of this is due to the confound with European culture. If we add continent dummies, we get:

Linear Regression Model
 rms::ols(formula = log_count ~ log_pop + IQ + UN_continent, data = natdata)

                Model Likelihood     Discrimination    
                   Ratio Test           Indexes        
 Obs     197    LR chi2    219.72    R2       0.672    
 sigma0.4537    d.f.            6    R2 adj   0.662    
 d.f.    190    Pr(> chi2) 0.0000    g        0.730    
 
 Residuals
      Min       1Q   Median       3Q      Max 
 -1.22161 -0.24685 -0.01719  0.25710  1.34260 

                       Coef    S.E.   t     Pr(>|t|)
 Intercept             -2.7309 0.4861 -5.62 <0.0001 
 log_pop                0.4035 0.0392 10.28 <0.0001 
 IQ                     0.0164 0.0048  3.44 0.0007  
 UN_continent=Africa   -1.1256 0.1464 -7.69 <0.0001 
 UN_continent=Americas -0.6569 0.1191 -5.52 <0.0001 
 UN_continent=Asia     -0.9505 0.1044 -9.10 <0.0001 
 UN_continent=Oceania  -0.7782 0.1512 -5.15 <0.0001

The IQ coefficient declines substantially. If we imagine hypothetical European countries with IQs from 70 to 110, they are expected to have 13, 19, 28, 41, 60 persons in the top 5k, or a factor of ~4.6, down from ~300 before. A factor of 4.6 is still sizable, of course. Visually, this model predicts that reality should work like this:

Whereas, if we plot the IQs and counts with slopes by continent, they look like this:

Notice the lack of a noticeable slope for Asia, mainly due to Singapore, Japan, and the Koreas (but not China). So, we are probably grouping some countries that shouldn’t be grouped together. We can also use the UN’s smaller regions, though these are arguably too small. We get (setting Western Europe as the comparison):

Linear Regression Model
 rms::ols(formula = log_count ~ log_pop + IQ + UN_region, data = natdata)
                Model Likelihood     Discrimination    
                   Ratio Test           Indexes        
 Obs     197    LR chi2    267.20    R2       0.742    
 sigma0.4227    d.f.           24    R2 adj   0.706    
 d.f.    172    Pr(> chi2) 0.0000    g        0.767    
 
 Residuals
      Min       1Q   Median       3Q      Max 
 -1.08552 -0.22309 -0.01046  0.22147  1.15614 
 
                                     Coef    S.E.   t     Pr(>|t|)
 Intercept                           -4.3280 0.7710 -5.61 <0.0001 
 log_pop                              0.4198 0.0408 10.30 <0.0001 
 IQ                                   0.0316 0.0072  4.37 <0.0001 
 UN_region=Australia and New Zealand -0.6845 0.3344 -2.05 0.0422  
 UN_region=Caribbean                 -0.2719 0.2528 -1.08 0.2836  
 UN_region=Central America           -0.6612 0.2456 -2.69 0.0078  
 UN_region=Central Asia              -0.1106 0.2775 -0.40 0.6907  
 UN_region=Eastern Africa            -0.8351 0.2534 -3.30 0.0012  
 UN_region=Eastern Asia              -1.5121 0.2154 -7.02 <0.0001 
 UN_region=Eastern Europe             0.1641 0.2029  0.81 0.4198  
 UN_region=Europe                    -1.1183 0.4521 -2.47 0.0143  
 UN_region=Melanesia                 -0.8061 0.2655 -3.04 0.0028  
 UN_region=Micronesia                -0.3599 0.2910 -1.24 0.2179  
 UN_region=Middle Africa             -0.6288 0.3038 -2.07 0.0400  
 UN_region=Northern Africa           -0.7298 0.2578 -2.83 0.0052  
 UN_region=Northern America          -0.4391 0.2610 -1.68 0.0943  
 UN_region=Northern Europe            0.0283 0.2008  0.14 0.8882  
 UN_region=Polynesia                 -0.4774 0.3069 -1.56 0.1216  
 UN_region=South America             -0.4197 0.2131 -1.97 0.0505  
 UN_region=South-Eastern Asia        -0.9609 0.2065 -4.65 <0.0001 
 UN_region=Southern Africa           -0.5514 0.3087 -1.79 0.0758  
 UN_region=Southern Asia             -0.8013 0.2472 -3.24 0.0014  
 UN_region=Southern Europe            0.0403 0.1946  0.21 0.8362  
 UN_region=Western Africa            -0.7804 0.2805 -2.78 0.0060  
 UN_region=Western Asia              -0.5917 0.2050 -2.89 0.0044

Now IQ’s beta went back up again (0.0316).
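To put the three IQ coefficients on a comparable footing, one can convert each into the implied multiplicative change in count + 1 across the 70 to 110 IQ range used above. This is a rough Python sketch (not the original R code); it assumes the outcome is log10(count + 1) in all three models and ignores the uncertainty in the coefficients.

```python
# IQ coefficients from the three log10(count + 1) models reported above.
iq_coefs = {
    "log_pop + IQ": 0.0405,
    "log_pop + IQ + UN_continent": 0.0164,
    "log_pop + IQ + UN_region": 0.0316,
}

iq_span = 40  # from IQ 70 to IQ 110

# Implied multiplicative change in (count + 1) across the IQ span.
fold_changes = {name: 10 ** (coef * iq_span) for name, coef in iq_coefs.items()}

for name, fold in fold_changes.items():
    print(f"{name}: x{fold:.1f}")
```

On this scale the regional model implies roughly an 18-fold difference, which sits between the roughly 42-fold difference of the first model and the roughly 4.5-fold difference of the continent model.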

What if we use the per capita approach?

Linear Regression Model
 rms::ols(formula = fide_per_million ~ IQ, data = natdata, weights = sqrt(population2017))
 
                  Model Likelihood     Discrimination    
                     Ratio Test           Indexes        
 Obs       197    LR chi2     14.86    R2       0.073    
 sigma264.0903    d.f.            1    R2 adj   0.068    
 d.f.      195    Pr(> chi2) 0.0001    g        1.280    
 
 Residuals
     Min      1Q  Median      3Q     Max 
 -3.4973 -1.2864 -0.4563  0.4652 92.7449 
 
           Coef    S.E.   t     Pr(>|t|)
 Intercept -7.3254 2.2682 -3.23 0.0015  
 IQ         0.1024 0.0262  3.91 0.0001 
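The fitted line from this output can be evaluated directly at the IQ levels used elsewhere in this post (70, 80, …, 110). A small Python sketch (not the original R code) using only the two reported coefficients; the variable names are my own.

```python
# Coefficients from the weighted per-capita model output above.
intercept = -7.3254
iq_coef = 0.1024

iq_levels = [70, 80, 90, 100, 110]

# Predicted top players per million at each IQ level.
predictions = [intercept + iq_coef * iq for iq in iq_levels]

# IQ at which the fitted line crosses zero; predictions below it are negative.
zero_crossing_iq = -intercept / iq_coef

for iq, pred in zip(iq_levels, predictions):
    print(f"IQ {iq}: {pred:.2f} per million")
print(f"line crosses zero at IQ {zero_crossing_iq:.1f}")
```

Because the model is an unconstrained straight line, any IQ below the zero crossing (a little under 72) yields a negative predicted rate.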

I changed the outcome to top players per million, since otherwise all the coefficients were tiny. We see a coefficient of 0.10 here, meaning that each additional IQ point is associated with 0.10 more top players per million. If we use the usual model predictions (for 70, …, 110), this gives us values of -0.16, 0.87, 1.89, 2.91, 3.94. Negative values are of course impossible, but this model isn’t constrained to disallow such values (this could be done with e.g. Bayesian priors). The violation isn’t too great anyway. If we add the small regions:

Linear Regression Model
 rms::ols(formula = fide_per_million ~ IQ + UN_region, data = natdata, 
     weights = sqrt(population2017))
 
                  Model Likelihood     Discrimination    
                     Ratio Test           Indexes        
 Obs       197    LR chi2     77.29    R2       0.325    
 sigma239.2931    d.f.           23    R2 adj   0.235    
 d.f.      173    Pr(> chi2) 0.0000    g        2.709    
 
 Residuals
      Min       1Q   Median       3Q      Max 
 -7.70820 -0.62140 -0.03586  0.31916 87.33090 
 
                                     Coef    S.E.   t     Pr(>|t|)
 Intercept                           -5.5729 8.0336 -0.69 0.4888  
 IQ                                   0.1117 0.0800  1.40 0.1647  
 UN_region=Australia and New Zealand -4.4978 3.1416 -1.43 0.1540  
 UN_region=Caribbean                 -1.4467 2.8945 -0.50 0.6178  
 UN_region=Central America           -3.5594 2.3098 -1.54 0.1251  
 UN_region=Central Asia              -2.3240 2.6887 -0.86 0.3886  
 UN_region=Eastern Africa            -2.4938 2.6692 -0.93 0.3515  
 UN_region=Eastern Asia              -6.0083 1.6921 -3.55 0.0005  
 UN_region=Eastern Europe             0.4284 1.7746  0.24 0.8095  
 UN_region=Europe                    -4.6798 7.4075 -0.63 0.5284  
 UN_region=Melanesia                 -3.7884 3.6618 -1.03 0.3023  
 UN_region=Micronesia                -3.7728 7.3435 -0.51 0.6081  
 UN_region=Middle Africa             -1.9938 3.1464 -0.63 0.5271  
 UN_region=Northern Africa           -3.4613 2.2871 -1.51 0.1320  
 UN_region=Northern America          -4.8395 2.0390 -2.37 0.0187  
 UN_region=Northern Europe            2.7450 2.0215  1.36 0.1762  
 UN_region=Polynesia                 -4.1905 8.1261 -0.52 0.6067  
 UN_region=South America             -3.4319 1.9566 -1.75 0.0812  
 UN_region=South-Eastern Asia        -4.2342 1.8029 -2.35 0.0200  
 UN_region=Southern Africa           -2.3886 3.2924 -0.73 0.4691  
 UN_region=Southern Asia             -3.4515 2.0883 -1.65 0.1002  
 UN_region=Southern Europe            2.6375 1.9077  1.38 0.1686  
 UN_region=Western Africa            -2.2252 2.8600 -0.78 0.4376  
 UN_region=Western Asia              -1.8373 1.9890 -0.92 0.3569  

The beta of IQ remained about the same, but it now has p = .16. There is too much noise to reliably see the signal. This can also be seen in the model fits across approaches: adjusted R2 of 0.706 vs. 0.235. So far, my intuition is that small populations and rare persons cause massive variation in the observed per capita rate. E.g. in this dataset, the observed rate of top FIDE players per million is ~100 in the Faroe Islands and Iceland but only 9-11 in the rest of Scandinavia. Are we supposed to believe this reflects some real difference? Hardly. Secondly, this massive sampling error is apparently not completely counteracted by down-weighting small samples in the model, at least not using the sqrt approach. Perhaps one can develop a more suitable weight to use. However, using counts, it doesn’t matter much whether the count for a small country turns out to be 0 or 5, since a small number is predicted by the small population size in any event. For instance, the Faroe Islands has only n = 5 (for a population of 50k), but it could easily have been 0 or 10, and neither value would have caused a major outlier using the counts approach, but would have done so using the per capita approach.
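The intuition about sampling error can be illustrated with a back-of-the-envelope calculation. This is a sketch in Python under a strong assumption that is mine, not something established above: each country's count is Poisson distributed around a common true rate. The true rate of 10 per million and the three population sizes are hypothetical values chosen for illustration.

```python
import math

def rate_sd_per_million(true_rate_per_million, population_millions):
    """SD of the observed per-million rate if the count is Poisson (assumed)."""
    expected_count = true_rate_per_million * population_millions
    # A Poisson count has SD sqrt(mean); dividing by population gives the rate SD.
    return math.sqrt(expected_count) / population_millions

true_rate = 10.0  # hypothetical true rate of top players per million
populations = {"50k": 0.05, "5 million": 5.0, "50 million": 50.0}

for label, pop in populations.items():
    sd = rate_sd_per_million(true_rate, pop)
    print(f"population {label}: rate SD = {sd:.2f} per million")

# Sampling variance of the rate for a 50k country relative to a 5 million one,
# compared with the relative weight that sqrt(population) gives the larger one.
variance_ratio = (rate_sd_per_million(true_rate, 0.05)
                  / rate_sd_per_million(true_rate, 5.0)) ** 2
sqrt_weight_ratio = math.sqrt(5.0 / 0.05)
print(f"variance ratio: {variance_ratio:.0f}, sqrt weight ratio: {sqrt_weight_ratio:.0f}")
```

Under this assumption the sampling variance of the per capita rate falls in proportion to population, so a 50k country is 100 times noisier than a 5 million one, while sqrt(population) weights down-weight it only by a factor of 10. That would be consistent with the sqrt weights not fully counteracting the noise, though real counts are probably not exactly Poisson and true between-country differences also play a role.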

Why not use the non-log version? Theoretically, the use of logs should cause nonlinear interactions between IQ and population size to occur, but with n=200, we don’t have a realistic chance to estimate these. I did try a model with the interaction, but we don’t really have enough precision to estimate it either (bizarrely, it resulted in negative betas for IQ and population size with p = .006/.007, and a positive interaction with p < .0001). Perhaps this would be feasible if one collected counts of top players for some smaller unit, e.g. EU NUTS regions or US counties.
