• http://www.unz.com/jthompson/chisalas-last-word/

How far should the net be cast as regards intellectual achievements? I suggest as far and wide as possible, or it will be assumed that some results are being held back. I favour those achievements which are in a “universal language” like maths, science and chess. There will always be some doubt about whether people in poor countries have access to knowledge and training, though the spread of internet access goes a long way to dealing with this. (In fact, it should level the playing field in terms of access to knowledge). Poker, Bridge, Backgammon, and Mahjong could be added to the list, because there are international competitions and rankings. I am not suggesting anyone should take part in such activities. Live and let live.

Chess a universal language like math? Methinks not. I took a quantitative look at FIDE players. Using data science tricks, I obtained a list of top 5k players and their countries. Then I did some plotting and modeling. Details can be found at http://rpubs.com/EmilOWK/chess_top5k_fide

Per capita rate ~ IQ

Obviously there are issues with country size, so we can weight cases by sqrt(population) as we usually do.

It did not help. East Asian countries are clear outliers, and a large number of countries have no top chess players at all. If we instead use the log count approach that Noah used in the terrorism papers, which seems superior to the per capita approach (the reason is unclear to me, but it somehow seems to handle sampling error better), then a simple model finds a fairly strong effect of IQ:

```
Linear Regression Model
rms::ols(formula = log_count ~ log_pop + IQ, data = natdata)
               Model Likelihood     Discrimination
                  Ratio Test           Indexes
Obs     197    LR chi2    140.48    R2       0.510
sigma0.5491    d.f.            2    R2 adj   0.505
d.f.    194    Pr(> chi2) 0.0000    g        0.636

Residuals
     Min       1Q   Median       3Q      Max
-1.68341 -0.31570 -0.01964  0.39061  1.28864

          Coef    S.E.   t      Pr(>|t|)
Intercept -4.9833 0.4018 -12.40 <0.0001
log_pop    0.3281 0.0416   7.88 <0.0001
IQ         0.0405 0.0036  11.21 <0.0001
```

Note that these are unstandardized coefficients, and thus not easy to compare. Nor is the effect size easy to interpret, since the outcome is a log10 count. The model output says that each additional IQ point increases the expected log10(count + 1) of top FIDE players by 0.04. So, if I’m not mistaken, each IQ point multiplies count + 1 by 10^0.04, or about 1.10. The scaling is not linear. The model-predicted numbers of FIDE players for countries with IQ 70, 80, …, 110 are 0.15, 1.93, 6.45, 17.93, 47.11, a factor of ~300 going from an African-level IQ to the level of good cities.
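To make the back-transformation concrete, here is a minimal Python sketch (not the original R code) that plugs the rounded coefficients from the output above into the model. The population value is an assumption: the post does not say what population the predictions were computed at, and a log10 population of 6.73 (about 5.4 million) is used only because it roughly reproduces the quoted numbers.

```python
# Rounded coefficients copied from the model output above.
INTERCEPT, B_LOG_POP, B_IQ = -4.9833, 0.3281, 0.0405

# ASSUMPTION: log10(population) held fixed at 6.73 (~5.4 million people).
# The value actually used for the quoted predictions is not stated.
ASSUMED_LOG_POP = 6.73


def predicted_count(iq, log_pop=ASSUMED_LOG_POP):
    """Back-transform a log10(count + 1) prediction to a count."""
    log_count = INTERCEPT + B_LOG_POP * log_pop + B_IQ * iq
    return 10 ** log_count - 1


# Each IQ point multiplies (count + 1) by 10^0.0405, about 1.10.
per_point_factor = 10 ** B_IQ
# Ten IQ points multiply (count + 1) by 10^0.405.
per_ten_points_factor = 10 ** (10 * B_IQ)

for iq in range(70, 111, 10):
    print(iq, round(predicted_count(iq), 2))
```

Because the multiplier applies to count + 1 rather than to the count itself, the ratio of predicted counts (~300) is much larger than the ratio of predicted count + 1 values when the low-end prediction is close to zero.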

However, most of this is due to the confound with European culture. If we add continent dummies, we get:

```
Linear Regression Model
rms::ols(formula = log_count ~ log_pop + IQ + UN_continent, data = natdata)

               Model Likelihood     Discrimination
                  Ratio Test           Indexes
Obs     197    LR chi2    219.72    R2       0.672
sigma0.4537    d.f.            6    R2 adj   0.662
d.f.    190    Pr(> chi2) 0.0000    g        0.730

Residuals
     Min       1Q   Median       3Q      Max
-1.22161 -0.24685 -0.01719  0.25710  1.34260

                      Coef    S.E.   t     Pr(>|t|)
Intercept             -2.7309 0.4861 -5.62 <0.0001
log_pop                0.4035 0.0392 10.28 <0.0001
IQ                     0.0164 0.0048  3.44 0.0007
UN_continent=Africa   -1.1256 0.1464 -7.69 <0.0001
UN_continent=Americas -0.6569 0.1191 -5.52 <0.0001
UN_continent=Asia     -0.9505 0.1044 -9.10 <0.0001
UN_continent=Oceania  -0.7782 0.1512 -5.15 <0.0001
```

The IQ coefficient declines substantially. If we imagine hypothetical European countries with IQs from 70 to 110, they are expected to have 13, 19, 28, 41, 60 persons in the top 5k, a factor of ~4.6, down from ~300 before. A factor of 4.6 is still a sizeable effect, of course. This model predicts that reality should work like this:

Whereas, if we plot the IQs and counts with slopes by continent, they look like this:

Notice the lack of a noticeable slope for Asia, mainly due to Singapore, Japan, and the Koreas (but not China). So we are probably grouping together some countries that shouldn’t be grouped. We can also use the UN’s smaller regions, though these are arguably too small. We get (setting Western Europe as the comparison group):

```
Linear Regression Model
rms::ols(formula = log_count ~ log_pop + IQ + UN_region, data = natdata)
               Model Likelihood     Discrimination
                  Ratio Test           Indexes
Obs     197    LR chi2    267.20    R2       0.742
sigma0.4227    d.f.           24    R2 adj   0.706
d.f.    172    Pr(> chi2) 0.0000    g        0.767

Residuals
     Min       1Q   Median       3Q      Max
-1.08552 -0.22309 -0.01046  0.22147  1.15614

                                    Coef    S.E.   t     Pr(>|t|)
Intercept                           -4.3280 0.7710 -5.61 <0.0001
log_pop                              0.4198 0.0408 10.30 <0.0001
IQ                                   0.0316 0.0072  4.37 <0.0001
UN_region=Australia and New Zealand -0.6845 0.3344 -2.05 0.0422
UN_region=Caribbean                 -0.2719 0.2528 -1.08 0.2836
UN_region=Central America           -0.6612 0.2456 -2.69 0.0078
UN_region=Central Asia              -0.1106 0.2775 -0.40 0.6907
UN_region=Eastern Africa            -0.8351 0.2534 -3.30 0.0012
UN_region=Eastern Asia              -1.5121 0.2154 -7.02 <0.0001
UN_region=Eastern Europe             0.1641 0.2029  0.81 0.4198
UN_region=Europe                    -1.1183 0.4521 -2.47 0.0143
UN_region=Melanesia                 -0.8061 0.2655 -3.04 0.0028
UN_region=Micronesia                -0.3599 0.2910 -1.24 0.2179
UN_region=Middle Africa             -0.6288 0.3038 -2.07 0.0400
UN_region=Northern Africa           -0.7298 0.2578 -2.83 0.0052
UN_region=Northern America          -0.4391 0.2610 -1.68 0.0943
UN_region=Northern Europe            0.0283 0.2008  0.14 0.8882
UN_region=Polynesia                 -0.4774 0.3069 -1.56 0.1216
UN_region=South America             -0.4197 0.2131 -1.97 0.0505
UN_region=South-Eastern Asia        -0.9609 0.2065 -4.65 <0.0001
UN_region=Southern Africa           -0.5514 0.3087 -1.79 0.0758
UN_region=Southern Asia             -0.8013 0.2472 -3.24 0.0014
UN_region=Southern Europe            0.0403 0.1946  0.21 0.8362
UN_region=Western Africa            -0.7804 0.2805 -2.78 0.0060
UN_region=Western Asia              -0.5917 0.2050 -2.89 0.0044
```

Now IQ’s beta went back up again (0.0316).
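Since the three IQ betas are on a log10 scale, they can be hard to compare by eye. As a rough sketch (Python rather than the original R, using only the rounded coefficients printed above), each beta can be turned into the multiplicative change in count + 1 per 10 IQ points:

```python
# Rounded IQ coefficients from the three log-count models above.
iq_betas = {
    "no controls": 0.0405,
    "continent dummies": 0.0164,
    "UN region dummies": 0.0316,
}


def factor_per_10_iq(beta):
    """Multiplicative change in (count + 1) for a 10-point IQ difference."""
    return 10 ** (10 * beta)


factors = {name: factor_per_10_iq(beta) for name, beta in iq_betas.items()}

for name, factor in factors.items():
    print(f"{name}: x{factor:.2f} per 10 IQ points")
```

These multipliers apply to count + 1, so they only approximate ratios of counts when the predicted counts are well above zero.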

What if we use the per capita approach?

```
Linear Regression Model
rms::ols(formula = fide_per_million ~ IQ, data = natdata, weights = sqrt(population2017))

                 Model Likelihood     Discrimination
                    Ratio Test           Indexes
Obs       197    LR chi2     14.86    R2       0.073
sigma264.0903    d.f.            1    R2 adj   0.068
d.f.      195    Pr(> chi2) 0.0001    g        1.280

Residuals
    Min      1Q  Median      3Q     Max
-3.4973 -1.2864 -0.4563  0.4652 92.7449

          Coef    S.E.   t     Pr(>|t|)
Intercept -7.3254 2.2682 -3.23 0.0015
IQ         0.1024 0.0262  3.91 0.0001
```
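Because this model is linear in the outcome, its fitted line can be evaluated by hand. A minimal Python sketch (not the original R code), using the rounded intercept and IQ coefficient printed above, of the predicted top players per million at a few IQ levels:

```python
# Rounded coefficients copied from the per capita model output above.
INTERCEPT, B_IQ = -7.3254, 0.1024


def predicted_per_million(iq):
    """Predicted top players per million at a given national IQ."""
    return INTERCEPT + B_IQ * iq


predictions = {iq: round(predicted_per_million(iq), 2) for iq in range(70, 111, 10)}
print(predictions)

# IQ at which the fitted line crosses zero; predictions below it are negative.
zero_crossing_iq = -INTERCEPT / B_IQ
print(round(zero_crossing_iq, 1))
```

The zero crossing shows where an unconstrained linear model starts producing impossible negative rates.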

I changed the outcome to top players per million; otherwise all the coefficients were tiny. We see a coefficient of 0.10 here, meaning that 1 IQ point increases the number of top players per million by 0.10. The usual model predictions (for IQ 70, …, 110) are -0.16, 0.87, 1.89, 2.91, 3.94. Negative values are of course impossible, but this model isn’t constrained to disallow them (this could be done with e.g. Bayesian priors). The violation isn’t too great anyway. If we add the small regions:

```
Linear Regression Model
rms::ols(formula = fide_per_million ~ IQ + UN_region, data = natdata,
    weights = sqrt(population2017))

                 Model Likelihood     Discrimination
                    Ratio Test           Indexes
Obs       197    LR chi2     77.29    R2       0.325
sigma239.2931    d.f.           23    R2 adj   0.235
d.f.      173    Pr(> chi2) 0.0000    g        2.709

Residuals
     Min       1Q   Median       3Q      Max
-7.70820 -0.62140 -0.03586  0.31916 87.33090

                                    Coef    S.E.   t     Pr(>|t|)
Intercept                           -5.5729 8.0336 -0.69 0.4888
IQ                                   0.1117 0.0800  1.40 0.1647
UN_region=Australia and New Zealand -4.4978 3.1416 -1.43 0.1540
UN_region=Caribbean                 -1.4467 2.8945 -0.50 0.6178
UN_region=Central America           -3.5594 2.3098 -1.54 0.1251
UN_region=Central Asia              -2.3240 2.6887 -0.86 0.3886
UN_region=Eastern Africa            -2.4938 2.6692 -0.93 0.3515
UN_region=Eastern Asia              -6.0083 1.6921 -3.55 0.0005
UN_region=Eastern Europe             0.4284 1.7746  0.24 0.8095
UN_region=Europe                    -4.6798 7.4075 -0.63 0.5284
UN_region=Melanesia                 -3.7884 3.6618 -1.03 0.3023
UN_region=Micronesia                -3.7728 7.3435 -0.51 0.6081
UN_region=Middle Africa             -1.9938 3.1464 -0.63 0.5271
UN_region=Northern Africa           -3.4613 2.2871 -1.51 0.1320
UN_region=Northern America          -4.8395 2.0390 -2.37 0.0187
UN_region=Northern Europe            2.7450 2.0215  1.36 0.1762
UN_region=Polynesia                 -4.1905 8.1261 -0.52 0.6067
UN_region=South America             -3.4319 1.9566 -1.75 0.0812
UN_region=South-Eastern Asia        -4.2342 1.8029 -2.35 0.0200
UN_region=Southern Africa           -2.3886 3.2924 -0.73 0.4691
UN_region=Southern Asia             -3.4515 2.0883 -1.65 0.1002
UN_region=Southern Europe            2.6375 1.9077  1.38 0.1686
UN_region=Western Africa            -2.2252 2.8600 -0.78 0.4376
UN_region=Western Asia              -1.8373 1.9890 -0.92 0.3569
```

The beta of IQ remained about the same, but it now has p = .16. There is too much noise to reliably see the signal. This can also be seen in the model fits across approaches: adjusted R2 of 0.706 for the log count model vs. 0.235 for the per capita model, both with the small regions. So far, my intuitive thinking is that small populations and rare persons cause massive variation in the observed per capita rate. E.g. in this dataset, the observed rate of top FIDE players per million is ~100 in the Faroe Islands and Iceland but only 9-11 in the rest of Scandinavia. Are we supposed to believe this reflects some real difference? Hardly. Secondly, this massive sampling error is (apparently) not completely counteracted by down-weighting the small samples in the model, at least not using the sqrt approach. Perhaps one can develop a more suitable weight to use. With counts, however, it doesn’t matter much if the count for a small country turns out to be 0 or 5, since a small number is predicted by the small population size in any event. For instance, the Faroe Islands only has n = 5 (for a population of 50k), but it could easily have been 0 or 10. Neither value would have caused a major outlier using the counts approach, but either would have done so using the per capita approach.
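The Faroe Islands example can be put in numbers. The sketch below (Python, for illustration only) takes the population of 50k and the counts 0, 5 and 10 from the text and shows how far each outcome scale moves. It only illustrates the size of the swings; whether a given value would actually be an outlier depends on the fitted model.

```python
import math

POPULATION = 50_000  # Faroe Islands population as quoted in the text


def per_million(count, population=POPULATION):
    """Observed top players per million inhabitants."""
    return count / population * 1_000_000


def log_count(count):
    """Outcome used in the log count models: log10(count + 1)."""
    return math.log10(count + 1)


for count in (0, 5, 10):
    print(count, round(per_million(count), 1), round(log_count(count), 2))

# Swing between the lowest and highest hypothetical count on each scale.
per_million_swing = per_million(10) - per_million(0)
log_count_swing = log_count(10) - log_count(0)
```

Going from 0 to 10 players moves the per capita outcome by 200 per million, against 9-11 per million in the rest of Scandinavia, while log10(count + 1) moves by about one unit.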

Why not use the non-log version? Theoretically, dropping the logs should cause nonlinear interactions between IQ and population size to appear, but with n = 200 we don’t have a realistic chance of estimating these. I did try a model with the interaction, but we don’t really have enough precision to estimate it either (bizarrely, it resulted in negative betas for IQ and population size with p = .006/.007, and a positive interaction with p < .0001). Perhaps this would be feasible if one collected counts of chess champions for some smaller unit, e.g. EU NUTS regions or US counties.
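A small sketch of where an IQ by population interaction comes from on the raw count scale: if the additive log10 model held exactly, the same IQ difference would translate into different raw-count gains depending on population size. The code (Python, not the original R) reuses the rounded coefficients from the first log count model; the two population sizes are arbitrary illustrative values, not taken from the data.

```python
import math

# Rounded coefficients from the first log count model (log_pop + IQ).
INTERCEPT, B_LOG_POP, B_IQ = -4.9833, 0.3281, 0.0405


def expected_count_plus_one(iq, population):
    """Model-implied count + 1 on the raw scale."""
    log_value = INTERCEPT + B_LOG_POP * math.log10(population) + B_IQ * iq
    return 10 ** log_value


def raw_gain(population, iq_low=90, iq_high=100):
    """Raw-scale gain in expected count from iq_low to iq_high."""
    return (expected_count_plus_one(iq_high, population)
            - expected_count_plus_one(iq_low, population))


# ASSUMPTION: illustrative population sizes of 1 million and 100 million.
small_gain = raw_gain(1_000_000)
large_gain = raw_gain(100_000_000)

# On the raw scale the IQ effect grows with population size...
gain_ratio = large_gain / small_gain
# ...while on the log scale the multiplier is the same for both.
multiplier_small = expected_count_plus_one(100, 1_000_000) / expected_count_plus_one(90, 1_000_000)
multiplier_large = expected_count_plus_one(100, 100_000_000) / expected_count_plus_one(90, 100_000_000)
print(round(gain_ratio, 2), round(multiplier_small, 2), round(multiplier_large, 2))
```

A raw-count regression would need an interaction term to capture this, which is what the log transformation avoids.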