Clear Language, Clear Mind

September 17, 2015

Rationality and bias test results

I’ve always considered myself a very rational and fairly unbiased person. Being aware of the general tendency for people to overestimate themselves (see also the visualization of the Dunning-Kruger effect), this of course reduces my confidence in my own estimates of these traits. So what better to do than take some actual tests? I have previously taken the short test of estimation ability found in What Intelligence Tests Miss and got 5/5 right. This is actually slight evidence of underconfidence, since I was supposed to give 80% confidence intervals, which means that I should have had 1 error, not 0. Still, with 5 items the precision is too low to say with much certainty whether I’m actually underconfident, but it shows that I’m unlikely to be strongly overconfident. Underconfidence is expected for smarter people. A project of mine is to make a much longer test of this kind, so as to give more precise estimates. It should be fairly easy to find a lot of numbers and have people give 80% confidence intervals for them: the depth of the deepest ocean, the height of the tallest mountain, the age of the oldest living organism, the age of the Earth/universe, dates for various historical events such as the end of WW2 or the beginning of the American Civil War, and so on.
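As a sketch of how such a test could be scored (the items and intervals below are hypothetical examples, not from any actual test):

```python
# Score a calibration test: how many 80% confidence intervals
# actually contain the true value? The intervals here are made up.
items = [
    ("Depth of deepest ocean (m)", 10935, (8000, 12000)),
    ("Height of tallest mountain (m)", 8849, (8000, 9500)),
    ("Age of the Earth (billion yr)", 4.54, (4.0, 5.0)),
    ("End of WW2 (year)", 1945, (1944, 1946)),
    ("Start of US Civil War (year)", 1861, (1855, 1870)),
]

# A well-calibrated respondent's 80% intervals should contain the true
# value about 80% of the time; much more than that suggests underconfidence.
hits = sum(lo <= true <= hi for _, true, (lo, hi) in items)
hit_rate = hits / len(items)
print(f"{hits}/{len(items)} intervals contained the true value ({hit_rate:.0%})")
```

With many more items, the hit rate becomes a usefully precise calibration estimate.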

However, I recently saw an article about a political bias test. I think I’m fairly unbiased, and as a result my beliefs don’t really fit into any mainstream political theory. This is as expected, because the major political ideologies were invented before we understood much about anything, making it unlikely that they would get everything right. More likely, they get some things right and some things wrong.

Here’s my test results for political bias:

[Screenshots from 2015-09-16: political bias test results]

So in centiles: >= 99th for knowledge of American politics. This is higher than I expected (around 95th). Since I’m not a US citizen, presumably the test has some bias against me. For bias, the centile is <= 20th. This result did not surprise me. However, since there is a huge floor effect, this test needs more items to be more useful.

Next up, I looked at the website and saw that they had a number of other potentially useful tests. One is about common misconceptions. Since I consider myself a scientific rationalist, I should do fairly well on this, also because I have read somewhat extensively on the issue (the Wikipedia list, Snopes, and 50 Great Myths of Popular Psychology).

Unfortunately, they present the results in a verbose form, and pasting 8 images would be excessive, so I will paste some of the relevant text:

1. Brier score

Your total performance across all quiz and confidence questions:

This measures your overall ability on this test. This number above, known as a “Brier” score, is a combination of two data points:

How many answers you got correct
Whether you were confident at the right times. That means being more confident on questions you were more likely to be right about, and less confident on questions you were less likely to get right.

The higher this score is, the more answers you got correct AND the more you were appropriately confident at the right times and appropriately uncertain at the right times.

Your score is above average. Most people’s Brier scores fall in the range of 65-80%. About 5% of people got a score higher than yours. That’s a really good score!
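The site reports a higher-is-better percentage, but the standard Brier score is a lower-is-better mean squared error between stated probabilities and outcomes. A minimal sketch with made-up answers (the site’s exact formula is not given, so this is the textbook version):

```python
def brier(probs, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes.
    0 is perfect; 0.25 is what you get by always guessing 50%."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical answers: probability assigned to "true", and actual truth.
probs = [0.9, 0.7, 0.6, 0.95, 0.5]
outcomes = [1, 1, 0, 1, 1]
print(round(brier(probs, outcomes), 4))  # 0.1425
```

High confidence on wrong answers (the 0.6 on a false item above) is what hurts the score most, which is why the score rewards being confident only at the right times.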

2. Overall accuracy

Answers you got correct: 80%

Out of 30 questions, you got 24 correct. Great work B.S. detector! You performed above average. Looks like you’re pretty good at sorting fact from fiction. Most people correctly guess between 16 and 21 answers, a little better than chance.

Out of the common misconceptions we asked you about, you correctly said that 12/15 were actually B.S. That’s pretty good!

No centiles are provided, so it is not evident how this compares to others.

3. Total points

[Screenshot from 2015-09-16: total points]

As for your points, this is another way of showing your ability to detect Fact vs. B.S. and your confidence accuracy. The larger the score, the better you are at doing both! Most people score between 120 and 200 points. Looks like you did very well, ending at 204 points.

4. Reliability of confidence intervals

Reliability of your confidence levels: 89.34%

Were you confident at the right times? To find out, we took a portion of your earlier Brier score to determine just how reliable your level of confidence was. It looks like your score is above average. About 10% of people got a score higher than yours.

This score measures the link between the size of your bet and the chance you got the correct answer. If you were appropriately confident at the right times, we’d expect you to bet a lot of points more often when you got the answer correct than when you didn’t. If you were appropriately uncertain at the right times, we’d expect you to typically bet only a few points when you got the answer wrong.

You can interpret this score as measuring the ability of your gut to distinguish between things that are very likely true, versus only somewhat likely true. Or in other words, this score tries to answer the question, “When you feel more confident in something, does that actually make it more likely to be true?”

5. Confidence and accuracy

[Screenshot from 2015-09-16: confidence vs. accuracy chart]

When you bet 1-3 points your confidence was accurate. You were a little confident in your answers and got the answer correct 69.23% of the time. Nice work!

When you bet 4-7 points you were underconfident. You were fairly confident in your answers, but you should have been even more confident because you got the answer correct 100% of the time!

When you bet 8-10 points your confidence was accurate. You were extremely confident in your answer and indeed got the answer correct 100% of the time. Great work!

So, again there is some evidence of underconfidence. E.g. for the questions where I bet 0 points, I still had 60% accuracy, though it should have been 50%.

6. Overall confidence

Your confidence: very underconfident

You tended to be very underconfident in your answers overall. Let’s explore what that means.

In the chart above, your betting average has been translated into a new score called your “average confidence.” This represents roughly how confident you were in each of your answers.

People who typically bet close to 0 points would have an average confidence near 50% (i.e. they aren’t confident at all and don’t think they’ll do much better than chance).
People who typically bet about 5 points would have an average confidence near 75% (i.e. they’re fairly confident; they might’ve thought there was a ¼ chance of being wrong).
People who typically bet 10 points would have an average confidence near 100% (i.e. they are extremely confident; they thought there was almost no chance of being wrong).

The second bar is the average number of questions you got correct. You got 24 questions correct, or 80%.

If you are a highly accurate bettor, then the two bars above should be about equal. That is, if you got an 80% confidence score, then you should have gotten about 80% of the questions correct.

We said you were underconfident because on average you bet at a confidence level of 69.67% (i.e. you bet 3.93 on average), but in reality you did better than that, getting the answer right 80% of the time.
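The translation from average bet to “average confidence” is consistent with a simple linear mapping; here is a sketch (the linear formula is my inference from the reported numbers, not something the site documents):

```python
def bet_to_confidence(bet):
    # A 0-10 point bet mapped linearly onto 50-100% confidence:
    # betting 0 = 50% (chance level), betting 10 = 100% (certain).
    return 50 + 5 * bet

print(round(bet_to_confidence(3.93), 2))  # 69.65, close to the reported 69.67%
```

Comparing this number directly against the 80% accuracy is what yields the “very underconfident” verdict.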

In general, the results were in line with my predictions: high ability + general overestimation + an imperfect correlation between self-rated and actual ability results in underconfidence. My earlier result indicated some underconfidence as well, and the longer test gave the same result. Apparently, I need to be more confident in myself. This is despite the fact that I scored 98 and 99 on the assertiveness facet of the OCEAN test in two different test-taking sessions some months apart.

I did take their additional rationality test, but since it was just based on pop-psych Kahneman-style points, it doesn’t seem very useful. It also relies on typological thinking, classifying people into 16 classes, which is clearly wrong-headed. It found my weakest side to be the planning fallacy, but this isn’t actually the case, because I’m pretty good at getting papers and projects done on time.

Update 2018-11-10

David Pinsen (a trader with skin in the game for prediction making) sent me an additional test to try. It follows the same approach as the confidence test with the US states, i.e. a bunch of forced TRUE/FALSE questions plus a question about certainty. My results were therefore unsurprisingly similar to before (close to perfect calibration), but somewhat overconfident this time (by 7 percentage points). The test was too short for their number of options; they should have combined some of them. Annoyingly, they didn’t provide a Brier score, but I calculated it to be .20. It is surprisingly difficult to find guidelines for this metric, but in Superforecasting, a case study is given of a superforecaster (i.e. top predictor) called Doug. He had an initial score of .22 and was in the top 5 that year. However, this was calculated with the multi-class version of the formula (see the FAQ), so his score was .11 on the scale I used. The average forecaster was .37, or 0.185 on the 0-1 scale. So it appears I did slightly worse than the average forecaster in the Good Judgment Project. This result is probably pretty good considering that theirs is a self-selected group of people who spend a lot of time making forecasts based on any information they choose, while I didn’t use any help here, just guessed based on whatever was already in my head. The test above provided a Brier score as well, but gave it as a percentage. This might be a reversed version (i.e. (1-Brier)*100), and if so, my new score would be 81.5% compared to 86.8% before.
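The scale conversions used above can be sketched as follows (the halving follows from the two-class formula counting each question’s squared error twice; the reversed-percentage form is, as noted, an assumption about what the site reports):

```python
def two_class_to_binary(brier_two_class):
    # The two-class Brier score sums squared error over both classes,
    # which exactly doubles the binary (0-1 scale) score.
    return brier_two_class / 2

def brier_to_percent(brier_binary):
    # Assumed reversed percentage form: higher is better.
    return (1 - brier_binary) * 100

print(two_class_to_binary(0.22))   # Doug's 0.22 -> 0.11 on the 0-1 scale
print(two_class_to_binary(0.37))   # average forecaster: 0.37 -> 0.185
print(round(brier_to_percent(0.20), 1))
```

Keeping the two scales straight matters: a score of .20 is mediocre on the 0-1 scale but quite good on the two-class scale.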

Note: the 60% point is based on n = 4 of which I got 1 right.

November 29, 2014

Criticism of BMI and simple IQ tests – a conceptual link?

Filed under: Critical thinking / meta-thinking, Medicine — Emil O. W. Kirkegaard @ 00:57

BMI (body mass index) is often used as a proxy for fat percentage or similar measures, and for good reason:

[Figure: correlations between BMI and body fat percentage, by age group and gender]

So, the mean correlation across age groups and genders is very high, around .77 (unweighted mean across genders). There is a clear age gradient such that the correlation is higher at younger ages, which is the opposite of what the body-builder confound would predict (few people over 80 are body builders). It does work slightly better for women (.78 vs. .75), perhaps because there are more male body builders.

BMI has a proven track record of predicting many health conditions, yet it still receives lots of criticism because it gives misleading results for some groups, notably body builders. There is a conceptual link here with the criticism of simple IQ tests, such as Raven’s, which ‘only measure the ability to spot figures’. Nonverbal matrix tests such as Raven’s or Cattell’s do indeed measure g less well than more diverse batteries do (Johnson et al). These visual tests could similarly be criticized for not working well on those with bad eyesight. However, they are still useful for a broad sample of the population.

Criticisms like this strike me as an incarnation of the perfect solution/Nirvana fallacy:

The perfect solution fallacy (aka the nirvana fallacy) is a fallacy of assumption: if an action is not a perfect solution to a problem, it is not worth taking. Stated baldly, the assumption is obviously false. The fallacy is usually stated more subtly, however. For example, arguers against specific vaccines, such as the flu vaccine, or vaccines in general often emphasize the imperfect nature of vaccines as a good reason for not getting vaccinated: vaccines aren’t 100% effective or 100% safe. Vaccines are safe and effective; however, they are not 100% safe and effective. It is true that getting vaccinated is not a 100% guarantee against a disease, but it is not valid to infer from that fact that nobody should get vaccinated until every vaccine everywhere prevents anybody anywhere from getting any disease the vaccines are designed to protect us from without harming anyone anywhere.

Any measure with more than 0 validity can be useful in the right circumstances. If a measure has some validity and is easy to administer (BMI, or non-verbal pen-and-paper group tests), it can be very useful even if it has less validity than better measures (a fat% test or a full-battery IQ test).

Anyway, BMI should perhaps be retired now, because we have found a more effective measure (Ashwell et al):

Our aim was to differentiate the screening potential of waist-to-height ratio (WHtR) and waist circumference (WC) for adult cardiometabolic risk in people of different nationalities and to compare both with body mass index (BMI). We undertook a systematic review and meta-analysis of studies that used receiver operating characteristics (ROC) curves for assessing the discriminatory power of anthropometric indices in distinguishing adults with hypertension, type-2 diabetes, dyslipidaemia, metabolic syndrome and general cardiovascular outcomes (CVD). Thirty one papers met the inclusion criteria. Using data on all outcomes, averaged within study group, WHtR had significantly greater discriminatory power compared with BMI. Compared with BMI, WC improved discrimination of adverse outcomes by 3% (P < 0.05) and WHtR improved discrimination by 4–5% over BMI (P < 0.01). Most importantly, statistical analysis of the within-study difference in AUC showed WHtR to be significantly better than WC for diabetes, hypertension, CVD and all outcomes (P < 0.005) in men and women.
For the first time, robust statistical evidence from studies involving more than 300 000 adults in several ethnic groups, shows the superiority of WHtR over WC and BMI for detecting cardiometabolic risk factors in both sexes. Waist-to-height ratio should therefore be considered as a screening tool. (Ashwell et al, 2012)

It may even be that some of these measures are better predictors than body fat%. I didn’t find such a study.
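For reference, the competing indices are trivial to compute; a small sketch (the 0.5 WHtR screening cutoff is the commonly cited rule of thumb, not taken from the Ashwell et al paper):

```python
def bmi(weight_kg, height_m):
    # Body mass index: weight divided by height squared.
    return weight_kg / height_m ** 2

def whtr(waist_cm, height_cm):
    # Waist-to-height ratio; ~0.5 is the commonly cited screening
    # threshold ("keep your waist under half your height").
    return waist_cm / height_cm

# Example person: 80 kg, 180 cm tall, 85 cm waist.
print(round(bmi(80, 1.80), 1))   # 24.7
print(round(whtr(85, 180), 3))   # 0.472
```

Both are equally cheap to measure, which is part of why the discriminatory-power comparison matters.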


Ashwell, M., Gunn, P., & Gibson, S. (2012). Waist-to-height ratio is a better screening tool than waist circumference and BMI for adult cardiometabolic risk factors: systematic review and meta-analysis. Obesity Reviews, 13(3), 275-286.

Johnson, W., Nijenhuis, J. T., & Bouchard Jr, T. J. (2008). Still just 1 g: Consistent results from five test batteries. Intelligence, 36(1), 81-95.

August 11, 2012

Some quotes from Giving Debiasing Away: Can Psychological Research on Correcting Cognitive Errors Promote Human Welfare? (Scott O. Lilienfeld, Rachel Ammirati, and Kristin Landfield)

Filed under: Critical thinking / meta-thinking, Psychology — Emil O. W. Kirkegaard @ 21:08

Some quotes from Giving Debiasing Away: Can Psychological Research on Correcting Cognitive Errors Promote Human Welfare? by Scott O. Lilienfeld, Rachel Ammirati, and Kristin Landfield.

I was happy to learn that Lilienfeld is part of the neorational movement! I have already read one of his books, 50 Great Myths of Popular Psychology: Shattering Widespread Misconceptions about Human Behavior, and plan on reading another: Science and Pseudoscience in Clinical Psychology.


ABSTRACT—Despite Miller’s (1969) now-famous clarion call to “give psychology away” to the general public, scientific psychology has done relatively little to combat festering problems of ideological extremism and both inter- and intragroup conflict. After proposing that ideological extremism is a significant contributor to world conflict and that confirmation bias and several related biases are significant contributors to ideological extremism, we raise a crucial scientific question: Can debiasing the general public against such biases promote human welfare by tempering ideological extremism? We review the knowns and unknowns of debiasing techniques against confirmation bias, examine potential barriers to their real-world efficacy, and delineate future directions for research on debiasing. We argue that research on combating extreme confirmation bias should be among psychological science’s most pressing priorities.

Second, the term bias blind spot (Pronin, Gilovich, & Ross, 2004), more informally called the “not me fallacy” (Felson, 2002), refers to the belief that others are biased but that we are not. Research shows that people readily recognize confirmation bias and related biases in others, but not in themselves (Pronin et al., 2004). The bias blind spot, which we can think of as a “meta-bias,” leads us to believe that only others, not ourselves, interpret evidence in a distorted fashion.

Second, many individuals may be unreceptive to debiasing efforts because they do not perceive these efforts as relevant to their personal welfare. Research suggests that at least some cognitive biases may be reduced by enhancing participants’ motivation to examine evidence thoughtfully (e.g., by increasing their accountability to others), thereby promoting less perfunctory processing of information (Arkes, 1991; Tetlock & Kim, 1987). Therefore, some debiasing efforts may succeed only if participants can be persuaded that their biases result in poor decisions of real-world consequence to them.


Surely this is correct.

Fifth, researchers must be cognizant of the possibility that efforts to combat confirmation bias may occasionally backfire (Wilson, Centerbar, & Brekke, 2002). Researchers have observed a backfire effect in the literature on hindsight bias (Sanna, Schwarz, & Stocker, 2002), in which asking participants to generate many alternative outcomes for an event paradoxically increases their certainty that the original outcome was inevitable. This effect may arise because participants asked to think of numerous alternative outcomes find doing so difficult, leading them (by means of the availability heuristic; Tversky & Kahneman, 1973) to conclude that there weren’t so many alternative outcomes after all. Whether similar backfire effects could result from efforts to debias participants against confirmation bias by encouraging them to consider alternative viewpoints is unclear. Moreover, because research on attitude inoculation (McGuire, 1962) suggests that exposure to weak versions of arguments may actually immunize people against these arguments, exposing people to alternative positions may be effective only to the extent that these arguments are presented persuasively.


June 3, 2012

Something about rationality

Filed under: Critical thinking / meta-thinking, Psychology — Emil O. W. Kirkegaard @ 13:41

I recently stumbled upon this profile on OKCupid (her profile, my profile). She is obviously a very bright person and well-read as well. So prominent that I should have heard of her, given that she has nearly identical interests to mine. So I used my Google-fu, tried “vulcan straw rational talk”, and instantly found her. It turns out I had already seen (a bit of) one of her talks. No wonder she seemed familiar. Here is one of her talks, which is quite good. There wasn’t much I learned from it, but that’s just because I have already read a lot about cognitive biases, system 1+2, rationalism, etc.

Apparently, she has done many videos.

See also:

This reminds me that some years ago I toyed with the idea of making some videos of me explaining things (I never did it). Yes, this lessens the information density, but it also increases the ease of spreading the information. It also has the added benefit of training me for public speaking, which I will probably engage in later anyway, and for semi-public speaking, like giving a lecture to a class or teaching at a school.

Too bad she lives in the US. She is totally hot, cute, and has a very nice personality. Obviously, a chick like this will get a lot of messages, and since I also don’t live remotely close to her, I figure it isn’t worth the time to write a decent message.

April 25, 2012

Re. Thinking in foreign language makes decisions more rational (Ars Technica)

Filed under: Critical thinking / meta-thinking, Psychology — Emil O. W. Kirkegaard @ 03:14

This surely sounds like one of those dubious psychology experiments. You know the type: a single small experiment, or perhaps two, that came in slightly below p < 0.05 and was therefore publishable. So, I decided to take a look.

The Foreign-Language Effect: Thinking in a Foreign Tongue Reduces Decision Biases

In general, there is not much to note about the study, except that, technically speaking, in one small detail their study actually shows the opposite of what they think. The reason is that they gave the percentages as 33.3% and 66.6% instead of the correct 66.7%. Technically, this would make the secure option always the best one by a small margin.
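A quick check of that claim, assuming the classic gain-framing of this task (a sure option saving 200 of 600 lives versus a gamble with the stated chances of saving all or none; these specific numbers are my assumption for illustration):

```python
# With the stated probabilities (33.3% / 66.6%), the expected value of
# the gamble falls just short of the sure option, because the
# probabilities only sum to 99.9%.
sure_saved = 200
gamble_ev = 0.333 * 600 + 0.666 * 0
print(round(gamble_ev, 1))  # 199.8, so the "secure" option wins by a hair
```

With the correct 66.7%, the two options would have exactly equal expected value, which is the point of the framing manipulation.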

Anyway, I’d like to see some other people reproduce this effect with a larger sample. I’m not keen on these 40-200 sample sizes in psychology.

The implications are interesting. Suppose we now know that one makes more rational decisions (at least with regard to one cognitive bias) when thinking in a foreign language; what do we use this knowledge for? It is perhaps the most interesting reason to learn a foreign language I’ve heard of so far. Promoting rational thinking through… learning a new language. Strange, but it seems legit. Anyway, one could make people consider financial decisions in a foreign language, such as when taking out a loan. I’d like to see some more research on this effect with regard to other biases.

April 8, 2012

On being right and avoiding being wrong

Filed under: Critical thinking / meta-thinking, Epistemology, Psychology — Emil O. W. Kirkegaard @ 01:19

The two goals, 1) having true beliefs and 2) not having false beliefs, may seem equivalent, but they in fact result in different optimal strategies.

Not having false beliefs

If one has only goal (2), the optimal strategy is to believe as few things as possible, suspending belief about most things. Even if one meticulously checks the evidence to avoid being wrong, one will make mistakes once in a while, perhaps simply because the current best available evidence about the subject is misleading (i.e. indicates that a particular thing is true that is actually false). So the optimal strategy is to believe nothing at all, but it is hardly possible to live like that. Unfortunately, acquiring beliefs is more or less automatic, in the sense that if one is exposed to evidence, one will automatically form the relevant beliefs without making a choice about it.1 So one needs to avoid exposing oneself to any evidence relevant to things that one does not already believe. Perhaps spending one’s time meditating is the optimal strategy here.

Having true beliefs

If one has only goal (1), the optimal strategy is to acquire lots of information and believe all sorts of things about it. I’m not entirely sure about the exact optimal strategy. Perhaps it is to set the evidence requirement for a belief as low as possible, because this makes it possible to form lots of beliefs, even on bad evidence. However, even this takes some time, and since one wants to maximize the number of true beliefs, not just any beliefs, one should probably gather at least some evidence. However, gathering evidence takes time and effort that could be spent gathering evidence about other subjects and forming beliefs about them instead. So some equilibrium will emerge for the optimal evidence requirement. Gathering evidence about a particular subject is a diminishing-returns strategy for having true beliefs.

The kind of thing one gathers evidence about matters. It is best to stick to areas where there is consensus among the experts, so that one can simply appeal to their authority and be right (most of the time). For subjects where there is much disagreement among experts, one would need to gather better evidence oneself to find out what the truth is. This probably means that one should employ the heuristics mentioned in the previous link and stay away from subjects that fail at one or more of them. So, basically, one should avoid all the subjects that I like. :P

However, as the number of one’s beliefs keeps rising, one will run out of consensus subjects to study, in which case one will need to study non-consensus subjects and form beliefs about them too, even the ‘worst’ and less socially acceptable/politically correct subjects. Perhaps one should keep these beliefs to oneself.

It is also a good idea to find other people with the same goal, so that one can share evidence with them and speed up the progress.

Both having true beliefs and avoiding having false beliefs?

What about someone who has both goals, perhaps with different importance ratings? Here it is more difficult to give advice about the optimal strategy. Perhaps one should set the evidence requirement pretty high, but not so impossibly high that no amount of evidence is enough. This should take care of most of the false beliefs. However, there are more ways to get rid of false beliefs. There are many debunking sites around, and lists of common misconceptions about all kinds of stuff. One should read those to get rid of pesky beliefs that got through the ‘evidence defense’, perhaps at some time before one was a critical thinker. Beliefs tend to stick around. Here are some good things to read to get rid of misconceptions about various things:

There is plenty more such skepticism material around, which brings me to the next point: one should study critical thinking and logic, with a focus on fallacies, so that one can avoid making them and thus acquiring wrong beliefs. It is also a good idea to know about cognitive biases so that one can try to compensate for them. A strong command of mathematics, especially statistics, is also a good idea. This helps in assessing much of the science, as most of it employs statistics nowadays.

Then, one should spend a lot of time gathering good-quality evidence about subjects. Since it takes time to read stuff, one should read the highest-quality material, in the sense that it offers the best evidence about the particular subject. This probably means reading science in journals and textbooks to begin with, to learn the things about the subject that there is consensus about.

Other relevant literature


1 As a side note, this is why pragmatic arguments for belief in something are not very useful. One cannot just will oneself to start believing in something that one thinks there is no evidence for, e.g. the Christian god. See

January 16, 2012

Great reading about heresy, open-mindedness, eccentric thinking

Filed under: Critical thinking / meta-thinking, Metaphilosophy — Emil O. W. Kirkegaard @ 02:59

I found the essay from reading another text (a very common practice):

The above is actually also worth reading. It is a review of a book with very unpopular ideas, but surely there are some good pieces among them. Here is the opening:

“SEATTLE, Washington – Now why would I be reviewing this book, which was published by the Neo-Nazi National Alliance and that I acquired via some shady Internet bookseller? After all, I am hardly the Aryan prototype.

For one thing, I’ve followed the Alliance for years, through some combination of curiosity, pity, and entertainment. I was listening to Dr. Pierce’s Internet broadcasts since the late 1990’s, what we now call “podcasts.” Pierce died in 2002 and the Alliance has undergone the usual tumult, power plays, and backbiting that occur whenever a personality-driven organization loses its leader.

Secondly, I have for years put effort into exposing myself to politically incorrect, unconventional, or unfashionable thought, as there is often some truth in there. According to Paul Graham, intelligent people tend to do this and it is overall a good thing.

Third, the author of Which Way Western Man?, William Gayley Simpson, looks like an agreeable chap.

So, I waded into the 1070-page treatise with full enthusiasm. The book consists of modified essays that Simpson originally wrote in the 1940s (and in most cases updated in the 1970s) and the writing has an erudite, early-20th-Century style to it. Simpson was born in 1892 and spent the 1910s and 1920s as a minister (he gave up the frock by 1918), laborer, and general-purpose liberal, pacifistic Christian. Much of the book gives personal testament to his activities during these years and how they molded his worldview into what it was at the time of the writing(s).”

which is where I got the text I really want to promote from:

The blog, by the way, is also worth a closer look. The author seems to have stopped posting (last post August 2010), but it has a lot of posts to begin with.

March 5, 2011

Lesswrong: “The Neglected Virtue of Scholarship”

Filed under: Critical thinking / meta-thinking — Emil O. W. Kirkegaard @ 06:26

More good advice.

March 4, 2011

Lesswrong: “Some Heuristics for Evaluating the Soundness of the Academic Mainstream in Unfamiliar Fields”

Filed under: Critical thinking / meta-thinking — Emil O. W. Kirkegaard @ 04:02

Worth reading.
