Clear Language, Clear Mind

October 21, 2018

Animal cross-fostering, race and IQ, and the deductivist’s fallacy

I wish to coin a name for a fallacy I’ve seen a number of times, exemplified in this paper:


In this paper the nature of the reasoning processes applied to the nature-nurture question is discussed in general and with particular reference to mental and behavioral traits. The nature of data analysis and analysis of variance is discussed. Necessarily, the nature of causation is considered. The notion that mere data analysis can establish “real” causation is attacked. Logic of quantitative genetic theory is reviewed briefly. The idea that heritability is meaningful in the human mental and behavioral arena is attacked. The conclusion is that the heredity-IQ controversy has been a “tale full of sound and fury, signifying nothing”. To suppose that one can establish effects of an intervention process when it does not occur in the data is plainly ludicrous. Mere observational studies can easily lead to stupidities, and it is suggested that this has happened in the heredity-IQ arena. The idea that there are racial-genetic differences in mental abilities and behavioral traits of humans is, at best, no more than idle speculation.

While explaining the analysis of variance (ANOVA), the author (Kempthorne) writes:

One can find a variety of presentations of analysis of variance which miss this elemental fact. In the heredity-IQ controversy, we see a statement by Feldman and Lewontin (1975) on the nature of analysis of variance that I believe to be totally wrong: “The analysis of variance is, in fact, what is known in mathematics as local perturbation analysis.” In fact, perturbation analysis is a well-defined mathematical-statistical area, and the intersection of that area with the basic idea of the analysis of variance is essentially negligible. They also say “An analysis would require that we know the first partial derivatives of the unknown function f(G, E).” This illustrates a basic epistemological error. Suppose G and E were real variables and we did know the first partial derivatives; then, so what? They also say “The analysis of variance produces results that are applicable only to small perturbations around the current mean”; again a basic epistemological error. On the matter of the role of variance, to say that additive genetic variance is important “since Fisher’s fundamental theorem of natural selection predicts … ” is wide of the mark, and again exemplifies an error commonly made in population genetics. Fisher’s theorem, if it is correct, deals with fitness, whatever that is (and population geneticists are curiously silent on the matter, using a symbol such as s, and rarely, if ever, discussing the matter of its epistemic correlation to some observation protocol; see, for instance, Kempthorne and Pollak (1970)). It is necessary to my general thesis to bring these matters into the discourse, because understanding of what the analysis of variance does and what it does not do is absolutely critical in the heredity-IQ controversy. These criticisms must not be interpreted to suggest that all of the Feldman-Lewontin paper is suspect.

We now turn to what I regard as a shocking error of logic. Suppose:

so that 80% of the variability is associated with groups. It is all too easy to go from this to the totally erroneous statement that 80% of the variability is due to the factor of classification, where due to is interpreted as caused by. What has gone on in the IQ-heredity controversy is little more than this. It is obvious that this analysis of variance can tell us nothing about causation.

So, Kempthorne is basically just being wordy and saying that you can’t infer causation with mathematical certainty from association in ANOVA. But he goes further than that and says it can tell us nothing about causation. This is plainly false: association is evidence of causation, and depending on the context and the type of association, it can be highly informative or not so informative. A mere correlation in social science is probably not very indicative of causation, but heritability-related statistics are. Since Kempthorne talks about using experiments to establish causation, let us use that kind of example against him.

Suppose we are interested in quantifying the degree to which differences between the group means of two subspecies are due to genetic factors versus some other factors. In animals, we can attack this problem using e.g. cross-fostering experiments, in which we basically kidnap very young offspring and place them with other parents, which might be from the other subspecies or not; or better yet, we implant embryos to take into account any uterine effects (see this earlier post for discussion of such designs for humans). Let’s say we are interested in behavior towards humans, e.g. fearfulness. If we do this experiment, we can work out what amount of variance in the trait is caused by rearing by a particular parent, by a same-group parent, by an other-group parent, and by genetics. This kind of thing has been done many times, and was recently used to estimate the between-group heritability of friendliness towards humans in the Siberian fox experiment. I quote from the book about it (How to Tame a Fox (and Build a Dog), Dugatkin and Trut, 2017):

WHAT LYUDMILA AND DMITRI WERE EQUIPPED TO investigate further was the other ways that innate traits and learning might be affecting their tame foxes. They were constantly availing themselves of the latest techniques for research, and during the time Lyudmila was living at Pushinka’s house, she and Dmitri decided to see whether they could delve even deeper into what degree the behaviors they were seeing in the tame foxes were genetically based.
Even as they tried to hold all conditions constant for the foxes, there were subtle, almost imperceptible differences that could creep into an experiment. For instance, what if the tamest mothers treated their pups differently than the aggressive moms treated their pups? Maybe pups learned something about how to be tame or aggressive toward humans from the way their moms treated them?

There was only one way to confirm for certain that the behavioral differences they were seeing between the tame and aggressive foxes were due to genetic differences. Dmitri and Lyudmila would have to try what is known as “cross-fostering.” They’d have to take developing embryos from tame mothers and transplant them into the wombs of aggressive females. Then they would let the aggressive foster mothers give birth and raise those pups. If the pups turned out tame themselves, despite having aggressive foster moms, then Lyudmila and Dmitri would know that tameness was fundamentally genetic and not learned. And, for completeness, they would also do the same experiment with the pups of aggressive mothers transplanted into tame mothers to see if they got parallel results.

In principle, cross-fostering was straightforward; researchers had used the procedure to examine the role of nature versus nurture for many years. But in practice it was easier said than done, it was technically difficult to pull off, and it had worked much better with some species than others. No one had ever tried to transplant fox embryos. Then again, no one had tried lots of things they had done, and so Lyudmila decided she would have to learn this delicate procedure on her own. She read all she could on transplant experiments that had been done in other species, and she conferred with the veterinarians they had on staff. Lives were at stake, so she took her time, learning everything she could.

She would be transplanting tiny, delicate embryos—on the order of eight days old—from the womb of one female into the womb of another pregnant female. Some of the embryos from tame mothers would be transplanted into the wombs of aggressive mothers, and some of those of aggressive mothers would be transplanted into the wombs of tame mothers. When the pups were born seven weeks later, she would closely observe their behavior to see if the pups of tame mothers became aggressive and if the pups of aggressive mothers became tame. But how in heaven’s name was she going to know which pups in a litter were the genetic offspring of the mother and which pups were the ones she had transplanted? Without that information, the experiment was futile. She realized that the foxes had their own unique color coding system. Coat color is a genetic trait, so if she carefully selected the males and females so that the coat coloring of their offspring would be predictable, and the pups of the aggressive mothers would have different colors from those of the tame mothers, she’d be able to tell which pups were the genetic offspring of a female, and which had been transplanted.

Lyudmila led the transplant surgeries with her faithful assistant Tamara by her side. Each surgery involved two females, one tame and one aggressive, each about a week into pregnancy. After lightly anesthetizing the foxes, Lyudmila made a tiny surgical incision in each female’s abdomen and located the uterus, with its right and left “horn,” each of which had embryos implanted in it. She then removed the embryos from one uterine horn and left the embryos in the other. Then she repeated the procedure with the second female. She transplanted the embryos that had been removed from one mother into the other in a drop of nutritional liquid that was placed into the tip of a pipette. “The embryos,” Lyudmila recalls with the pride of a job well done, “stayed outside the uterus [at room temperature from 64 to 68 degrees Fahrenheit] for no more than 5–6 minutes.” The females were then moved to a postoperative room and given time to recover.

Everyone at the Institute anxiously awaited the results. Even with the surgeries having gone so well, the transplanted embryos might not survive. Their wait paid off. It was the caretakers who were the first to discover the births of the first litters, which was often the case with new developments with the foxes. They sent word right away to the Institute. “It was like a miracle,” Lyudmila recorded. “All the workers gathered around the cages for a party with wine.”
Lyudmila and Tamara began recording the pups’ behavior as soon as they left their nests and began interacting with humans. One day Lyudmila watched as an aggressive female was parading around with her genetic and foster pups. “It was fascinating,” Lyudmila recalls, “. . . the aggressive mother had both tame and aggressive offspring. Her foster tame offspring were barely walking but they were already rushing to the cage doors, if there was a human standing by, and wagging their tails.” And Lyudmila wasn’t the only one fascinated. The mother foxes were as well. “The aggressive mothers were punishing tame pups for such improper behavior,” Lyudmila recalls. “They growled at them and grabbed their neck, throwing them back in the nest.” The genetic offspring of the aggressive mothers did not show curiosity about people. They, like their mothers, disliked humans. “The aggressive pups on the other hand retained their dignity,” Lyudmila remembers. “They growled aggressively, same as their mothers, and ran to their nests.” This pattern was repeated over and over. Pups behaved like their genetic mothers, not their foster mothers. There was no longer any doubt—basic tameness and aggression towards humans were, in part, genetic traits.
The house experiment with Pushinka had shown that tame foxes had also learned some of their behavior. Living with humans had taught the foxes additional ways of behaving, some of which they shared with their domesticated dog cousins. Genes surely played an important role, but the tame foxes were not simple genetic automatons; they learned to identify individual people and became particularly bonded to them, and even defended them, owing to the process of living with them. That these learned behaviors were so dog-like provided the tantalizing suggestion that wolves in the process of transforming into dogs might also have learned these behaviors by living with people. Dmitri and Lyudmila had produced some of the best evidence that an animal’s genetic lineage and the circumstances of its life combined in generating its behavior, and had done so in a highly innovative way.
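The logic of such a cross-fostering design can be sketched with a toy simulation (my own illustration; the lines, effect sizes, and function names are invented for the sketch, not taken from the fox study):

```python
import random

random.seed(42)

# Toy cross-fostering design: pups from "tame" or "aggressive" genetic
# lines are reared by foster mothers of either line. We assume (purely
# for illustration) a large genetic effect and a small fostering effect.
GENETIC_EFFECT = {"tame": 1.0, "aggr": -1.0}
FOSTER_EFFECT = {"tame": 0.1, "aggr": -0.1}

def tameness(genetic_line, foster_line):
    return (GENETIC_EFFECT[genetic_line] + FOSTER_EFFECT[foster_line]
            + random.gauss(0, 0.5))

lines = ["tame", "aggr"]
# Mean trait by genetic line, averaged over both fostering conditions:
means = {g: sum(tameness(g, f) for f in lines for _ in range(5000)) / 10000
         for g in lines}
gap = means["tame"] - means["aggr"]
print(round(gap, 1))  # ≈ 2.0: the genetic gap survives cross-fostering
```

If the fostering effect were large instead, the gap would shrink under cross-fostering; comparing pups across the two rearing conditions is exactly what lets the design apportion the variance between genes and rearing.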

What if you wanted to know about human race groups (populations, subspecies, etc.; use your preferred term)? It would be quite cruel to do the above kind of experiments in a controlled fashion, but one can of course investigate adoptions, both within and between populations. For IQ, this has famously been done a few times; see MTAS and Tizard. I have also covered a number of recent studies that no one else seems to have paid attention to (see posts under this tag); after I posted them, a paper appeared citing them. Let’s disregard those for now. Suppose we carried out a large number of cross-fostering experiments in many species and for many traits, including cognitive ones; we could then calculate summary statistics for these and see if there are any relations to trait cluster, species type, etc. Furthermore, we could relate them to the within-group heritability-like statistics. Jensen (1998, p. 445) basically assumes such a relationship holds, though he doesn’t cite any animal research in support:

One of the aims of science is to comprehend as wide a range of phenomena as possible within a single framework, using the fewest possible mechanisms with the fewest assumptions and ad hoc hypotheses. With respect to IQ, the default hypothesis relating individual differences and population differences is consistent with this aim, as it encompasses the explanation of both within-group (WG) and between-group (BG) differences as having the same causal sources of variance. The default hypothesis that the BG and WG differences are homogeneous in their causal factors implies that a phenotypic difference of PD between two population groups in mean level of IQ results from the same causal effects as does any difference between individuals (within either of the two populations) whose IQs differ by PD (i.e., the phenotypic difference). In either case, PD is the joint result of both genetic (G) and environmental (E) effects. In terms of the default hypothesis, the effects of genotype × environment covariance are the same between populations as within populations. The same is hypothesized for genotype × environment interaction, although studies have found it contributes negligibly to within-population variance in g.

Actually, I would expect subspecies differences to be more heritable than within-group differences, because within-group differences have a lot of environmental causal variance in nature, but less so in humans, because we reduce it by social policies. The average wild adult length of polar and grizzly bears differs by perhaps 50 cm for males (comparisons). But when we rear these bears in zoos, they keep these average differences, indicating that the differences are due to genetics, not environmental habitat-related factors. As for experiments, many people have tried bringing up wolves to be like dogs. This often, but not always, fails, and the wolves end up in wolf sanctuaries. It also often fails for wolf-dog hybrids. There’s actually a published study comparing wolves, poodles and wolf-poodle hybrids, apparently finding the hybrids to be quite wolfy in behavior (I couldn’t obtain a complete copy). Another study mentions that wolf-dog hybridization in the wild is quite common in Ethiopia, so one could actually run admixture studies there. I have not searched this literature extensively; maybe something great exists, waiting to be found by hereditarian-minded researchers.

To return to Kempthorne: the point is that if we can find such relationships between within-group heritability-related statistics and between-subspecies ones, this would imply that one can indeed derive relevant, probabilistic conclusions about the latter from the former. Animal findings generalize to humans to some extent, though perhaps not as much as we would hope. Given some level of generalizability (prior transfer, we might call it), we might be able to infer a likely range of between-group heritability based on within-group heritability in humans.

Per this, I shall coin the fallacy made by Kempthorne, which I shall term the deductivist’s fallacy. It is when a critic looks at a relationship, finds that he cannot think of any strict formal (i.e., logically necessary) relationship between the premises and the conclusion, and then concludes that the premises can tell us nothing about the conclusion. This ignores the fact that the premises may have non-deductive/probabilistic relevance. To put it another way, the deductivist’s fallacy is when one criticizes an inductive argument for not being deductive enough. Many of the traditional fallacies in informal logic can be given a Bayesian reading this way and would no longer be fallacies. For instance, appeal to authority is surely relevant, because a statement about a topic from an expert is more likely to be true than one from a non-expert (unless it’s social science!). There are a few papers on this topic, such as Korb 2004. Neven Sesardic also has a good discussion of the between/within-group heritability reasoning in his excellent book Making Sense of Heritability.

Related meme

February 13, 2016

Causality, transitivity and correlation

Filed under: Logic, Math/Statistics, Philosophy, R — Emil O. W. Kirkegaard @ 12:09

Disclaimer: Some not too structured thoughts.

It’s commonly said that correlation does not imply causation. That is true (see Gwern’s analysis), but does causation imply correlation? Specifically, if “→” means causes and “~~” means correlates with, does X→Y imply X~~Y? It may seem obvious that the answer is yes, but it is not so clear.

Before going into that, consider transitivity. Wikipedia:

In mathematics, a binary relation R over a set X is transitive if whenever an element a is related to an element b, and b is in turn related to an element c, then a is also related to c.

Is causality transitive? It seems that the answer should be yes. If A causes B, and B causes C, then A causes C. With symbols:

  1. A→B
  2. B→C
  3. ⊢ A→C

(the ⊢ symbol means therefore). If causality is transitive, and causality implies correlation, then we may guess that transitivity holds for correlation too. Does it? Sort of.

The transitivity of correlations

We might more precisely say that it has partial transitivity. If A~~B at 1.0, and B~~C at 1.0, then A~~C at 1.0. However, for correlations with |r| ≠ 1, it doesn’t hold exactly: A~~B at 0.7 and B~~C at 0.7 does not imply that A~~C at 0.7, or at 0.7² = 0.49 for that matter (this is the predicted path value using path-model tracing rules). Instead, there is a range of possible values, with 0.49 being the most likely. As usual, Jensen gives the answer, in one or more places in his numerous writings. The ranges are given in Jensen (1980, p. 302; Bias in Mental Testing):

[discussion of types of validity] Concurrent validity rests on the soundness of the inference that, since the first test correlates highly with the second test and the second test correlates with the criterion, the first test is also correlated with the criterion. It is essentially this question: If we know to what extent A is correlated with B, and we know to what extent B is correlated with C, how precisely can we infer to what extent A is correlated with C? The degree of risk in this inference can be best understood in terms of the range within which the actual criterion validity coefficient would fall when a new test is validated in terms of its correlation with a validated test. Call the scores on the unvalidated test U, scores on the validated test V, and measures on the criterion C. Then rVC, the correlation between V and C, is the criterion validity of test V; and rUV, the correlation between U and V, is the concurrent validity of test U. The crucial question, then, is what precisely can we infer concerning rUC, that is, the probable criterion validity of test U?

If we know rVC and rUV, the upper and lower limits of the possible range of values of rUC are given by the following formulas [combined to one]:

rxz = rxy × ryz ± √[(1 − rxy²)(1 − ryz²)]

(I rewrote this using rxy, ryz and rxz instead.)

It may come as a sad surprise to many to see how very wide is the range of possible values of rUC for any given combination of values of rVC and rUV. The ranges of rUC are shown in Table 8.1, from which it is clear that concurrent validity inspires confidence only when the two tests are very highly correlated and the one test has a quite high criterion validity. Because it is rare to find criterion validities much higher than about .50, one can easily see the risk in depending on coefficients of concurrent validity. The risk is greatly lessened, however, when the two tests are parallel forms or one is a shortened form of the other, because both tests will then have approximately the same factor composition, which means that all the abilities measured by the first test that are correlated with the criterion also exist in the second test. The two tests should thus have fairly comparable correlations with the criterion, which is a necessary inference to justify concurrent validity.


So, the next time you see someone arguing through multiple steps of transitivity for correlations, beware! Given A~~B at 0.7 and B~~C at 0.7, it is still possible that A~~C is 0.0!

Visually, one can think of it in terms of variance explained and overlapping circles. If the A and B circles overlap 50% (which is r ≈ .71), and the same for B and C, then the value of A~~C depends on whether the part of A that overlaps B is also the part of B that overlaps C. Because the overlaps are about 50% each (0.7² = 0.49), both complete overlap (r = 1) and essentially no overlap, with a slight remainder (a slight negative correlation), are possible.
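The bounds are easy to compute. Here is a quick sketch of my own (not from the original post), using the standard formula rxz = rxy·ryz ± √[(1 − rxy²)(1 − ryz²)]:

```python
import math

def r_xz_bounds(r_xy, r_yz):
    # Possible range of r_xz given r_xy and r_yz; follows from the
    # requirement that the 3x3 correlation matrix be positive semidefinite.
    center = r_xy * r_yz
    half_width = math.sqrt((1 - r_xy**2) * (1 - r_yz**2))
    return center - half_width, center + half_width

lo, hi = r_xz_bounds(0.7, 0.7)
print(round(lo, 2), round(hi, 2))  # -0.02 1.0 : r_xz = 0 is indeed possible
```

Note how wide the interval is even for two fairly strong correlations of 0.7; only as both input correlations approach 1 does the interval collapse around the path-tracing value.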

Because this argument comes up often, I should probably make a visualization.

Back to causation

Now, how does transitivity work for causation? It turns out that it depends on the exact concept we are using. For instance, suppose that A causes higher C and C causes higher Y. Now, we would probably say that A causes higher Y. However, suppose that A also causes higher D and D causes lower Y. Does A cause higher or lower Y? We might say that it depends on the strength of the causal paths. In this way, we are talking about A’s net (main) effect on Y which may be positive, negative or null depending on the strengths of the other causal paths. One might also take the view that A both causes higher and lower Y. This view is especially tempting when the causal paths are differentiated for individuals. Suppose for half the population, A causes higher C and C causes higher Y, and for the other half A causes higher D and D causes lower Y. One might instead say that A’s effect is an interaction with whatever differentiates the two halves of the population (e.g. gender).
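Here is a minimal simulation of the two-opposing-paths case (my own toy example, with arbitrary equal path strengths): A raises C which raises Y, and A raises D which lowers Y, so A causes Y through both paths yet has essentially zero net correlation with Y.

```python
import random

random.seed(0)
n = 100_000
a_vals, y_vals = [], []
for _ in range(n):
    a = random.gauss(0, 1)
    c = a + random.gauss(0, 1)      # A -> higher C
    d = a + random.gauss(0, 1)      # A -> higher D
    y = c - d + random.gauss(0, 1)  # C -> higher Y, D -> lower Y
    a_vals.append(a)
    y_vals.append(y)

# Pearson correlation between A and Y:
mean_a = sum(a_vals) / n
mean_y = sum(y_vals) / n
cov = sum((a - mean_a) * (y - mean_y) for a, y in zip(a_vals, y_vals)) / n
var_a = sum((a - mean_a) ** 2 for a in a_vals) / n
var_y = sum((y - mean_y) ** 2 for y in y_vals) / n
r = cov / (var_a * var_y) ** 0.5
print(round(r, 2))  # ~0.0: causation without any net correlation
```

With equal strengths the two paths cancel exactly in the population; with unequal strengths the sign of the net effect would flip accordingly.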

Update 27th January 2018

Thanks to Ron for pointing out the error in the original equation I wrote. It was a mistake on my part. I also recalculated Jensen’s table in R, given below.

February 9, 2016

Stereotypes are relevant for judgments about individuals even when one has individualized information

Filed under: Logic, Math/Statistics — Emil O. W. Kirkegaard @ 10:17

Title says it all. Apparently, some think this is not the case, but it is a straightforward application of Bayes’ theorem. When I first learned of Bayes’ theorem years ago, I thought of this point. Back then I also believed that stereotypes are irrelevant when one has individualized information. Alas, it is incorrect. Neven Sesardic in his excellent and highly recommended book Making Sense of Heritability (2005; download) explained it very clearly, so I will quote his account in full:

A standard reaction to the suggestion that there might be psychological differences between groups is to exclaim “So what?” Whatever these differences, whatever their origin, people should still be treated as individuals, and this is the end of the matter.

There are several problems with this reasoning. First of all, group membership is often a part of an individual’s identity. Therefore, it may not be easy for individuals to accept the fact of a group difference if it does not reflect well on their group. Of course, whichever characteristic we take, there will usually be much overlap, the difference will be only statistical (between group averages), any group will have many individuals that outscore most members of other groups, yet individuals belonging to the lowest-scoring group may find it difficult to live with this fact. It is not likely that the situation will become tolerable even if it is shown that it is not a product of social injustice. As Nathan Glazer said: “But how can a group accept an inferior place in society, even if good reasons for it are put forth? It cannot” (Glazer 1994: 16). In addition, to the extent that the difference turns out to be heritable there will be more reason to think that it will not go away so easily (see chapter 5). It will not be readily eliminable through social engineering. It will be modifiable in principle, but not locally modifiable (see section 5.3 for the explanation of these terms). All this could make it even more difficult to accept it.

Next, the statement that people should be treated as individuals is certainly a useful reminder that in many contexts direct knowledge about a particular person eclipses the informativeness of any additional statistical data, and often makes the collection of this kind of data pointless. The statement is fine as far as it goes, but it should not be pushed too far. If it is understood as saying that it is a fallacy to use the information about an individual’s group membership to infer something about that individual, the statement is simply wrong. Exactly the opposite is true: it is a fallacy not to take this information into account.

Suppose we are interested in whether John has characteristic F. Evidence E (directly relevant for the question at hand) indicates that the probability of John having F is p. But suppose we also happen to know that John is a member of group G. Now elementary probability theory tells us that if we want to get the best estimate of the probability that John has F we have to bring the group information to bear on the issue. In calculating the desired probability we have to take into account (a) that John is a member of G, and (b) what proportion of G has F. Neglecting these two pieces of information would mean discarding potentially relevant information. (It would amount to violating what Carnap called “the requirement of total evidence.”) It may well happen that in the light of this additional information we would be forced to revise our estimate of probability from p to p∗. Disregarding group membership is at the core of the so-called “base rate fallacy,” which I will describe using Tversky and Kahneman’s taxicab scenario (Tversky & Kahneman 1980).

In a small city, in which there are 90 green taxis and 10 blue taxis, there was a hit-and-run accident involving a taxi. There is also an eyewitness who told the police that the taxi was blue. The witness’s reliability is 0.8, which means that, when he was tested for his ability to recognize the color of the car under the circumstances similar to those at the accident scene, his statements were correct 80 percent of the time. To reduce verbiage, let me introduce some abbreviations: B = the taxi was blue; G = the taxi was green; WB = witness said that the taxi was blue.

What we know about the whole situation is the following:

(1) p(B) = 0.1 (the prior probability of B, before the witness’s statement is taken into account)

(2) p(G) = 0.9 (the prior probability of G)

(3) p(WB/B) = 0.8 (the reliability of the witness, or the probability of WB, given B)

(4) p(WB/G) = 0.2 (the probability of WB, given G)

Now, given all this information, what is the probability that the taxi was blue in that particular situation? Basically we want to find p(B/WB), the posterior probability of B, i.e., the probability of B after WB is taken into account. People often conclude, wrongly, that this probability is 0.8. They fail to take into consideration that the proportion of blue taxis is pretty low (10 percent), and that the true probability must reflect that fact. A simple rule of elementary probability, Bayes’ theorem, gives the formula to be applied here:

p(B/WB) = p(B) × p(WB/B) / [p(B) × p(WB/B) + p(G) × p(WB/G)].

Therefore, the correct value for p(B/WB) is 0.31, which shows that the usual guess (0.8 or close to it) is wide of the mark.

It is easier to understand that 0.31 is the correct answer by looking at Figure 6.1. Imagine that the situation with the accident and the witness repeats itself 100 times. Obviously, we can expect that the taxi involved in the accident will be blue in 10 cases (10 percent), while in the remaining 90 cases it will be green. Now consider these two different kinds of cases separately. In the top section (blue taxis), the witness recognizes the true color of the car 80 percent of the times, which means in 8 out of 10 cases. In the bottom section (green taxis), he again recognizes the true color of the car 80 percent of the times, which here means in 72 out of 90 cases. Now count all those cases where the witness declares that the taxi is blue, and see how often he is right about it. Then simply divide the number of times he is right when he says “blue” with the overall number of times he says “blue,” and this will immediately give you p(B/WB). The witness gives the answer “blue” 8 times in the upper section (when the taxi is indeed blue), and 18 times in the bottom section (when the taxi is actually green). Therefore, our probability is: 8/(8 + 18) = 0.31.

It may all seem puzzling. How can it be that the witness says the taxi is blue, his reliability as a witness is 0.8, and yet the probability that the taxi is blue is only 0.31? Actually there is nothing wrong with the reasoning. It is the lower prior frequency of blue taxis that brings down the probability of the taxi being blue, and that is that. Bayes’ theorem is a mathematical truth. Its application in this kind of situation is beyond dispute. Any remaining doubt will be dispelled by inspecting Figure 6.1 and seeing that if you trust the witness when he says “blue” you will indeed be more often wrong than right. But notice that you have excellent reasons to trust the witness if he says “green” because in that case he will be right 97 percent of the time! It all follows from the difference in prior probabilities for “blue” and “green.” There is a consensus that neglecting prior probabilities (or base rates) is a logical fallacy.

But if neglecting prior probabilities is a fallacy in the taxicab example, then it cannot stop being a fallacy in other contexts. Oddly enough, many people’s judgment actually changes with context, particularly when it comes to inferences involving social groups. The same move of neglecting base rates that was previously condemned as the violation of elementary probability rules is now praised as reasonable, whereas applying Bayes’ theorem (previously recommended) is now criticized as a sign of irrationality, prejudice and bigotry.

A good example is racial or ethnic profiling, the practice that is almost universally denounced as ill advised, silly, and serving no useful purpose. This is surprising because the inference underlying this practice has the same logical structure as the taxicab situation. Let me try to show this by representing it in the same format as Figure 6.1. But first I will present an example with some imagined data to prepare the ground for the probability question and for the discussion of group profiling.

Suppose that there is a suspicious characteristic E such that 2 percent terrorists (T) have E but only 0.002 percent non-terrorists (−T) have E. This already gives us two probabilities: p(E/T) = 0.02; p(E/−T) = 0.00002. How useful is E for recognizing terrorists? How likely is it that someone is T if he has E? What is p(T/E)? Bayes’ theorem tells us that the answer depends on the percentage of terrorists in a population. (Clearly, if everybody is a terrorist, then p(T/E) = 1; if no one is a terrorist, then p(T/E) = 0; if some people are T and some −T, then 1 > p(T/E) > 0.) To activate the group question, suppose that there are two groups, A and B, that have different percentages of terrorists (1 in 100, and 1 in 10,000, respectively). This translates into different probabilities of an arbitrary member of a group being a terrorist. In group A, p(T) = 0.01 but in group B, p(T) = 0.0001. Now for the central question: what will p(T/E) be in A and in B? Figures 6.2a and 6.2b provide the answer.



In group A, the probability of a person with characteristic E being a terrorist is 0.91. In group B, this probability is 0.09 (more than ten times lower). The group membership matters, and it matters a lot.
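The quoted numbers are easy to verify by plugging the example's probabilities into Bayes' theorem; a quick sketch in Python (the function name is mine, the probabilities are the ones from the example above):

```python
def posterior(p_t, p_e_given_t, p_e_given_not_t):
    """p(T|E) via Bayes' theorem:
    p(E|T)p(T) / [p(E|T)p(T) + p(E|-T)p(-T)]"""
    num = p_e_given_t * p_t
    return num / (num + p_e_given_not_t * (1 - p_t))

# Group A: p(T) = 0.01 (1 in 100 are terrorists)
print(round(posterior(0.01, 0.02, 0.00002), 2))    # 0.91
# Group B: p(T) = 0.0001 (1 in 10,000)
print(round(posterior(0.0001, 0.02, 0.00002), 2))  # 0.09
```

The only input that differs between the two groups is the prior p(T), which is exactly the base rate that the "forbidden base-rates" debate is about.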

Test your intuitions with a thought experiment: in an airport, you see a person belonging to group A and another person from group B. Both have suspicious trait E but they go in opposite directions. Whom will you follow and perhaps report to the police? Will you (a) go by probabilities and focus on A (committing the sin of racial or ethnic profiling), or (b) follow political correctness and flip a coin (and feel good about it)? It would be wrong to protest here and refuse to focus on A by pointing out that most As are not terrorists. This is true but irrelevant. Most As that have E are terrorists (91 percent of them, to be precise), and this is what counts. Compare that with the other group, where out of all Bs that have E, less than 10 percent are terrorists.

To recapitulate, since the two situations (the taxicab example and the social groups example) are similar in all relevant aspects, consistency requires the same answer. But the resolute answer is already given in the first situation. All competent people speak with one voice here, and agree that in this kind of situation the witness’s statement is only part of the relevant evidence. The proportion of blue cars must also be taken into account to get the correct probability that the taxi involved in the accident was blue. Therefore, there is no choice but to draw the corresponding conclusion in the second case. E is only part of the relevant evidence. The proportion of terrorists in group A (or B) must also be taken into account to get the correct probability that an individual from group A (or B) is a terrorist.

The “must” here is a conditional “must,” not a categorical imperative. That is, you must take into account prior probabilities if you want to know the true posterior probability. But sometimes there may be other considerations, besides the aim to know the true probability. For instance, it may be thought unfair or morally unacceptable to treat members of group A differently from members of group B. After all, As belong to their ethnic group without any decision on their part, and it could be argued that it is unjust to treat every A as more suspect just because a very small proportion of terrorists among As happens to be higher than an even lower proportion of terrorists among Bs. Why should some people be inconvenienced and treated worse than others only because they share a group characteristic, which they did not choose, which they cannot change, and which is in itself morally irrelevant?

I recognize the force of this question. It pulls in the opposite direction from Bayes’ theorem, urging us not to take into account prior probabilities. The question which of the two reasons (the Bayesian or the moral one) should prevail is very complex, and there is no doubt that the answer varies widely, depending on the specific circumstances and also on the answerer. I will not enter that debate at all because it would take us too far away from our subject.

The point to remember is that when many people say that “an individual can’t be judged by his group mean” (Gould 1977: 247), that “as individuals we are all unique and population statistics do not apply” (Venter 2000), that “a person should not be judged as a member of a group but as an individual” (Herrnstein & Murray 1994: 550), these statements sound nice and are likely to be well received but they conflict with the hard fact that a group membership sometimes does matter. If scholars wear their scientific hats when denying or disregarding this fact, I am afraid that rather than convincing the public they will more probably damage the credibility of science.

It is of course an empirical question how often and how much the group information is relevant for judgments about individuals in particular situations, but before we address this complicated issue in specific cases, we should first get rid of the wrong but popular idea that taking group membership into consideration (when thinking about individuals) is in itself irrational or morally condemnable, or both. On the contrary, in certain decisions about individuals, people “would have to be either saints or idiots not to be influenced by the collective statistics” (Genovese 1995: 333).


Lee Jussim (politically incorrect social psychologist; blog) in his interesting book Social perception and social reality (2012; download), notes the same fact. In fact, he spends an entire chapter on the question of how people integrate stereotypes with individualized information and whether this increases accuracy. He begins:

Stereotypes and Person Perception: How Should People Judge Individuals?
“Should” might mean many things. It might mean, “What would be the most moral thing to do?” Or, “What would be the legal thing to do, or the most socially acceptable thing to do, or the least offensive thing to do?” I do not use it here, however, to mean any of these things. Instead, I use the term “should” here to mean “what would lead people to be most accurate?” It is possible that being as accurate as possible would be considered by some people to be immoral or even illegal (see Chapters 10 and 15). Indeed, a wonderful turn of phrase, “forbidden base-rates,” was coined (Tetlock, 2002) to capture the very idea that, sometimes, many people would be outraged by the use of general information about groups to reach judgments that would be as accurate as possible (a “base-rate” is the overall prevalence of some characteristic in a group, usually presented as a percentage; e.g., “0.7% of Americans are in prison” is a base-rate reflecting Americans’ likelihood of being in prison). The focus in this chapter is exclusively on accuracy and not on morality or legality.

Philip Tetlock (famous for his forecasting tournaments) in the quoted article above, writes:

The SVPM [The sacred value-protection model] maintains that categorical proscriptions on cognition can also be triggered by blocking the implementation of relational schemata in sensitive domains. For example, forbidden base rates can be defined as any statistical generalization that devout Bayesians would not hesitate to insert into their likelihood computations but that deeply offends a moral community. In late 20th-century America, egalitarian movements struggled to purge racial discrimination and its residual effects from society (Sniderman & Tetlock, 1986). This goal was justified in communal-sharing terms (“we all belong to the same national family”) and in equality-matching terms (“let’s rectify an inequitable relationship”). Either way, individual or corporate actors who use statistical generalizations (about crime, academic achievement, etc.) to justify disadvantaging already disadvantaged populations are less likely to be lauded as savvy intuitive statisticians than they are to be condemned for their moral insensitivity.

So this is not some exotic idea; it is recognized by several experts.

I don’t have any particular opinion regarding the morality of using involuntary group memberships in one’s assessments, but in terms of epistemic rationality (making correct judgments), the case is clear: one must take into account group memberships when making judgments about individuals. Of course, some individualized information may have a much stronger evidential value than the mere fact of group membership.

By the way, this fact means that studies that try to show rater/employer bias by e.g. sending out fake résumés from members of two racial groups with otherwise equal characteristics (e.g. educational attainment) do not actually demonstrate bias. This would hold even if there were no such thing as affirmative action, and in some cases it holds even after selection on a fair criterion.

April 19, 2014

First and second-order logic formalizations

Filed under: Logic — Tags: , — Emil O. W. Kirkegaard @ 13:47

From researchgate:

What is the actual difference between 1st order and higher order logic?
Yes, I know. They say, the 2nd order logic is more expressive, but it is really hard to me to see why. If we have a domain X, why can’t we define the domain X’ = X u 2^X and for elements of x in X’ define predicates:
BELONGS_TO(x, y) – undefined (or false) when ELEMENT(y)
Now, we can express sentences about subsets of X in the 1st-order logic!
Similarly we can define FUNCTION(x), etc. and… we can express all 2nd-order sentences in the 1st order logic!
I’m obviously overlooking something, but what actually? Where have I made a mistake?

My answer:

In many cases one can reduce a higher-order formalization to a first-order one, but it comes at the price of a more complex formalization.

For instance, formalize the following argument in both first-order and second-order logic:
All things with personal properties are persons. Being kind is a personal property. Peter is kind. Therefore, Peter is a person.

One can do this with either first or second order, but it is easier in second-order.

First-order formalization:
1. (∀x)(PersonalProperty(x)→(∀y)(HasProperty(y,x)→Person(y)))
2. PersonalProperty(kind)
3. HasProperty(peter,kind)
⊢ 4. Person(peter)

Second-order formalization:
1. (∀Φ)(PersonalProperty(Φ)→(∀x)(Φx→Person(x)))
2. PersonalProperty(IsKind)
3. IsKind(peter)
⊢ 4. Person(peter)

where Φ is a second-order variable. Basically, whenever one uses first order to formalize arguments like this, one has to use a predicate like “HasProperty(x,y)” so that one can treat variables as properties indirectly. This is unnecessary in second-order logics.
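The first-order trick of treating properties as ordinary individuals can even be run mechanically. A minimal Python sketch of the idea (names mirror the formalization above; the finite-set representation is my own illustration, not part of the original answer):

```python
# Premises of the first-order formalization, as finite sets:
personal_property = {"kind"}         # 2. PersonalProperty(kind)
has_property = {("peter", "kind")}   # 3. HasProperty(peter, kind)

def person(x):
    """Premise 1: x is a person if x has some personal property."""
    return any(prop in personal_property
               for (y, prop) in has_property if y == x)

print(person("peter"))  # True — Person(peter) follows
```

Note that "kind" here is just another element of the domain, which is precisely what makes the formalization first-order: we never quantify over properties themselves, only over individuals related by HasProperty.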

November 30, 2013

Introductory material to logic

Filed under: Logic — Emil O. W. Kirkegaard @ 15:54

Someone needed this, so I made a quick collection.


Introductory and more philosophical


Introductory not so philosophical/more formal


More advanced and very formal

October 2, 2013

Theory of mind and reasoning complexity (paper for some linguistics class)

Filed under: Linguistics/language,Logic — Emil O. W. Kirkegaard @ 16:29

The assignment was:

Any aspect? :D I just wrote stuff about formal logic. So no more research was needed. Lucky.

SMU paper 1

January 18, 2013

Kennethamy on voluntarianism about beliefs

Filed under: Logic,Multilogues — Emil O. W. Kirkegaard @ 07:55

From here. By the way, this thread was one of the many discussions that helped form my views about what would later become the essay about begging the question, and the essay about how to define “deductive argument” and “inductive argument”.


 Do you know much about Jung’s theory of archetypes? If so, what do you make of it?


 I don’t make much of Jung. Except for the notions of introversion and extroversion. Not my cup of tea. As I said, we don’t create our own beliefs. We acquire them. Beliefs are not voluntary.


They are to some extent, but not as much as some people think (Pascal’s argument comes to mind).


Yes, it does. And that is an issue. His argument does not show anything about this issue. He just assumes that belief is voluntary. He does talk about how someone might acquire beliefs. He advises, for instance, that people start going to Mass, and practicing Catholic ritual. And says they will acquire Catholic beliefs that way. It sounds implausible to me. It is a little like the old joke about a well-known skeptic, who puts a horseshoe on his door for good luck. A friend of his sees the horseshoe and says, “But I thought you did not believe in that kind of thing”. To which the skeptic replied, “I don’t, but I hear that it works even if you don’t believe it”.


December 20, 2012

Exam paper on negations

Filed under: Language,Linguistics/language,Logic — Emil O. W. Kirkegaard @ 22:04

Exam paper for Danish and Languages of the world

December 11, 2012

Interesting paper: Logic and Reasoning: do the facts matter? (Johan van Benthem)

Filed under: Linguistics/language,Logic,Psychology — Tags: — Emil O. W. Kirkegaard @ 01:43

In my hard task of avoiding my linguistics exam paper, I've been reading a lot of other stuff to keep my thoughts away from how I really ought to start writing it. I am currently reading a book, Human Reasoning and Cognitive Science (Keith Stenning and Michiel van Lambalgen), and it's pretty interesting. In the book the authors mentioned another paper, and I like to look up references in books. It's that paper this post is about.

Logic and Reasoning: do the facts matter? – free PDF download

Why is it interesting? First: it's a mixture of some of my favorite fields, fields that can be difficult to synthesize. I'm talking about philosophy of logic, logic, linguistics, and psychology. They are all related to the phenomenon of human reasoning. Here's the abstract:

Modern logic is undergoing a cognitive turn, side-stepping Frege’s ‘anti-psychologism’. Collaborations between logicians and colleagues in more empirical fields are growing, especially in research on reasoning and information update by intelligent agents. We place this border-crossing research in the context of long-standing contacts between logic and empirical facts, since pure normativity has never been a plausible stance. We also discuss what the fall of Frege’s Wall means for a new agenda of logic as a theory of rational agency, and what might then be a viable understanding of ‘psychologism’ as a friend rather than an enemy of logical theory.

It's not super long at 15 pages, and definitely worth reading for anyone with an interest in the aforementioned fields. In this post I'd like to model some of the scenarios mentioned in the paper.

To me, however, the most striking recent move toward greater realism is the wide range of information-transforming processes studied in modern logic, far beyond inference. As we know from practice, inference occurs intertwined with many other notions. In a recent ‘Kids’ Science Lecture’ on logic for children aged around 8, I gave the following variant of an example from Antiquity, to explain what modern logic is about:

You are in a restaurant with your parents, and you have ordered three dishes: Fish, Meat, and Vegetarian. Now a new waiter comes back from the kitchen with three dishes. What will happen?

The children say, quite correctly, that the waiter will ask a question, say: “Who has the Fish?”. Then, they say that he will ask “Who has the Meat?” Then, as you wait, the light starts shining in those little eyes, and a girl shouts: “Sir, now, he will not ask any more!” Indeed, two questions plus one inference are all that is needed. Now a classical logician would have nothing to say about the questions (they just ‘provide premises’), but go straight for the inference. In my view, this separation is unnatural, and logic owes us an account of both informational processes that work in tandem: the information flow in questions and answers, and the inferences that can be drawn at any stage. And that is just what modern so-called ‘dynamic-epistemic logics’ do! (See [32] and [30].) But actually, much more is involved in natural communication and argumentation. In order to get premises to get an inference going, we ask questions. To understand answers, we need to interpret what was said, and then incorporate that information. Thus, the logical system acquires a new task, in addition to providing valid inferences, viz. systematically keeping track of changing representations of information. And when we get information that contradicts our beliefs so far, we must revise those beliefs in some coherent fashion. And again, modern logic has a lot to say about all of this in the model theory of updates and belief changes.

I think it should be possible to model this situation with help from erotetic logic.

First off, something not explicitly mentioned but clearly true is that the goal for the waiter is to find out who should have which dish. So, the waiter is asking himself these three questions:

Q1: ∃x(ordered(x,fish)∧x=?) – someone has ordered fish, and who is that?
Q2: ∃y(ordered(y,meat)∧y=?) – someone has ordered meat, and who is that?
Q3: ∃z(ordered(z,veg)∧z=?) – someone has ordered veg, and who is that?
(x, y, z are in the domain of persons)

The waiter can make another, defeasible, assumption (premise), which is that x≠y≠z, that is, no person ordered two dishes.

Also not stated explicitly is the fact that there are only 3 persons: the child who is asked to imagine the situation, and his 2 parents. These correspond to x, y, z, but the relations between them don't matter for this situation. And since we don't know which is which, we'll introduce 3 particulars to refer to the three persons: a, b, c. Let's say a is the father, b the mother, c the child. Also, a≠b≠c.

The waiter needs to find 3 correct answers to 3 questions. The order doesn't seem to matter – it might in practice, for practical reasons, e.g. if the dishes are partly stacked on top of each other, in which case the topmost one needs to be served first. But since it doesn't matter in this situation, some arbitrary order of questions is used, in this case the order the dishes were previously mentioned in: fish, meat, veg. Before the waiter gets the correct answer to Q1, he can deduce that:

(follows from various previously mentioned premises and classical FOL with identity)

Then, say that the waiter gets the answer "me" from a (the father). Given that a, b, and c are telling the truth, and given some facts about how indexicals work, he can deduce that a=x. So the waiter has acquired the first piece of information needed. Before proceeding to ask more questions, the waiter updates his beliefs by deduction. He can now conclude that:

(follows from various previously mentioned premises and classical FOL with identity)

Since the waiter can't infer his way to what he needs to know – the correct answers to Q2 and Q3 – he proceeds to ask another question. When he gets the answer, say that b (the mother) says "me", he concludes as before that y=b, and then hands the mother the meat dish.

Then, as before, before proceeding with more questions, he tries to infer his way to the correct answer to Q3, and this time it is possible. Hence he concludes that:

(follows from various previously mentioned premises and classical FOL with identity)

And then he need not ask Q3 at all, but can just hand c (the child) the veg dish.
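The waiter's procedure – start from all assignments consistent with the premises, eliminate worlds with each answer, and infer the rest – can be sketched in Python as a possible-worlds update (a toy model of my own, not the erotetic-logic formalization itself):

```python
from itertools import permutations

people = ["a", "b", "c"]    # father, mother, child
dishes = ["fish", "meat", "veg"]

# All assignments consistent with x≠y≠z (no one ordered two dishes);
# in each world w, w[i] is the dish of people[i].
worlds = set(permutations(dishes))

def update(worlds, person, dish):
    """Keep only the worlds where `person` ordered `dish`."""
    i = people.index(person)
    return {w for w in worlds if w[i] == dish}

worlds = update(worlds, "a", "fish")  # the father's answer to Q1
worlds = update(worlds, "b", "meat")  # the mother's answer to Q2
print(worlds)  # {('fish', 'meat', 'veg')} — Q3 need not be asked
```

After two answers only one world remains, which is exactly why two questions plus one inference suffice.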

Moreover, in doing so, it must account for another typical cognitive phenomenon in actual behavior, the interactive multi-agent character of the basic logical tasks. Again, the children at the Kids’ Lecture had no difficulty when we played the following scenario:

Three volunteers were called to the front, and received one coloured card each: red, white, blue. They could not see the others’ cards. When asked, all said they did not know the cards of the others. Then one girl (with the white card) was allowed a question; and asked the boy with the blue card if he had the red one. I then asked, before the answer was given, if they now knew the others’ cards, and the boy with the blue card raised his hand, to show he did. After he had answered “No” to his card question, I asked again who knew the cards, and now that same boy and the girl both raised their hands …

The explanation is a simple exercise in updating, assuming that the question reflected a genuine uncertainty. But it does involve reasoning about what others do and do not know. And the children did understand why one of them, the girl with the red card, still could not figure out everyone’s cards, even though she knew that they now knew.15

This one is more tricky, since it involves the beliefs of different people; the first situation didn't.

The questions are:

Q1: ∃x(possess(x,red)∧x=?)
Q2: ∃y(possess(y,white)∧y=?)
Q3: ∃z(possess(z,blue)∧z=?)

Again, some implicit facts:


and non-identicalness of the persons:

x≠y≠z, and a≠b≠c. a is the first girl, b is the boy, c is the second girl. There are no other persons. This allows the inference of these facts:


Another implicit fact is that the children can see their own card and know which color it is:

∀x∀card(possess(x, card)→know(x, possess(x, card))) – for any person and for any colored card, if that person possesses the card, then that person knows that that person possesses the card.

The facts given in the description of who actually has which cards are:


So, given these facts, each person can now deduce which variable is identical to one of the constants, and so:


But none of the persons can answer the other two questions, although it is different questions they can't answer. For this reason, one person, a (the first girl), is allowed to ask a question. She asks:

Q4: possess(b,red)? [towards b]

Now, before the answer is given, the researcher asks if anyone knows the answer to all the questions. b raises his hand. Did he know? Possibly. We need to add another assumption to see why. b (the boy) assumes that a (the first girl) is asking a non-deceptive question: she is trying to get some information out of b. This is not so if she asks about something she already knows. She might do that to deceive, but assuming that isn't the case, we can add:


∀u∀v∀card(asks(u, v, possess(v, card))→¬possess(u, card))

In EN: for any two persons, and any card, if the first person asks the second person whether the second person possesses the card, then the first person does not possess the card. From this assumption of non-deception, the boy can infer:

¬possess(a, red)

and so he comes to know that:

know(b,¬possess(a, red))∧know(b, x≠a)

Can the boy figure out the answers now? Yes, because he also knows:


from which he can infer that:

¬possess(a, red) – she asked about it, so she doesn't have it herself
¬possess(a, blue) – he has that card himself, and only 1 person has each card

But recall that every person has a card, and he knows that a has neither the red nor the blue card, so he can infer that a has the white card. And then, since there are only 3 cards and 3 persons, and he knows the answers to two of the questions, there is only one option left for the last person: c must have the red card. Hence, he can raise his hand.

The girl who asked the question, however, lacks the crucial information of which card the boy has before he answers the question, so she can't infer anything more, and hence doesn't raise her hand.

Now, b (the boy) answers in the negative. Assuming non-deceptiveness again (maxim of truth) but in another form, she can infer that:

¬possess(b, red)

and, since she holds the white card herself, she also knows that:

¬possess(a, red)

Hence, she can deduce that the last person must have the red card:

possess(c, red)
From that, she can infer that the boy, b, has the last remaining card, the blue one. Hence she has all the answers to Q1–Q3, and can raise her hand.

The second girl, however, still lacks crucial information to deduce what the others have. The information made public so far doesn't help her at all, since she already knew all along that she had the red card. No other information has been made available to her, so she can't tell whether a or b has the blue card or the white card. Hence, she doesn't raise her hand.
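The whole card scenario can likewise be modelled as possible-worlds updates. A Python sketch of my own, under the same assumptions of non-deception and perfect reasoning (each public event removes worlds, and a person knows the full deal once only one world is compatible with their own card):

```python
from itertools import permutations

people = ["a", "b", "c"]           # first girl, boy, second girl
actual = ("white", "blue", "red")  # the actual deal (a, b, c)
worlds = set(permutations(["red", "white", "blue"]))

def knows_all(person, worlds):
    """A person knows the full deal iff every remaining world that agrees
    with what they see (their own card) agrees on everyone's cards."""
    i = people.index(person)
    return len({w for w in worlds if w[i] == actual[i]}) == 1

# a's question "do you have red?" reveals (by non-deception) that a lacks red:
worlds = {w for w in worlds if w[0] != "red"}
print([p for p in people if knows_all(p, worlds)])  # ['b']

# b's answer "no" reveals that b lacks red:
worlds = {w for w in worlds if w[1] != "red"}
print([p for p in people if knows_all(p, worlds)])  # ['a', 'b']
```

The second girl never appears in the output: every remaining world already agreed with her card, so the public events told her nothing new, just as in the narrative.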

All of this assumes that the children are rather bright and do not fail to draw the relevant logical conclusions. Probably, these 2 examples are rather idealized. But see also:

a similar but much harder problem. Surely it is possible to make a computer that can figure this one out; I already formalized it before. I didn't try to make an algorithm for it, but surely it's possible. Here are my formalizations.

Types of reasoners: I assumed that they inferred nothing wrong, and inferred everything relevant. Wikipedia has a list of common assumptions like this:

October 9, 2012

Some quick notes about a more expressive quantification logic

Filed under: Language,Logic — Tags: , — Emil O. W. Kirkegaard @ 06:43

Towardsabetterquantitativelogic (due to formatting)

Older Posts »

Powered by WordPress