Title says it all. Apparently, some think this is not the case, but it is a straightforward application of Bayes’ theorem. When I first learned of Bayes’ theorem years ago, I thought of this point. Back then I also believed that stereotypes are irrelevant when one has individualized information. Alas, it is incorrect. Neven Sesardic in his excellent and highly recommended book Making Sense of Heritability (2005; download) explained it very clearly, so I will quote his account in full:

A standard reaction to the suggestion that there might be psychological differences between groups is to exclaim “So what?” Whatever these differences, whatever their origin, people should still be treated as individuals, and this is the end of the matter.

There are several problems with this reasoning. First of all, group membership is often a part of an individual’s identity. Therefore, it may not be easy for individuals to accept the fact of a group difference if it does not reflect well on their group. Of course, whichever characteristic we take, there will usually be much overlap, the difference will be only statistical (between group averages), any group will have many individuals that outscore most members of other groups, yet individuals belonging to the lowest-scoring group may find it difficult to live with this fact. It is not likely that the situation will become tolerable even if it is shown that it is not product of social injustice. As Nathan Glazer said: “But how can a group accept an inferior place in society, even if good reasons for it are put forth? It cannot” (Glazer 1994: 16). In addition, to the extent that the difference turns out to be heritable there will be more reason to think that it will not go away so easily (see chapter 5). It will not be readily eliminable through social engineering. It will be modifiable in principle, but not locally modifiable (see section 5.3 for the explanation of these terms). All this could make it even more difficult to accept it.

Next, the statement that people should be treated as individuals is certainly a useful reminder that in many contexts direct knowledge about a particular person eclipses the informativeness of any additional statistical data, and often makes the collection of this kind of data pointless. The statement is fine as far as it goes, but it should not be pushed too far. If it is understood as saying that it is a fallacy to use the information about an individual’s group membership to infer something about that individual, the statement is simply wrong. Exactly the opposite is true: it is a fallacy not to take this information into account.

Suppose we are interested in whether John has characteristic F. Evidence E (directly relevant for the question at hand) indicates that the probability of John having F is p. But suppose we also happen to know that John is a member of group G. Now elementary probability theory tells us that if we want to get the best estimate of the probability that John has F we have to bring the group information to bear on the issue. In calculating the desired probability we have to take into account (a) that John is a member of G, and (b) what proportion of G has F. Neglecting these two pieces of information would mean discarding potentially relevant information. (It would amount to violating what Carnap called “the requirement of total evidence.”) It may well happen that in the light of this additional information we would be forced to revise our estimate of probability from p to p∗. Disregarding group membership is at the core of the so-called “base rate fallacy,” which I will describe using Tversky and Kahneman’s taxicab scenario (Tversky & Kahneman 1980).

In a small city, in which there are 90 green taxis and 10 blue taxis, there was a hit-and-run accident involving a taxi. There is also an eyewitness who told the police that the taxi was blue. The witness’s reliability is 0.8, which means that, when he was tested for his ability to recognize the color of the car under the circumstances similar to those at the accident scene, his statements were correct 80 percent of the time. To reduce verbiage, let me introduce some abbreviations: B = the taxi was blue; G = the taxi was green; WB = witness said that the taxi was blue.

What we know about the whole situation is the following:

(1) p(B) = 0.1 (the prior probability of B, before the witness’s statement is taken into account)

(2) p(G) = 0.9 (the prior probability of G)

(3) p(WB/B) = 0.8 (the reliability of the witness, or the probability of WB, given B)

(4) p(WB/G) = 0.2 (the probability of WB, given G)

Now, given all this information, what is the probability that the taxi was blue in that particular situation? Basically we want to find p(B/WB), the posterior probability of B, i.e., the probability of B after WB is taken into account. People often conclude, wrongly, that this probability is 0.8. They fail to take into consideration that the proportion of blue taxis is pretty low (10 percent), and that the true probability must reflect that fact. A simple rule of elementary probability, Bayes’ theorem, gives the formula to be applied here:

p(B/WB) = p(B) × p(WB/B) / [p(B) × p(WB/B) + p(G) × p(WB/G)].

Therefore, the correct value for p(B/WB) is 0.31, which shows that the usual guess (0.8 or close to it) is wide of the mark.

It is easier to understand that 0.31 is the correct answer by looking at Figure 6.1. Imagine that the situation with the accident and the witness repeats itself 100 times. Obviously, we can expect that the taxi involved in the accident will be blue in 10 cases (10 percent), while in the remaining 90 cases it will be green. Now consider these two different kinds of cases separately. In the top section (blue taxis), the witness recognizes the true color of the car 80 percent of the times, which means in 8 out of 10 cases. In the bottom section (green taxis), he again recognizes the true color of the car 80 percent of the times, which here means in 72 out of 90 cases. Now count all those cases where the witness declares that the taxi is blue, and see how often he is right about it. Then simply divide the number of times he is right when he says “blue” with the overall number of times he says “blue,” and this will immediately give you p(B/WB). The witness gives the answer “blue” 8 times in the upper section (when the taxi is indeed blue), and 18 times in the bottom section (when the taxi is actually green). Therefore, our probability is: 8/(8 + 18) = 0.31.

It may all seem puzzling. How can it be that the witness says the taxi is blue, his reliability as a witness is 0.8, and yet the probability that the taxi is blue is only 0.31? Actually there is nothing wrong with the reasoning. It is the lower prior frequency of blue taxis that brings down the probability of the taxi being blue, and that is that. Bayes’ theorem is a mathematical truth. Its application in this kind of situation is beyond dispute. Any remaining doubt will be dispelled by inspecting Figure 6.1 and seeing that if you trust the witness when he says “blue” you will indeed be more often wrong than right. But notice that you have excellent reasons to trust the witness if he says “green” because in that case he will be right 97 percent of the time! It all follows from the difference in prior probabilities for “blue” and “green.” There is a consensus that neglecting prior probabilities (or base rates) is a logical fallacy.

But if neglecting prior probabilities is a fallacy in the taxicab example, then it cannot stop being a fallacy in other contexts. Oddly enough, many people’s judgment actually changes with context, particularly when it comes to inferences involving social groups. The same move of neglecting base rates that was previously condemned as the violation of elementary probability rules is now praised as reasonable, whereas applying the Bayes’ theorem (previously recommended) is now criticized as a sign of irrationality, prejudice and bigotry.

A good example is racial or ethnic profiling, the practice that is almost universally denounced as ill advised, silly, and serving no useful purpose. This is surprising because the inference underlying this practice has the same logical structure as the taxicab situation. Let me try to show this by representing it in the same format as Figure 6.1. But first I will present an example with some imagined data to prepare the ground for the probability question and for the discussion of group profiling.

Suppose that there is a suspicious characteristic E such that 2 percent terrorists (T) have E but only 0.002 percent non-terrorists (−T) have E. This already gives us two probabilities: p(E/T) = 0.02; p(E/−T) = 0.00002. How useful is E for recognizing terrorists? How likely is it that someone is T if he has E? What is p(T/E)? Bayes’ theorem tells us that the answer depends on the percentage of terrorists in a population. (Clearly, if everybody is a terrorist, then p(T/E) = 1; if no one is a terrorist, then p(T/E) = 0; if some people are T and some −T, then 1 > p(T/E) > 0.) To activate the group question, suppose that there are two groups, A and B, that have different percentages of terrorists (1 in 100, and 1 in 10,000, respectively). This translates into different probabilities of an arbitrary member of a group being a terrorist. In group A, p(T) = 0.01 but in group B, p(T) = 0.0001. Now for the central question: what will p(T/E) be in A and in B? Figures 6.2a and 6.2b provide the answer.



In group A, the probability of a person with characteristic E being a terrorist is 0.91. In group B, this probability is 0.09 (more than ten times lower). The group membership matters, and it matters a lot.

Test your intuitions with a thought experiment: in an airport, you see a person belonging to group A and another person from group B. Both have suspicious trait E but they go in opposite directions. Whom will you follow and perhaps report to the police? Will you (a) go by probabilities and focus on A (committing the sin of racial or ethnic profiling), or (b) follow political correctness and flip a coin (and feel good about it)? It would be wrong to protest here and refuse to focus on A by pointing out that most As are not terrorists. This is true but irrelevant. Most As that have E are terrorists (91 percent of them, to be precise), and this is what counts. Compare that with the other group, where out of all Bs that have E, less than 10 percent are terrorists.

To recapitulate, since the two situations (the taxicab example and the social groups example) are similar in all relevant aspects, consistency requires the same answer. But the resolute answer is already given in the first situation. All competent people speak with one voice here, and agree that in this kind of situation the witness’s statement is only part of the relevant evidence. The proportion of blue cars must also be taken into account to get the correct probability that the taxi involved in the accident was blue. Therefore, there is no choice but to draw the corresponding conclusion in the second case. E is only part of the relevant evidence. The proportion of terrorists in group A (or B) must also be taken into account to get the correct probability that an individual from group A (or B) is a terrorist.

The “must” here is a conditional “must,” not a categorical imperative. That is, you must take into account prior probabilities if you want to know the true posterior probability. But sometimes there may be other considerations, besides the aim to know the true probability. For instance, it may be thought unfair or morally unacceptable to treat members of group A differently from members of group B. After all, As belong to their ethnic group without any decision on their part, and it could be argued that it is unjust to treat every A as more suspect just because a very small proportion of terrorists among As happens to be higher than an even lower proportion of terrorists among Bs. Why should some people be inconvenienced and treated worse than others only because they share a group characteristic, which they did not choose, which they cannot change, and which is in itself morally irrelevant?

I recognize the force of this question. It pulls in the opposite direction from Bayes’ theorem, urging us not to take into account prior probabilities. The question which of the two reasons (the Bayesian or the moral one) should prevail is very complex, and there is no doubt that the answer varies widely, depending on the specific circumstances and also on the answerer. I will not enter that debate at all because it would take us too far away from our subject.

The point to remember is that when many people say that “an individual can’t be judged by his group mean” (Gould 1977: 247), that “as individuals we are all unique and population statistics do not apply” (Venter 2000), that “a person should not be judged as a member of a group but as an individual” (Herrnstein & Murray 1994: 550), these statements sound nice and are likely to be well received but they conflict with the hard fact that a group membership sometimes does matter. If scholars wear their scientific hats when denying or disregarding this fact, I am afraid that rather than convincing the public they will more probably damage the credibility of science.

It is of course an empirical question how often and how much the group information is relevant for judgments about individuals in particular situations, but before we address this complicated issue in specific cases, we should first get rid of the wrong but popular idea that taking group membership into consideration (when thinking about individuals) is in itself irrational or morally condemnable, or both. On the contrary, in certain decisions about individuals, people “would have to be either saints or idiots not to be influenced by the collective statistics” (Genovese 1995: 333).


Lee Jussim (politically incorrect social psychologist; blog) in his interesting book Social perception and social reality (2012; download), notes the same fact. In fact, he spends an entire chapter on the question of how people integrate stereotypes with individualized information and whether this increases accuracy. He begins:

Stereotypes and Person Perception: How Should People Judge Individuals?
“Should” might mean many things. It might mean, “What would be the most moral thing to do?” Or, “What would be the legal thing to do, or the most socially acceptable thing to do, or the least offensive thing to do?” I do not use it here, however, to mean any of these things. Instead, I use the term “should” here to mean “what would lead people to be most accurate?” It is possible that being as accurate as possible would be considered by some people to be immoral or even illegal (see Chapters 10 and 15). Indeed, a wonderful turn of phrase, “forbidden base-rates,” was coined (Tetlock, 2002) to capture the very idea that, sometimes, many people would be outraged by the use of general information about groups to reach judgments that would be as accurate as possible (a “base-rate” is the overall prevalence of some characteristic in a group, usually presented as a percentage; e.g., “0.7% of Americans are in prison” is a base-rate reflecting Americans’ likelihood of being in prison). The focus in this chapter is exclusively on accuracy and not on morality or legality.

Philip Tetlock (famous for his forecasting tournaments) in the quoted article above, writes:

The SVPM [The sacred value-protection model] maintains that categorical proscriptions on cognition can also be triggered by blocking the implementation of relational schemata in sensitive domains. For example, forbidden base rates can be defined as any statistical generalization that devout Bayesians would not hesitate to insert into their likelihood computations but that deeply offends a moral community. In late 20th-century America, egalitarian movements struggled to purge racial discrimination and its residual effects from society (Sniderman & Tetlock, 1986). This goal was justified in communal-sharing terms (“we all belong to the same national family”) and in equality-matching terms (“let’s rectify an inequitable relationship”). Either way, individual or corporate actors who use statistical generalizations (about crime, academic achievement, etc.) to justify disadvantaging already disadvantaged populations are less likely to be lauded as savvy intuitive statisticians than they are to be condemned for their moral insensitivity.

So this is not some exotic idea; it is recognized by several experts.

I don’t have any particular opinion regarding the morality of using involuntary group memberships in one’s assessments, but in terms of epistemic rationality (making correct judgments), the case is clear: one must take into account group memberships when making judgments about individuals.
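The numbers in Sesardic's two examples are easy to check. Below is a minimal sketch in Python; the function name `posterior` and its parameter names are my own, and it assumes a binary hypothesis (blue vs. green, terrorist vs. non-terrorist), which is how both examples are set up.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem for a binary hypothesis H and a piece of evidence E.

    prior                  -- p(H), the base rate
    p_evidence_given_h     -- p(E/H)
    p_evidence_given_not_h -- p(E/-H)
    Returns p(H/E).
    """
    numerator = prior * p_evidence_given_h
    denominator = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / denominator


# Taxicab example: 10 percent blue taxis, witness reliability 0.8.
taxi_blue = posterior(0.1, 0.8, 0.2)    # witness says "blue"
taxi_green = posterior(0.9, 0.8, 0.2)   # witness says "green"

# Profiling example: p(E/T) = 0.02, p(E/-T) = 0.00002.
group_a = posterior(0.01, 0.02, 0.00002)    # 1 in 100 are T
group_b = posterior(0.0001, 0.02, 0.00002)  # 1 in 10,000 are T

print(round(taxi_blue, 2), round(taxi_green, 2),
      round(group_a, 2), round(group_b, 2))
```

The only thing that differs between the two calls in each pair is the prior, which is the whole point: the same evidence yields very different posteriors when the base rates differ.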

From researchgate: www.researchgate.net/post/What_is_the_actual_difference_between_1st_order_and_higher_order_logic

What is the actual difference between 1st order and higher order logic?
Yes, I know. They say, the 2nd order logic is more expressive, but it is really hard to me to see why. If we have a domain X, why can’t we define the domain X’ = X u 2^X and for elements of x in X’ define predicates:
BELONGS_TO(x, y) – undefined (or false) when ELEMENT(y)
Now, we can express sentences about subsets of X in the 1st-order logic!
Similarly we can define FUNCTION(x), etc. and… we can express all 2nd-order sentences in the 1st order logic!
I’m obviously overlooking something, but what actually? Where have I made a mistake?

My answer:

In many cases one can reduce a higher order formalization to a first-order, but it will come at the price of complexity of the formalization.

For instance, formalize the following argument in both first order and second order logic:
All things with personal properties are persons. Being kind is a personal property. Peter is kind. Therefore, Peter is a person.

One can do this with either first or second order, but it is easier in second-order.

First-order formalization:
1. (∀x)(PersonalProperty(x)→((∀y)(HasProperty(y,x)→Person(y)))
2. PersonalProperty(kind)
3. HasProperty(peter,kind)
⊢ 4. Person(peter)

Second-order formalization
1. (∀Φ)(PersonalProperty(Φ)→(∀x)(Φx→Person(x)))
2. PersonalProperty(IsKind)
3. IsKind(peter)
⊢ 4. Person(peter)

where Φ is a second-order variable. Basically, whenever one uses first order to formalize arguments like this, one has to use a predicate like “HasProperty(x,y)” so that one can treat variables as properties indirectly. This is unnecessary in second-order logics.
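Both formalizations can be checked mechanically. Here is a small sketch in Lean 4, assuming nothing beyond the core language; the type names `Obj` and `Ind` are my own labels for the domain. In the first version properties are ordinary objects related to individuals by `HasProperty`; in the second the quantifier ranges directly over predicates.

```lean
-- First-order style: properties are objects in the domain,
-- linked to individuals by the binary predicate HasProperty.
example (Obj : Type)
    (PersonalProperty : Obj → Prop)
    (HasProperty : Obj → Obj → Prop)
    (Person : Obj → Prop)
    (kind peter : Obj)
    (h1 : ∀ x, PersonalProperty x → ∀ y, HasProperty y x → Person y)
    (h2 : PersonalProperty kind)
    (h3 : HasProperty peter kind) :
    Person peter :=
  h1 kind h2 peter h3

-- Second-order style: Φ ranges over predicates on individuals,
-- so no HasProperty predicate is needed.
example (Ind : Type)
    (PersonalProperty : (Ind → Prop) → Prop)
    (Person : Ind → Prop)
    (IsKind : Ind → Prop)
    (peter : Ind)
    (h1 : ∀ Φ : Ind → Prop, PersonalProperty Φ → ∀ x, Φ x → Person x)
    (h2 : PersonalProperty IsKind)
    (h3 : IsKind peter) :
    Person peter :=
  h1 IsKind h2 peter h3
```

The two proof terms have the same shape; the difference is only in what the first quantifier ranges over.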

From here. btw this thread was one of the many discussions that helped form my views about what wud later become the essay about begging the question, and the essay about how to define “deductive argument” and “inductive argument”.


 Do you know much about Jung’s theory of archetypes? If so, what do you make of it?


 I don’t make much of Jung. Except for the notions of introversion and extroversion. Not my cup of tea. As I said, we don’t create our own beliefs. We acquire them. Beliefs are not voluntary.


 They are to some extent but not as much as some people think (Pascal’s argument comes to mind).


 Yes, it does. And that is an issue. His argument does not show anything about this issue. He just assumes that belief is voluntary. He does talk about how someone might acquire beliefs. He advises, for instance, that people start going to Mass, and practicing Catholic ritual. And says they will acquire Catholic beliefs that way. It sounds implausible to me. It is a little like the old joke about a well-known skeptic, who puts a horseshoe on his door for good luck. A friend of his sees the horseshoe and says, “But I thought you did not believe in that kind of thing”. To which the skeptic replied, “I don’t, but I hear that it works even if you don’t believe it”.


In my hard task to avoid actually doing my linguistics exam paper, ive been reading a lot of other stuff to keep my thoughts away from thinking about how i really ought to start writing my paper. In this case i am currently reading a book, Human Reasoning and Cognitive Science (Keith Stenning and Michiel van Lambalgen), and its pretty interesting. But in the book the authors mention another paper, and i like to look up references in books. Its that paper that this post is about.

Logic and Reasoning: Do the Facts Matter?

Why is it interesting? first: its a mixture of som of my favorit fields, fields that can be difficult to synthesize. im talking about filosofy of logic, logic, linguistics, and psychology. they are all related to the fenomenon of human reasoning. heres the abstract:

Modern logic is undergoing a cognitive turn, side-stepping Frege’s ‘anti-psychologism’. Collaborations between logicians and colleagues in more empirical fields are growing, especially in research on reasoning and information update by intelligent agents. We place this border-crossing research in the context of long-standing contacts between logic and empirical facts, since pure normativity has never been a plausible stance. We also discuss what the fall of Frege’s Wall means for a new agenda of logic as a theory of rational agency, and what might then be a viable understanding of ‘psychologism’ as a friend rather than an enemy of logical theory.

its not super long at 15 pages, and definitly worth reading for anyone with an interest in the b4mentioned fields. in this post id like to model som of the scenarios mentioned in the paper.

To me, however, the most striking recent move toward greater realism is the wide range of information-transforming processes studied in modern logic, far beyond inference. As we know from practice, inference occurs intertwined with many other notions. In a recent ‘Kids’ Science Lecture’ on logic for children aged around 8, I gave the following variant of an example from Antiquity, to explain what modern logic is about:

You are in a restaurant with your parents, and you have ordered three dishes: Fish, Meat, and Vegetarian. Now a new waiter comes back from the kitchen with three dishes. What will happen?

The children say, quite correctly, that the waiter will ask a question, say: “Who has the Fish?”. Then, they say that he will ask “Who has the Meat?” Then, as you wait, the light starts shining in those little eyes, and a girl shouts: “Sir, now, he will not ask any more!” Indeed, two questions plus one inference are all that is needed. Now a classical logician would have nothing to say about the questions (they just ‘provide premises’), but go straight for the inference. In my view, this separation is unnatural, and logic owes us an account of both informational processes that work in tandem: the information flow in questions and answers, and the inferences that can be drawn at any stage. And that is just what modern so-called ‘dynamic-epistemic logics’ do! (See [32] and [30].) But actually, much more is involved in natural communication and argumentation. In order to get premises to get an inference going, we ask questions. To understand answers, we need to interpret what was said, and then incorporate that information. Thus, the logical system acquires a new task, in addition to providing valid inferences, viz. systematically keeping track of changing representations of information. And when we get information that contradicts our beliefs so far, we must revise those beliefs in some coherent fashion. And again, modern logic has a lot to say about all of this in the model theory of updates and belief changes.

i think it shud be possible to model this situation with help from erotetic logic.

first off, somthing not explicitly mentioned but clearly true is that the goal for the waiter is to find out who shud hav which dish. So, the waiter is asking himself these three questions:

Q1: ∃x(ordered(x,fish)∧x=?) – somone has ordered fish, and who is that?
Q2: ∃y(ordered(y,meat)∧y=?) – somone has ordered meat, and who is that?
Q3: ∃z(ordered(z,veg)∧z=?) – somone has ordered veg, and who is that?
(x, y, z ar in the domain of persons)

the waiter can make another, defeasible, assumption (premis), which is that x≠y≠z, that is, no person ordered two dishes.

also not stated explicitly is the fact that ther ar only 3 persons, the child who is asked to imagin the situation, and his 2 parents. these correspond to x, y, z, but the relations between them dont matter for this situation. and we dont know which is which, so we’ll introduce 3 particulars to refer to the three persons: a, b, c. lets say a is the father, b the mother, c the child. also, a≠b≠c.

the waiter needs to find 3 correct answers to 3 questions. the order doesnt seem to matter – it might in practice, for practical reasons, like if the dishes ar partly on top of each other, in which case the topmost one needs to be served first. but since it doesnt in this situation, som arbitrary order of questions is used, in this case the order the dishes wer previusly mentioned in: fish, meat, veg. befor the waiter gets the correct answer to Q1, he can deduce that:

(follows from varius previusly mentioned premisses and with classical FOL with identity)

then, say that the waiter gets the answer “me” from a (the father). given that a, b, and c ar telling the truth, and given som facts about how indexicals work, he can deduce that a=x. so the waiter has acquired the first piece of information needed. befor proceeding to asking mor questions, the waiter then updates his beliefs by deduction. he can now conclude that:

(follows from varius previusly mentioned premisses and with classical FOL with identity)

since the waiter cant seem to infer his way to what he needs to know, which is the correct answers to Q2 and Q3, he then proceeds to ask another question, Q2. when he gets the answer, say that b (the mother) says “me”, he concludes like befor that y=b, and then hands the mother the meat dish.

then like befor, befor proceeding with mor questions, he tries to infer his way to the correct answer to Q3, and this time it is possible, hence he concludes that:

z=c
(follows from varius previusly mentioned premisses and with classical FOL with identity)

and then he needs not ask Q3 at all, but can just hand c (the child) the veg dish.
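The "two questions plus one inference" pattern can be sketched as a small simulation. This is only an illustration, not a formalization in erotetic logic: the function `serve` and the particular orders are made up for the example, and it assumes that every person ordered exactly one dish and answers truthfully.

```python
def serve(orders, question_order):
    """Simulate the waiter.

    orders         -- dict person -> dish (hidden from the waiter)
    question_order -- the order in which the waiter handles the dishes
    Returns (number of questions asked, dict dish -> person).
    """
    unserved = set(orders)  # persons who have not got a dish yet
    assignment = {}
    questions = 0
    for dish in question_order:
        if len(unserved) == 1:
            # Inference step: only one person is left, so the dish
            # must be theirs and no question is needed.
            person = next(iter(unserved))
        else:
            # Question step: "who has the <dish>?" answered truthfully.
            questions += 1
            person = next(p for p in unserved if orders[p] == dish)
        assignment[dish] = person
        unserved.remove(person)
    return questions, assignment


# Hypothetical orders, chosen only for the example.
orders = {"a": "fish", "b": "meat", "c": "veg"}
questions, assignment = serve(orders, ["fish", "meat", "veg"])
print(questions, assignment)
```

Whatever the orders ar, the last dish is always assigned by inference, so with three persons the waiter asks exactly two questions.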

Moreover, in doing so, it must account for another typical cognitive phenomenon in actual behavior, the interactive multi-agent character of the basic logical tasks. Again, the children at the Kids’ Lecture had no difficulty when we played the following scenario:

Three volunteers were called to the front, and received one coloured card each: red, white, blue. They could not see the others’ cards. When asked, all said they did not know the cards of the others. Then one girl (with the white card) was allowed a question; and asked the boy with the blue card if he had the red one. I then asked, before the answer was given, if they now knew the others’ cards, and the boy with the blue card raised his hand, to show he did. After he had answered “No” to his card question, I asked again who knew the cards, and now that same boy and the girl both raised their hands …

The explanation is a simple exercise in updating, assuming that the question reflected a genuine uncertainty. But it does involve reasoning about what others do and do not know. And the children did understand why one of them, the girl with the red card, still could not figure out everyone’s cards, even though she knew that they now knew.

this one is mor tricky, since it involves beliefs of different ppl, which the first situation didnt.

the questions ar:

Q1: ∃x(possess(x,red)∧x=?)
Q2: ∃y(possess(y,white)∧y=?)
Q3: ∃z(possess(z,blue)∧z=?)

again, som implicit facts:


and non-identicalness of the persons:

x≠y≠z, and a≠b≠c. a is the first girl, b is the boy, c is the second girl. ther ar no other persons. this allows the inference of the facts:


another implicit fact, namely that the children can see their own card and know which color it is:

∀x∀card(possess(x, card)→know(x, possess(x, card)) – for any person and for any colored card, if that person possesses the card, then that person knows that that person possesses the card.

the facts given in the description of who actually has which cards are:


so, given these facts, each person can now deduce which variable is identical to one of the constants, and so:


but non of the persons can seem to answer the other two questions, altho it is different questions they cant answer. for this reason, one person, a (first girl), is allowed to ask a question. she asks:

Q4: possess(b,red)? [towards b]

now, befor the answer is given, the researcher asks if anyone knows the answer to all the questions. b raises his hand. did he know? possibly. we need to add another assumption to see why. b (the boy) is assuming that a (the first girl) is asking a nondeceptiv question. she is trying to get som information out from b (the boy). this is not so if she asks about somthing she already knows. she might do that to deceive, but assuming that isnt the case, we can add:


in EN: for any two persons, and any card, if the first person is asking the second person about whether the second person possesses the card, then the first person does not possess the card. from this assumption of non-deception, the boy can infer:

¬possess(a, red)

and so he coms to know that:

know(b,¬possess(a, red))∧know(b, x≠a)

can the boy figure out the questions now? yes: becus he also knows:


from which he can infer that:

¬possess(a,red) – she asked about it, so she doesnt hav it herself
¬possess(a, blue) – he has the card himself, and only 1 person has the card

but recall that every person has a card, and he knows that a has neither the red nor the blue, so he can infer that a has the white card. and then, since ther ar only 3 cards and 3 persons, and he knows the answers to the first two questions, ther is only one option left for the last person: she must hav the red card. hence, he can raise his hand.

the girl who asked the question, however, lacks the crucial information of which card the boy has befor he answers the question, so she cant infer anything mor, and hence doesnt raise her hand.

now, b (the boy) answers in the negativ. assuming non-deceptivness again (maxim of truth) but in another form, she can infer that:

¬possess(b, red)

and so also knows that:

¬possess(a, red)

hence, she can deduce that, the last person must hav the red card, hence:


from that, she can infer that the boy, b, has the last remaining card, the blue one. hence she has all the answers to Q1-3, and can raise her hand.

the second girl, however, still lacks crucial information to deduce what the others hav. the information made public so far doesnt help her at all, since she already knew all along that she had the red card. no other information has been made available to her, so she cant tell whether a or b has the blue card, or the white card. hence, she doesnt raise her hand.
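The whole card scenario can also be checked by brute force over possible deals, in the style of a possible-worlds model. The sketch below is my own illustration, not the formalization from the paper: the helper names ar made up, and it assumes perfect reasoners, that everyone sees only their own card, and that the question and the answer work as public announcements ("a does not hav red", then "b does not hav red").

```python
from itertools import permutations

AGENTS = ("a", "b", "c")          # first girl, boy, second girl
CARDS = ("red", "white", "blue")


def all_worlds():
    """Every way of dealing the three cards to the three agents."""
    return [dict(zip(AGENTS, deal)) for deal in permutations(CARDS)]


def knows_deal(agent, actual, worlds):
    """An agent knows the full deal if exactly one remaining world
    is compatible with the card the agent actually holds."""
    compatible = [w for w in worlds if w[agent] == actual[agent]]
    return len(compatible) == 1


def who_knows(actual, worlds):
    return {agent for agent in AGENTS if knows_deal(agent, actual, worlds)}


actual = {"a": "white", "b": "blue", "c": "red"}
worlds = all_worlds()

# Before anything is said: nobody knows the others' cards.
stage0 = who_knows(actual, worlds)

# a asks b "do you hav the red card?"; under the non-deception
# assumption this publicly rules out every world where a has red.
worlds = [w for w in worlds if w["a"] != "red"]
stage1 = who_knows(actual, worlds)

# b answers "no": every world where b has red is ruled out.
worlds = [w for w in worlds if w["b"] != "red"]
stage2 = who_knows(actual, worlds)

print(stage0, stage1, stage2)
```

The three stages come out as in the story: first nobody knows, then only the boy, then the boy and the first girl, while the girl with the red card never learns the deal.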

all of this assumes that the children ar rather bright and do not fail to make relevant logical conclusions. probably, these 2 examples ar rather made up. but see also:

a similar but much harder problem. surely it is possible to make a computer that can figur this one out, i already formalized it befor. i didnt try to make an algorithm for it, but surely its possible. heres my formalizations.


types of reasoners, i assumed that they infered nothing wrong, and infered everything relevant. wikipedia has a list of common assumptions like this: en.wikipedia.org/wiki/Doxastic_logic#Types_of_reasoners

(KK) If one knows that p, then one knows that one knows that p.

A0 is the proposition that 1+1=2.
A1 is the proposition that Emil knows that 1+1=2.
A2 is the proposition that Emil knows that Emil knows that 1+1=2.

An is the proposition that Emil knows that Emil knows that … that 1+1=2.
Where “…” is filled by “that Emil knows” repeated the number of times in the subscript of A.

1. Assumption for RAA
For any proposition, P, and any person, x, if x knows that P, then x knows that x knows that P.

2. Premise
Emil knows that A0.

3. Premise
There is a set, S1, such that A0 belongs to S1, and A1 belongs to S1, and … and An belongs to S1, and the cardinality of S1 is infinite, and S1 is identical to SA.

4. Inference from (1), (2), and (3)
For any proposition, P, if P belongs to SA, then Emil knows that P.

5. Premise
It is not the case that, for any proposition, P, if P belongs to SA, then Emil knows that P.

6. Inference from (1-5), RAA
It is not the case that, for any proposition, P, and any person, x, if x knows that P, then x knows that x knows that P.

Proving it
Proving that it is valid formally is sort of difficult, as it requires a system with set theory and predicate logic with quantification over propositions. The above sketch should be enough for whoever doubts the formal validity.
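A rough rendering of steps (1) to (6) in symbols may still help. The notation is an assumption added for illustration and is not part of the original sketch: K_x P is read as "x knows that P", e stands for Emil, and the A-propositions are generated by the recursion A_{n+1} := K_e A_n.

```latex
\begin{align*}
&(1)\quad \forall P\,\forall x\,(K_x P \rightarrow K_x K_x P) && \text{assumption for RAA (KK)}\\
&(2)\quad K_e A_0 && \text{premise}\\
&(3)\quad A_{n+1} := K_e A_n \ \text{for all } n \ge 0,\quad S_A = \{A_0, A_1, A_2, \dots\} && \text{premise}\\
&(4)\quad \forall n\; K_e A_n && \text{from (1)-(3), by induction on } n\\
&(5)\quad \neg\,\forall n\; K_e A_n && \text{premise}\\
&(6)\quad \neg\,\forall P\,\forall x\,(K_x P \rightarrow K_x K_x P) && \text{from (1)-(5), RAA}
\end{align*}
```

The induction in (4) uses (2) as the base case and (1) for the step: from K_e A_n, (KK) gives K_e K_e A_n, which is K_e A_{n+1} by the definition in (3).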

This is another of those ideas that ive had independently, and that it turned out that others had thought of before me, by thousands of years in this case. The idea is that longer expressions of language are made out of smaller parts of language, and that the meaning of the whole is determined by the parts and their structure. This is rather close to the formulation used on SEP. Heres the introduction on SEP:


Anything that deserves to be called a language must contain meaningful expressions built up from other meaningful expressions. How are their complexity and meaning related? The traditional view is that the relationship is fairly tight: the meaning of a complex expression is fully determined by its structure and the meanings of its constituents—once we fix what the parts mean and how they are put together we have no more leeway regarding the meaning of the whole. This is the principle of compositionality, a fundamental presupposition of most contemporary work in semantics.

Proponents of compositionality typically emphasize the productivity and systematicity of our linguistic understanding. We can understand a large—perhaps infinitely large—collection of complex expressions the first time we encounter them, and if we understand some complex expressions we tend to understand others that can be obtained by recombining their constituents. Compositionality is supposed to feature in the best explanation of these phenomena. Opponents of compositionality typically point to cases when meanings of larger expressions seem to depend on the intentions of the speaker, on the linguistic environment, or on the setting in which the utterance takes place without their parts displaying a similar dependence. They try to respond to the arguments from productivity and systematicity by insisting that the phenomena are limited, and by suggesting alternative explanations.


SEP goes on to discuss some more formal versions of the general idea:


(C) The meaning of a complex expression is determined by its structure and the meanings of its constituents.



(C′) For every complex expression e in L, the meaning of e in L is determined by the structure of e in L and the meanings of the constituents of e in L.


SEP goes on to distinguish between a lot of different versions of this. See the article for details.

The thing i wanted to discuss was the counterexamples offered. I found none of them particularly compelling. They are based mostly on intuition pumps as far as i can tell, and im rather wary of such (cf. Every Thing Must Go, amazon).


Heres SEP’s first example, using chess notation (many other game notations wud also work, e.g. Taifho):


Consider the Algebraic notation for chess.[15] Here are the basics. The rows of the chessboard are represented by the numerals 1, 2, … , 8; the columns are represented by the lower case letters a, b, … , h. The squares are identified by column and row; for example b5 is at the intersection of the second column and the fifth row. Upper case letters represent the pieces: K stands for king, Q for queen, R for rook, B for bishop, and N for knight. Moves are typically represented by a triplet consisting of an upper case letter standing for the piece that makes the move and a sign standing for the square where the piece moves. There are five exceptions to this: (i) moves made by pawns lack the upper case letter from the beginning, (ii) when more than one piece of the same type could reach the same square, the sign for the square of departure is placed immediately in front of the sign for the square of arrival, (iii) when a move results in a capture an x is placed immediately in front of the sign for the square of arrival, (iv) the symbol 0-0 represents castling on the king’s side, (v) the symbol 0-0-0 represents castling on the queen’s side. + stands for check, and ++ for mate. The rest of the notation serves to make commentaries about the moves and is inessential for understanding it.

Someone who understands the Algebraic notation must be able to follow descriptions of particular chess games in it and someone who can do that must be able to tell which move is represented by particular lines within such a description. Nonetheless, it is clear that when someone sees the line Bb5 in the middle of such a description, knowing what B, b, and 5 mean will not be enough to figure out what this move is supposed to be. It must be a move to b5 made by a bishop, but we don’t know which bishop (not even whether it is white or black) and we don’t know which square it is coming from. All this can be determined by following the description of the game from the beginning, assuming that one knows what the initial configurations of figures are on the chessboard, that white moves first, and that afterwards black and white move one after the other. But staring at Bb5 itself will not help.


It is exactly the bold lines i dont accept. Why must one be able to know that from the meaning alone? Knowing the meaning of expressions does not always make it easy to know what a given noun (or NP) refers to. In this case “B” is a noun referring to a bishop. Which one? Well, who knows. There are lots of examples of words referring to different things (people usually) when used in different contexts. For instance, the word “me” refers to the source of the expression, but when an expression is used by different speakers, then “me” refers to different people, cf. indexicals (SEP and Wiki).
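One way to picture this is to treat the reference of “Bb5” as a function of context, much like “me”. The following is only a minimal sketch under stated assumptions: the board encoding (a dict from squares to codes like "wB"), the function name `bishop_origins`, and both positions are made up for illustration and are not taken from SEP.

```python
def bishop_origins(board, side, target):
    """Squares from which one of `side`'s bishops could move to `target`.

    `board` maps occupied squares (e.g. "f1") to pieces (e.g. "wB").
    This encoding is a hypothetical one chosen for the example.
    """
    file0, rank0 = ord(target[0]) - ord("a"), int(target[1]) - 1
    origins = []
    # Walk outward from the target along each of the four diagonals.
    for df, dr in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        f, r = file0 + df, rank0 + dr
        while 0 <= f < 8 and 0 <= r < 8:
            square = chr(ord("a") + f) + str(r + 1)
            piece = board.get(square)
            if piece is not None:
                if piece == side + "B":
                    origins.append(square)
                break  # the first piece met blocks the diagonal
            f, r = f + df, r + dr
    return origins


# Two made-up contexts (only the relevant pieces are listed).
context_1 = {"f1": "wB", "c6": "bN"}
context_2 = {"e8": "bB", "e2": "wB", "d3": "wP"}

print(bishop_origins(context_1, "w", "b5"))  # ['f1']
print(bishop_origins(context_2, "b", "b5"))  # ['e8']
```

The same string “Bb5” picks out a different bishop and origin square in each context, while the rule for reading it stays fixed. That is the sense in which the case looks like one of reference varying with context rather than of meaning varying.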


Ofc, my thoughts about this are not particularly unique, and SEP mentions the defense that i also thought of:


The second moral is that—given certain assumptions about meaning in chess notation—we can have productive and systematic understanding of representations even if the system itself is not compositional. The assumptions in question are that (i) the description I gave in the first paragraph of this section fully determines what the simple expressions of chess notation mean and also how they can be combined to form complex expressions, and that (ii) the meaning of a line within a chess notation determines a move. One can reject (i) and argue, for example, that the meaning of B in Bb5 contains an indexical component and within the context of a description, it picks out a particular bishop moving from a particular square. One can also reject (ii) and argue, for example, that the meaning of Bb5 is nothing more than the meaning of ‘some bishop moves from somewhere to square b5’—utterances of Bb5 might carry extra information but that is of no concern for the semantics of the notation. Both moves would save compositionality at a price. The first complicates considerably what we have to say about lexical meanings; the second widens the gap between meanings of expressions and meanings of their utterances. Whether saving compositionality is worth either of these costs (or whether there is some other story to be told about our understanding of the Algebraic notation) is by no means clear. For all we know, Algebraic notation might be non-compositional.


I also dont agree that it widens the gap between meanings of expressions and meanings of utterances. It has to do with referring to stuff, not meaning in itself.

4.2.1 Conditionals

Consider the following minimal pair:

(1) Everyone will succeed if he works hard.
(2) No one will succeed if he goofs off.

A good translation of (1) into a first-order language is (1′). But the analogous translation of (2) would yield (2′), which is inadequate. A good translation for (2) would be (2″) but it is unclear why. We might convert ‘¬∃’ to the equivalent ‘∀¬’ but then we must also inexplicably push the negation into the consequent of the embedded conditional.

(1′) ∀x(x works hard → x will succeed)
(2′) ¬∃x(x goofs off → x will succeed)
(2″) ∀x(x goofs off → ¬(x will succeed))

This gives rise to a problem for the compositionality of English, since it seems rather plausible that the syntactic structure of (1) and (2) is the same and that ‘if’ contributes some sort of conditional connective—not necessarily a material conditional!—to the meaning of (1). But it seems that it cannot contribute just that to the meaning of (2). More precisely, the interpretation of an embedded conditional clause appears to be sensitive to the nature of the quantifier in the embedding sentence—a violation of compositionality.[16]

One response might be to claim that ‘if’ does not contribute a conditional connective to the meaning of either (1) or (2)—rather, it marks a restriction on the domain of the quantifier, as the paraphrases under (1″) and (2″) suggest:[17]

(1″) Everyone who works hard will succeed.
(2″) No one who goofs off will succeed.

But this simple proposal (however it may be implemented) runs into trouble when it comes to quantifiers like ‘most’. Unlike (3′), (3) says that those students (in the contextually given domain) who succeed if they work hard are most of the students (in the contextually relevant domain):

(3) Most students will succeed if they work hard.
(3′) Most students who work hard will succeed.

The debate whether a good semantic analysis of if-clauses under quantifiers can obey compositionality is lively and open.[18]


Doesnt seem particularly difficult to me. When i look at an “if-then” clause, the first thing i do before formalizing is turning it around so that “if” is first, and i also insert any missing “then”. With their example:


(1) Everyone will succeed if he works hard.
(2) No one will succeed if he goofs off.


this results in:


(1)* If he works hard, then everyone will succeed.
(2)* If he goofs off, then no one will succeed.


Both “everyone” and “no one” express a universal quantifier, ∀. The second one has a negation as well. We can translate “everyone” to something like “all”, and “no one” to “all … not”. Then we might get:


(1)** If he works hard, then all will succeed.
(2)** If he goofs off, then all will not succeed.


Then, we move the quantifier to the beginning and insert a pronoun, “he”, to match. Then we get something like:


(1)*** For any person, if he works hard, then he will succeed.
(2)*** For any person, if he goofs off, then he will not succeed.


These are equivalent to SEP’s


(1″) Everyone who works hard will succeed.
(2″) No one who goofs off will succeed.
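A small check on a toy model shows why the placement of the negation matters. This is only a sketch, assuming “if” is read as the material conditional; the one-person model and the variable names are hypothetical.

```python
# Hypothetical toy model: one person who neither goofs off nor succeeds.
people = ["p"]
goofs_off = set()
succeeds = set()


def implies(a, b):
    """Material conditional."""
    return (not a) or b


# (2') not-exists x (x goofs off -> x will succeed)
two_prime = not any(implies(x in goofs_off, x in succeeds) for x in people)

# (2)*** for-all x (x goofs off -> not (x will succeed))
two_triple_star = all(implies(x in goofs_off, x not in succeeds) for x in people)

print(two_prime, two_triple_star)  # False True
```

In this model nobody goofs off, so “no one will succeed if he goofs off” should come out true. The reading (2)*** gives that result, while (2′) comes out false merely because someone fails to goof off.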


The difference between (3) and (3′) is interesting, not becus of its relevance to my method above (i think), but since it deals with something beyond first-order logic. Quantification logic, i suppose? I did a brief Google and Wiki search, but didnt find anything like what i was looking for. I also tried Graham Priest’s Introduction to non-classical logic, also without luck.


So here goes some system i just invented to formalize the sentences:


(3) Most students will succeed if they work hard.
(3′) Most students who work hard will succeed.


Capital greek letters are set variables. # is a function that returns the cardinality of a set. Sx means that x is a student, Wx that x works hard, and Ux that x will succeed.


(3)* (∃Γ)(∃Δ)(∀x)(∀y)((Sx↔x∈Γ)∧Δ⊆Γ∧#Δ>(#Γ/2)∧(y∈Δ→(Wy→Uy)))


In english: There is a set, gamma, and there is another set, delta, and for any x, and for any y, x is a student iff x is in gamma, and delta is a subset of gamma, and the cardinality of delta is larger than half the cardinality of gamma, and if y is in delta, then (if y works hard, then y will succeed).


Quite complicated in writing, but the idea is not that complicated. It shud be possible to find some simplified writing convention for easier expression of this way of formalizing it.


(3′)* (∃Γ)(∃Δ)(∀x)(∀y)(((Sx∧Wx)↔x∈Γ)∧Δ⊆Γ∧#Δ>(#Γ/2)∧(y∈Δ→Uy))


In english: there is a set, gamma, and there is another set, delta, and for any x, and for any y, (x is a student and x works hard) iff x is in gamma, and delta is a subset of gamma, and the cardinality of delta is larger than half the cardinality of gamma, and if y is in delta, then y will succeed.


To my logician intuition, these are not equivalent, but proving this is left as an exercise to the reader if he can figure out a way to do so in this set theory+predicate logic system (i might try later).
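Short of a proof inside the system, a single finite countermodel is enough to show non-equivalence. The following is a minimal sketch that follows the English glosses of (3)* and (3′)* above and reads the inner “if” as a material conditional (an assumption). The function names and the three-student model are hypothetical.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def most_star(domain, S, W, U):
    """(3)*: Gamma is the set of students; some Delta that is more than
    half of Gamma has only members y with (Wy -> Uy)."""
    gamma = {x for x in domain if x in S}
    return any(
        len(delta) > len(gamma) / 2
        and all((y not in W) or (y in U) for y in delta)
        for delta in subsets(gamma)
    )


def most_prime_star(domain, S, W, U):
    """(3')*: Gamma is the set of hard-working students; some Delta that is
    more than half of Gamma has only members y with Uy."""
    gamma = {x for x in domain if x in S and x in W}
    return any(
        len(delta) > len(gamma) / 2 and all(y in U for y in delta)
        for delta in subsets(gamma)
    )


# Countermodel: three students, only "a" works hard, and nobody succeeds.
domain = {"a", "b", "c"}
S, W, U = {"a", "b", "c"}, {"a"}, set()

print(most_star(domain, S, W, U))        # True
print(most_prime_star(domain, S, W, U))  # False
```

Here (3)* is true because the two students who do not work hard satisfy the conditional vacuously and form a majority, while (3′)* is false because the only hard-working student fails. So the two formalizations are not equivalent, at least on the material reading of the conditional.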


4.2.2 Cross-sentential anaphora

Consider the following minimal pair from Barbara Partee:


(4) I dropped ten marbles and found all but one of them. It is probably under the sofa.

(5) I dropped ten marbles and found nine of them. It is probably under the sofa.


There is a clear difference between (4) and (5)—the first one is unproblematic, the second markedly odd. This difference is plausibly a matter of meaning, and so (4) and (5) cannot be synonyms. Nonetheless, the first sentences are at least truth-conditionally equivalent. If we adopt a conception of meaning where truth-conditional equivalence is sufficient for synonymy, we have an apparent counterexample to compositionality.


I dont accept that premise either. I havent done so since i read Swartz and Bradley years ago. Sentences like


“Canada is north of Mexico”

“Mexico is south of Canada”


are logically equivalent, but are not synonymous. The concept of being north of, and the concept of being south of are not the same, even tho they stand in a kind of reverse relation. That is to say, xR1y↔yR2x. Not sure what to call such relations. It’s symmetry+substitution of relations.


Sentences like


“Everything that is round, has a shape.”

“Nothing is not identical to itself.”


are logically equivalent but dont mean the same. And so on, cf. Swartz and Bradley 1979, and SEP on theories of meaning.


Interesting though these cases might be, it is not at all clear that we are faced with a genuine challenge to compositionality, even if we want to stick with the idea that meanings are just truth-conditions. For it is not clear that (5) lacks the normal reading of (4)—on reflection it seems better to say that the reading is available even though it is considerably harder to get. (Contrast this with an example due to—I think—Irene Heim: ‘They got married. She is beautiful.’ This is like (5) because the first sentence lacks an explicit antecedent for the pronoun in the second. Nonetheless, it is clear that the bride is said to be beautiful.) If the difference between (4) and (5) is only this, it is no longer clear that we must accept the idea that they must differ in meaning.


I agree that (4) and (5) mean the same, even if (5) is a rather bad way to express the thing one normally wud express with something like (4).


In their bride example, one can also consider homosexual weddings, where “he” and “she” similarly fail to refer to a specific person out of the two newlyweds.

4.2.3 Adjectives

Suppose a Japanese maple leaf, turned brown, has been painted green. Consider someone pointing at this leaf uttering (6):


(6) This leaf is green.


The utterance could be true on one occasion (say, when the speaker is sorting leaves for decoration) and false on another (say, when the speaker is trying to identify the species of tree the leaf belongs to). The meanings of the words are the same on both occasions and so is their syntactic composition. But the meaning of (6) on these two occasions—what (6) says when uttered in these occasions—is different. As Charles Travis, the inventor of this example puts it: “…words may have all the stipulated features while saying something true, but also while saying something false.”[20]


At least three responses offer themselves. One is to deny the relevant intuition. Perhaps the leaf really is green if it is painted green and (6) is uttered truly in both situations. Nonetheless, we might be sometimes reluctant to make such a true utterance for fear of being misleading. We might be taken to falsely suggest that the leaf is green under the paint or that it is not painted at all.[21] The second option is to point out that the fact that a sentence can say one thing on one occasion and something else on another is not in conflict with its meaning remaining the same. Do we have then a challenge to compositionality of reference, or perhaps to compositionality of content? Not clear, for the reference or content of ‘green’ may also change between the two situations. This could happen, for example, if the lexical representation of this word contains an indexical element.[22] If this seems ad hoc, we can say instead that although (6) can be used to make both true and false assertions, the truth-value of the sentence itself is determined compositionally.[23]


Im going to bite the bullet again, and just say that the sentence means the same on both occasions. What is different is that in different contexts, one might interpret the same sentence to express different propositions. This is not something new as it was already featured before as well, altho this time it is without indexicals. The reason is that altho the sentence means the same, one is guessing at which proposition the utterer meant to express with his sentence. Context helps with that.

4.2.4 Propositional attitudes

Perhaps the most widely known objection to compositionality comes from the observation that even if e and e′ are synonyms, the truth-values of sentences where they occur embedded within the clausal complement of a mental attitude verb may well differ. So, despite the fact that ‘eye-doctor’ and ‘ophthalmologist’ are synonyms (7) may be true and (8) false if Carla is ignorant of this fact:


(7) Carla believes that eye doctors are rich.
(8) Carla believes that ophthalmologists are rich.


So, we have a case of apparent violation of compositionality; cf. Pelletier (1994).

There is a sizable literature on the semantics of propositional attitude reports. Some think that considerations like this show that there are no genuine synonyms in natural languages. If so, compositionality (at least the language-bound version) is of course vacuously true. Some deny the intuition that (7) and (8) may differ in truth-conditions and seek explanations for the contrary appearance in terms of implicature.[24] Some give up the letter of compositionality but still provide recursive semantic clauses.[25] And some preserve compositionality by postulating a hidden indexical associated with ‘believe’.[26]


Im not entirely sure what to do about these propositional attitude reports, but im inclined to bite the bullet. Perhaps i will change my mind after i have read the two SEP articles about the matter.


Idiomatic language

The SEP article really didnt have a proper discussion of idiomatic language use. Say, frases like “dont mention it” which can either mean what it literally (i.e., by composition) means, or its idiomatic meaning: This is used as a response to being thanked, suggesting that the help given was no trouble (same source).

Whether idioms are a problem for compositionality depends on what one takes “complex expression” to mean. Recall the principle:


(C′) For every complex expression e in L, the meaning of e in L is determined by the structure of e in L and the meanings of the constituents of e in L.


What is a complex expression? Is any given complex expression made up of either complex expressions themselves or simple expressions? Idiomatic expressions really just are expressions whose meaning is not determined by their parts. One might thus actually take them to be simple expressions themselves. If one does, then the composition principle is pretty close to trivially true.


If one does not take idiomatic expressions to be complex expressions or simple expressions, then the principle of composition is trivially false. I dont consider that a huge problem: it generally holds, and explains the things it is required to explain just fine even when it isnt universally true.


One can also note that idiomatic expressions can be used as parts of larger expressions. Depending on which way to think about idiomatic expressions, and of constituents, then larger expressions which have idiomatic expressions as parts of them might be trivially non-compositional. This is the case if one takes constituents to mean smallest parts. If one does, then since the idiomatic expressions’ meanings cannot be determined from syntax+smallest parts, then neither can the larger expression. If one on the other hand takes constituents to mean smallest decompositional parts, then idiomatic expressions do not trivially make the larger expressions they are part of non-compositional. Consider the sentence:


“He is pulling your leg”


the sentence is compositional since its meaning is determinable from “he”, “is”, “pulling your leg”, the syntax, and the meaning function.


There is a reason i bring up this detail, and that is that there is another kind of idiomatic use of language that apparently hasnt been mentioned so much in the literature, judging from SEP not mentioning it. It is the use of prepositions. Surely, many prepositions are used in perfectly compositional ways with other words, like in


“the cat is on the mat”


where “on” has the usual meaning of being on top of (something), or being above and resting upon or somesuch (difficult to avoid circular definitions of prepositions).


However, consider the use of “on” in


“he spent all his time on the internet”


clearly “on” does not mean the same as above here, it doesnt seem to mean much, it is a kind of indefinite relationship. Apparently aware of this fact (and becus languages differ in which prepositions are used in such cases), the designer of esperanto added a preposition for any indefinite relation to the language (“je”). Some languages have lots of such idiomatic preposition+noun frases, and they have to be learned by heart exactly the same way as the idiomatic expressions mentioned earlier, exactly becus they are idiomatic expressions.


As an illustration, in danish if one is at an island, one is “på Fyn”, but if one is at the mainland, then one is “i Jylland”. I think such usage of prepositions shud be considered idiomatic.