Clear Language, Clear Mind

January 13, 2016

Linguistics paper (for class): Typology and statistics

Filed under: Linguistics/language,Math/Statistics — Tags: , — Emil O. W. Kirkegaard @ 09:12

This may be of some interest. Basically, typologists cannot into statistics and it shows. On the other hand, it means there is plenty of low-hanging fruit for someone with skills in statistical programming.


This was handed in as a paper for typology class. Quite likely the last class I will take in linguistics. I don’t plan on actually getting the master’s degree.

July 25, 2015

Approximate string matching in R

Filed under: Computer science,Linguistics/language — Tags: , , , , , — Emil O. W. Kirkegaard @ 07:21

In trying to merge some data, I was confronted with a problem of matching up strings where the author had mutilated them. He had done so in two ways: cutting them off at the 8th character or using personal abbreviations. The first one is relatively easy to deal with. The second one is not.

So I looked around a bit and found that there are others who had similar problems:


The large table below shows the matching results. The leftmost column has the strings I need to match. The list to be matched against is the list of country and regional names and their ISO 3 abbreviations here:

It looks like for most cases the shortened version worked fine: 165 of 193 matches were found, all of which were correct.

agrep (with max.distance = .1, the default) found a match in 175 cases, so only a little improvement there. But it gets worse: in many cases it disagrees with the stricter matching method and gets it wrong. Where both methods return a match and they disagree, agrep is never the correct one. Strangely, in all 10 cases where the simple method found nothing, agrep got it right, yet it got some of the easier cases wrong. In some cases it is truly bizarre: given the string “United S”, it goes for “United Republic of Tanzania” instead of the much easier “United States”. Another common error is preferring a longer version over an exact match. No human would make this error. E.g. given “Moldova”, it prefers “Moldova, Republic of” over just “Moldova”.

There are a number of different errors it makes. In the comments below I have noted the type of error (my judgment).

For the moment, I would caution against the use of this algorithm.
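One plausible reading of these errors, offered as a sketch rather than as a description of the agrep internals: by default agrep accepts a candidate when any substring of it is within the allowed distance of the pattern, and it returns every accepted index in list order without ranking them. The candidate list below is made up to mirror the table; adist() from base R (utils) is used to show the distances.

```r
# Toy candidates, shortened to 8 characters as in the post (made-up list order)
short <- c("Gibralta", "Malta", "United R", "United S")

# Whole-string distance vs. best-substring distance
whole   <- adist("Malta", short, partial = FALSE)
partial <- adist("Malta", short, partial = TRUE)
whole[1, 1:2]    # 4 0: "Gibralta" is far from "Malta" as a whole string
partial[1, 1:2]  # 1 0: but "ralta" inside it is one substitution away

# agrep reports the accepted candidates in list order, not by closeness
hits <- agrep("Malta", short, max.distance = 0.1)
short[hits]

# Ranking the candidates by whole-string distance recovers the exact matches
best_malta  <- short[which.min(adist("Malta", short))]
best_united <- short[which.min(adist("United S", short))]
```

If this reading is right, taking the first index that agrep returns depends on the order of the lookup list, which would explain why an exact match can lose to a longer name.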

Country Genetic_distance_SA to_short_result agrep_result best_match agreement filled_in comments
Norway 1455.52 Norway Norway Norway TRUE Norway
Netherla 1453.28 Netherlands Netherlands Netherlands TRUE Netherlands
Ireland 1940.31 Ireland Iceland Ireland FALSE Ireland prefers substitution over exact
Liechste 1511.83 Liechtenstein Liechtenstein FALSE Liechtenstein correct
Germany 1484.92 Germany Germany Germany TRUE Germany
Sweeden 1453.79 Sweden Sweden FALSE Sweden correct
Switzerl 1557.96 Switzerland Switzerland Switzerland TRUE Switzerland
Iceland 1932.36 Iceland Iceland Iceland TRUE Iceland
Denmark 1472.52 Denmark Denmark Denmark TRUE Denmark
Belgium 1940.31 Belgium Belgium Belgium TRUE Belgium
Austria 1465.7 Austria Australia Austria FALSE Austria prefers part deleted
France 1896.22 France France France TRUE France
Slovenia 1292.32 Slovenia Slovenia Slovenia TRUE Slovenia
Finland 2420.3 Finland Finland Finland TRUE Finland
Spain 1929.44 Spain Saint Barthélemy Spain FALSE Spain no idea
Italy 1961.9 Italy Italy Italy TRUE Italy
Luxembur 1929.98 TRUE Luxembourg
Czech Re 1524.73 Czech Republic Czech Republic Czech Republic TRUE Czech Republic
U. K. 1916.91 TRUE UK
Greece 1283.05 Greece Greece Greece TRUE Greece
Cyprus 1288.53 Cyprus Cyprus Cyprus TRUE Cyprus
Estonia 2302.86 Estonia Estonia Estonia TRUE Estonia
Slovakia 1573.38 Slovakia Slovakia Slovakia TRUE Slovakia
Malta 1912.52 Malta Gibraltar Malta FALSE Malta prefers subset + substitution over exact
Poland 1905.67 Poland Poland Poland TRUE Poland
Lithuani 2389.28 Lithuania Lithuania Lithuania TRUE Lithuania
Portugal 1949.34 Portugal Portugal Portugal TRUE Portugal
Latvia 2256.69 Latvia Latvia Latvia TRUE Latvia
Croatia 1289.64 Croatia Croatia Croatia TRUE Croatia
Romania 1928.4 Romania Romania Romania TRUE Romania
Bulgaria 1399.01 Bulgaria Bulgaria Bulgaria TRUE Bulgaria
Serbia 1421.01 Serbia Serbia Serbia TRUE Serbia
Russia 1975.49 Russia Russia Russia TRUE Russia
Albania 1301.47 Albania Albania Albania TRUE Albania
Macedoni 1334.51 Macedonia Macedonia Macedonia TRUE Macedonia
Armenia 1558.32 Armenia Armenia Armenia TRUE Armenia
Moldova 1527.95 Moldova Moldova, Republic of Moldova FALSE Moldova prefers longer
Botswana 347.18 Botswana Botswana Botswana TRUE Botswana
South Af 0 South Africa South Africa South Africa TRUE South Africa
Ghana 395.9 Ghana Ghana Ghana TRUE Ghana
Eq Guine 373.2 TRUE Equatorial Guinea
Congo 452.9 Congo Congo Congo TRUE Congo
Kenya 366.78 Kenya Kenya Kenya TRUE Kenya
Cameroon 319.62 Cameroon Cameroon Cameroon TRUE Cameroon
Tanzania 352.54 Tanzania Tanzania Tanzania TRUE Tanzania
Nigeria 342.24 Nigeria Nigeria Nigeria TRUE Nigeria
Uganda 358.75 Uganda Uganda Uganda TRUE Uganda
Zambia 352.54 Zambia Gambia Zambia FALSE Zambia prefers substitution over exact
Sudan 316.95 Sudan Sudan Sudan TRUE Sudan
Zimbabwe 352.54 Zimbabwe Zimbabwe Zimbabwe TRUE Zimbabwe
Ethiopia 705.3 Ethiopia Ethiopia Ethiopia TRUE Ethiopia
Guinea 395.9 Guinea Guinea Guinea TRUE Guinea
CentAfrR 469.7 TRUE Central African Republic
SierraLe 395.9 TRUE Sierra Leone
Mozambiq 355.75 Mozambique Mozambique Mozambique TRUE Mozambique
CongoDR 410.17 Congo Republic of Congo Republic of FALSE Congo Republic of wrong but excusable; Congo Democratic Republic
Andorra 1912.83 Andorra Andorra Andorra TRUE Andorra
Angola 353.49 Angola Angola Angola TRUE Angola
Belarus 1949.85 Belarus Belarus Belarus TRUE Belarus
Benin 394.54 Benin Benin Benin TRUE Benin
Bosnia 1337.37 Bosnia Bosnia and Herzegovina Bosnia FALSE Bosnia prefers longer
BurkinaF 378.36 Burkina Faso Burkina Faso FALSE Burkina Faso correct
Burundi 362.98 Burundi Burundi Burundi TRUE Burundi
Cape Ver 963.09 Cape Verde Cape Verde Cape Verde TRUE Cape Verde
Chad 537.81 Chad Chad Chad TRUE Chad
Comoros 352.54 Comoros Comoros Comoros TRUE Comoros
IvoryCoa 468.01 TRUE Ivory Coast
Djibouti 750.88 Djibouti Djibouti Djibouti TRUE Djibouti
Eritrea 665.96 Eritrea Eritrea Eritrea TRUE Eritrea
Gabon 360.07 Gabon Gabon Gabon TRUE Gabon
Gambia 395.9 Gambia Gambia Gambia TRUE Gambia
Georgia 1613.7 Georgia Georgia Georgia TRUE Georgia
Guinea-B 395.9 Guinea-Bissau Guinea-Bissau Guinea-Bissau TRUE Guinea-Bissau
Lesotho 352.54 Lesotho Lesotho Lesotho TRUE Lesotho
Liberia 395.9 Liberia Liberia Liberia TRUE Liberia
Malawi 352.54 Malawi Malawi Malawi TRUE Malawi
Mali 430.58 Mali Australia Mali FALSE Mali prefers subset + substitution over exact
Mauritan 681.23 Mauritania Mauritania Mauritania TRUE Mauritania
Namibia 419.32 Namibia Namibia Namibia TRUE Namibia
Niger 315.05 Niger Niger Niger TRUE Niger
Rwanda 364.26 Rwanda Rwanda Rwanda TRUE Rwanda
SaoTomeP 339.89 TRUE Sao Tome and Principe
Senegal 395.9 Senegal Senegal Senegal TRUE Senegal
Seychell 1709.06 Seychelles Seychelles Seychelles TRUE Seychelles
Somalia 500.74 Somalia Somalia Somalia TRUE Somalia
Swazilan 400.17 Swaziland Swaziland Swaziland TRUE Swaziland
Togo 395.9 Togo Togo Togo TRUE Togo
Ukraine 1947.94 Ukraine Ukraine Ukraine TRUE Ukraine
Australi 1971.39 Australia Australia Australia TRUE Australia
United S 1792.33 United States United Republic of Tanzania United States FALSE United States bizarre
New Zeal 2061.12 New Zealand New Zealand New Zealand TRUE New Zealand
Canada 1958.73 Canada Canada Canada TRUE Canada
Japan 2176.04 Japan Japan Japan TRUE Japan
Hong Kon 2674.63 Hong Kong Hong Kong Hong Kong TRUE Hong Kong
Korea 2399.11 Korea Korea Democratic People’s Republic of Korea FALSE Korea prefers subset + substitution over exact
Israel 1539.63 Israel Israel Israel TRUE Israel
Singapor 2459.04 Singapore Singapore Singapore TRUE Singapore
Qatar 1733.83 Qatar Qatar Qatar TRUE Qatar
Hungary 2432.96 Hungary Hungary Hungary TRUE Hungary
Bahrain 971.99 Bahrain Bahrain Bahrain TRUE Bahrain
Chile 2279.52 Chile Chile Chile TRUE Chile
Argentin 1994.59 Argentina Argentina Argentina TRUE Argentina
Barbados 468.02 Barbados Barbados Barbados TRUE Barbados
Uruguay 1918.61 Uruguay Uruguay Uruguay TRUE Uruguay
Cuba 1370.91 Cuba Aruba Cuba FALSE Cuba prefers subset + substitution over exact
Saudi Ar 1468.3 Saudi Arabia Saudi Arabia Saudi Arabia TRUE Saudi Arabia
Mexico 2024.64 Mexico Mexico Mexico TRUE Mexico
Malaysia 1922.77 Malaysia Malaysia Malaysia TRUE Malaysia
Trinidad 1024.1 Trinidad and Tobago Trinidad and Tobago Trinidad and Tobago TRUE Trinidad and Tobago
Kuwait 1081.15 Kuwait Kuwait Kuwait TRUE Kuwait
Lebanon 1543.46 Lebanon Lebanon Lebanon TRUE Lebanon
Venezuel 1280.81 Venezuela, Bolivarian Republic of Venezuela, Bolivarian Republic of Venezuela, Bolivarian Republic of TRUE Venezuela, Bolivarian Republic of
Mauritiu 1792.48 Mauritius Mauritius Mauritius TRUE Mauritius
Jamaica 595.5 Jamaica Jamaica Jamaica TRUE Jamaica
Peru 2096.08 Peru Hviderusland Peru FALSE Peru prefers subset + substitution over exact
Dominica 521.08 Dominica Dominica Dominica TRUE Dominica
SaintLuc 497.7 TRUE Saint Lucia
Ecuador 2228.58 Ecuador Ecuador Ecuador TRUE Ecuador
Brazil 1875.81 Brazil Brazil Brazil TRUE Brazil
SaintVin 395.9 TRUE Saint Vincent
Colombia 1973.6 Colombia Colombia Colombia TRUE Colombia
Iran 1945.07 Iran France Iran FALSE Iran prefers subset + substitution over exact
Tonga 2390.38 Tonga Tonga Tonga TRUE Tonga
Turkey 2167.95 Turkey Turkey Turkey TRUE Turkey
Belize 1481.26 Belize Belize Belize TRUE Belize
Tunisia 203.38 Tunisia Tunisia Tunisia TRUE Tunisia
Jordan 1539.63 Jordan Jordan Jordan TRUE Jordan
SriLanka 1783.84 TRUE Sri Lanka
DomRep 1206.72 TRUE Dominican Republic
W. Samoa 2388.58 W. Samoa W. Samoa W. Samoa TRUE W. Samoa
Fiji 2534.15 Fiji Fiji Fiji TRUE Fiji
China 2646.26 China China China TRUE China
Thailand 2068.81 Thailand Thailand Thailand TRUE Thailand
Surinam 1562.55 Suriname Suriname Suriname TRUE Suriname
Paraguay 2243.61 Paraguay Paraguay Paraguay TRUE Paraguay
Bolivia 2410.22 Bolivia Bolivia, Plurinational State of Bolivia FALSE Bolivia prefers longer
Philipin 2628.84 Philipines Philipines Philipines TRUE Philipines
Egypt 1401.52 Egypt Egypt Egypt TRUE Egypt
Syria 1590.05 Syria Syria Syria TRUE Syria
Honduras 1979.74 Honduras Honduras Honduras TRUE Honduras
Indonesi 2602.63 Indonesia Indonesia Indonesia TRUE Indonesia
VietNam 2264.3 Viet Nam Viet Nam FALSE Viet Nam correct, but odd, vs. Vietnam
Morocco 191.55 Morocco Morocco Morocco TRUE Morocco
Guatemal 2040.41 Guatemala Guatemala Guatemala TRUE Guatemala
Irak 1625.4 Irak Iran Irak FALSE Irak prefers substitution over exact
India 1888.5 India India India TRUE India
Laos 3012.42 Laos Lao People’s Democratic Republic Laos FALSE Laos prefers longer
Pakistan 1901.47 Pakistan Pakistan Pakistan TRUE Pakistan
Madagasc 1678.96 Madagascar Madagascar Madagascar TRUE Madagascar
Papua 3115.88 Papua New Guinea Papua New Guinea FALSE Papua New Guinea correct
Yemen 1190.13 Yemen Yemen Yemen TRUE Yemen
Nepal 2030.11 Nepal Nepal Nepal TRUE Nepal
CookIsla 2437.7 TRUE Cook Islands
Macau 2660.44 Macau Macao Macau FALSE Macau prefers substitution over exact
Marianas 2437.7 Mariana Isl. Mariana Isl. FALSE Mariana Isl. correct
Marshall 2437.7 Marshall Islands Marshall Islands Marshall Islands TRUE Marshall Islands
NCaledon 2437.7 TRUE New Caledonia
Taiwan 2673.71 Taiwan Taiwan, Province of China Taiwan FALSE Taiwan prefers longer
PuertoRi 1654.5 TRUE Puerto Rico
Afghanis 1962.74 Afghanistan Afghanistan Afghanistan TRUE Afghanistan
Algeria 185.65 Algeria Algeria Algeria TRUE Algeria
Antigua/ 491.84 Antigua and Barbuda Antigua and Barbuda FALSE Antigua and Barbuda correct
Azerbaij 2190.35 Azerbaijan Azerbaijan Azerbaijan TRUE Azerbaijan
Bahamas 594.17 Bahamas Bahamas Bahamas TRUE Bahamas
Banglade 1897.24 Bangladesh Bangladesh Bangladesh TRUE Bangladesh
Bhutan 2082.28 Bhutan Bhutan Bhutan TRUE Bhutan
Brunei 1904.48 Brunei Brunei Darussalam Brunei FALSE Brunei prefers longer
Burma 2138.54 Burma Burma Burma TRUE Burma
Cambodia 2254.37 Cambodia Cambodia Cambodia TRUE Cambodia
Costa Ri 1938.1 Costa Rica Costa Rica Costa Rica TRUE Costa Rica
El Salva 1016.14 El Salvador El Salvador El Salvador TRUE El Salvador
Grenada 537.25 Grenada Grenada Grenada TRUE Grenada
Guyana 1379.76 Guyana Guyana Guyana TRUE Guyana
Haiti 434.51 Haiti Haiti Haiti TRUE Haiti
Kazakhst 2122.18 Kazakhstan Kazakhstan Kazakhstan TRUE Kazakhstan
Kiribati 2281.44 Kiribati Kiribati Kiribati TRUE Kiribati
Korea (N 2399.11 Korea North Korea North FALSE Korea North correct
Kyrgysta 2143.13 TRUE Kyrgyzstan
Libya 185.65 Libya Libano Libya FALSE Libya prefers subset, deletion, insertion over exact
Maldives 1836.17 Maldives Maldives Maldives TRUE Maldives
Micrones 2437.7 Micronesia, Federated States of Micronesia, Federated States of Micronesia, Federated States of TRUE Micronesia, Federated States of
Mongolia 2542.15 Mongolia Mongolia Mongolia TRUE Mongolia
Nicaragu 1856.28 Nicaragua Nicaragua Nicaragua TRUE Nicaragua
Oman 1594.25 Oman Cayman Islands Oman FALSE Oman prefers subset + substitution over exact
Panama 1809.12 Panama Panama Panama TRUE Panama
SKittsNe 469.61 TRUE Saint Kitts and Nevis
Solomon 3050.76 Solomon Islands Solomon Islands FALSE Solomon Islands
Tajikist 2000.53 Tajikistan Tajikistan Tajikistan TRUE Tajikistan
TimorLes 2602.63 TRUE Timor–Leste
Turkmeni 2212.49 Turkmenistan Turkmenistan Turkmenistan TRUE Turkmenistan
UArabEm 1286.35 TRUE United Arab Emirates
Uzbekist 2193.47 Uzbekistan Uzbekistan Uzbekistan TRUE Uzbekistan
Vanuatu 2385.83 Vanuatu Vanuatu Vanuatu TRUE Vanuatu


The R code:

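library(stringr)  # for str_sub()

# genetic distance data; Country holds the mutilated names
gn = read.csv("genetic_distance.csv", encoding = "UTF-8", stringsAsFactors = F)
# as_abbrev() is not defined in this post; it has to be loaded from elsewhere
gn$abbrev = as_abbrev(gn$Country)

# full country names and ISO 3 codes; cut the names to 8 characters like the source
trans = read.csv("countrycodes.csv", sep=";", encoding = "UTF-8", stringsAsFactors = F)
trans$shorter = str_sub(trans$Names, 1, 8)

# exact overlap, for a first look
intersect(trans$shorter, gn$Country)

# method 1: partial matching against the shortened names
matches = data.frame(source_names = gn$Country,
                     to_short = pmatch(gn$Country, trans$shorter))

# try agrep on a single name
agrep(gn$Country[4], trans$shorter)

# method 2: agrep for every name
best_matches = rep(NA_integer_, nrow(gn))
for (idx in seq_along(gn$Country)) {
  match_idx = agrep(gn$Country[idx], trans$shorter, max.distance = .1, useBytes = T)
  #skip on no match
  if (length(match_idx) == 0) next
  #insert match; agrep can return several indices, keep the first
  best_matches[idx] = match_idx[1]
}

matches$agrep = best_matches

matches$to_short_result = trans[matches$to_short, "Names"]
matches$agrep_result = trans[matches$agrep, "Names"]

# the two conditions were garbled in the original; is.na() tests are assumed here
for (idx in 1:nrow(matches)) {
  if (!is.na(matches[idx, "to_short_result"])) {
    matches[idx, "best_match"] = matches[idx, "to_short_result"]
  }
  if (is.na(matches[idx, "to_short_result"])) {
    matches[idx, "best_match"] = matches[idx, "agrep_result"]
  }
}

# "clipboard" as a file name works on Windows
write.table(matches, "clipboard", na = "", sep = "\t")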
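A possible alternative to keeping whatever agrep lists first is to rank the candidates by edit distance and keep only the closest one. The sketch below assumes that approach; closest_match and max_dist are hypothetical names, not part of the script above, and the candidate list is a toy one.

```r
# Hypothetical helper (not part of the original script): return the single
# closest candidate by whole-string edit distance, or NA if none is close enough
closest_match <- function(x, candidates, max_dist = 1) {
  d <- adist(x, candidates)                 # rows: x, columns: candidates
  best <- apply(d, 1, which.min)            # closest candidate per row (first on ties)
  best_d <- d[cbind(seq_along(x), best)]    # distance to that candidate
  ifelse(best_d <= max_dist, candidates[best], NA_character_)
}

short <- c("Iceland", "Ireland", "Sweden", "United R", "United S")
res <- closest_match(c("Ireland", "Sweeden", "United S", "CentAfrR"), short)
res
```

Exact matches win because their distance is 0, a one-letter typo like “Sweeden” is still caught, and personal abbreviations like “CentAfrR” come back as NA and have to be filled in by hand.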

November 13, 2014

Orthographic reform and psycholinguistics: a selective synthesizing review

Filed under: Linguistics/language — Tags: , , — Emil O. W. Kirkegaard @ 08:21

I wrote this for a class some time ago, but apparently forgot to post it here before. I need to cite it in my bachelor’s thesis, so I will put it here.

Psycholinguistics and orthography reform

November 11, 2014

Review: Writing Systems: An Introduction to Their Linguistic Analysis

Filed under: Linguistics/language — Tags: , , — Emil O. W. Kirkegaard @ 08:58

I read this book as part of background reading for my bachelor’s thesis (which I’m writing here) after seeing it referred to in a few other books. As a textbook it seems fine, except for the chapter dealing with psycholinguistics. Nearly all the references in this section are clearly dated, and the author is not up to speed.

Some quotes and comments.

Over time, the gap between spelling and pronunciation is bound to widen
in alphabetic orthographies, as spoken forms change and written forms are retained.
Many of the so-called ‘silent’ letters in French can be explained in this way. Catach
(1978: 65) states that 12.83 per cent of letters are mute letters in French, that is,
letters that have no phonetic interpretation whatever.

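Imagine how much money and time have been spent on typing silent letters. For several hundred years, about 13% of all letters typed were unnecessary, which works out to roughly 15% more letters than needed (12.83/87.17 ≈ 0.147), and correspondingly more paper. Remember when books were actually expensive.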

14 ways of writing u in English

A neat little overview. English is probably unique in this degree of linguistic insanity.


Perhaps that’s where the name of the danish letter J comes from (jʌð). I always wondered.

We are like sailors who must rebuild their boat on the open sea without ever
being able to take it apart in a dock and reassemble it from scratch. -Otto Neurath

I have seen this one before, but i couldnt verify it via Wikiquote while writing this (on laptop).

The conflicting views about the role of phonological recoding in fluent reading are mirrored in a long-standing controversy that pervades reading teaching methods. On one hand, the phonics and decoding method views reading as a process that converts written forms of language to speech forms and then to meaning. A teaching method, consequently, should emphasize phonological knowledge. As one leading proponent of the phonics/decoding approach puts it, ‘phonological skills are not merely concomitants or by-products of reading ability; they are true antecedents that may account for up to 60 per cent of the variance in children’s reading ability’ (Mann 1991: 130). On the other hand, the whole-word method sees reading as a form of communication that consists of the reception of information through the written form, the recovery of meaning being the essential purpose. ‘Since it is the case that learning to recognize whole words is necessary to be a fluent reader, therefore, the learning of whole words right from the start may be easier and more effective’ (Steinberg, Nagata and Aline 2001: 97).

This sounds like another case of social scientists identifying g without realizing it. Phonological awareness surely correlates with g.

This study reports a factor analysis of 4 WAIS subtests plus a phonological awareness (PA) test. PA had a loading on g of .61.


Although a general correlation between literacy rate and prosperity can be observed, relatively poor countries with high literacy rates, such as Vietnam and Sri Lanka, and very rich countries with residual illiteracy, such as the United States, do exist.

This pattern is easily explainable if one knows that the national g of Vietnam and Sri Lanka is around world average, while the US is much higher. The high illiteracy rate of the US is becus of their minority populations of hispanics and african americans.

January 13, 2014

Review of Introduction to psycholinguistics, Understanding Language Science by MJ Traxler

Filed under: Linguistics/language,Psychology — Tags: — Emil O. W. Kirkegaard @ 15:54

Overall an interesting introduction. Some chapters were much more interesting to me than others, which were somewhere between kinda boring and boring. Generally, the book is way too light on the statistical features of the studies cited. When I hear of a purportedly great study, I want to know the sample size and the significance of the results.



Why does Pirahã lack recursion? Everett’s (2008) answer is that Pirahã lacks recursion

because recursion introduces statements into a language that do not make direct assertions

about the world. When you say, Give me the nails that Dan bought, that statement presupposes

that it is true that Dan bought the nails, but it does not say so outright. In Pirahã, each of the

individual sentences is a direct statement or assertion about the world. “Give me the nails”

is a command equivalent to “I want the nails” (an assertion about the speaker’s mental state).

“Dan bought the nails” is a direct assertion of fact, again expressing the speaker’s mental

state (“I know Dan bought those nails”). “They are the same” is a further statement of fact.

Everett describes the Pirahã as being a very literal-minded people. They have no creation

myths. They do not tell fictional stories. They do not believe assertions made by others

about past events unless the speaker has direct knowledge of the events, or knows someone

who does. As a result, they are very resistant to conversion to Christianity, or any other faith

that requires belief in things unseen. Everett argues that these cultural principles determine

the form of Pirahã grammar. Specifically, because the Pirahã place great store in first-hand

knowledge, sentences in the language must be assertions. Nested statements, like relative

clauses, require presuppositions (rather than assertions) and are therefore ruled out. If

Everett is right about this, then Pirahã grammar is shaped by Pirahã culture. The form their

language takes is shaped by their cultural values and the way they relate to one another

socially. If this is so, then Everett’s study of Pirahã grammar would overturn much of the

received wisdom on where grammars come from and why they take the form they do.

Which leads us to …


Interesting hypothesis.



In an attempt to gather further evidence regarding these possibilities, Savage-Rumbaugh

raised a chimp named Panpanzee and a bonobo named Panbanisha, starting when they

were infants, in a language-rich environment. Chimpanzees are the closest species to

humans. The last common ancestor of humans and chimpanzees lived between about

5 million and 8 million years ago. Bonobos are physically similar to chimpanzees, although

bonobos are a bit smaller on average. Bonobos as a group also have social characteristics

that distinguish them from chimpanzees. They tend to show less intra-species aggression

and are less dominated by male members of the species. Despite the physical similarities,

the two species are biologically distinct. By testing both a chimpanzee and a bonobo, Savage-Rumbaugh could hold environmental factors constant while observing change over time

(ontogeny) and differences across the two species (phylogeny). If the two animals acquired

the same degree of language skill, this would suggest that cultural or environmental factors

have the greatest influence on their language development. Differences between them

would most likely reflect phylogenetic biological differences between the two species.

Differences in skill over time would most likely reflect ontogenetic or maturational factors.


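The author clearly forgets about individual differences in ability. With one animal per species, a difference between the two cannot be pinned on the species rather than on the individuals. Such differences are also found in monkeys (indeed, in any animal in which g is a polygenic trait).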



The fossil record shows that human ancestors before Homo sapiens emerged, between

about 70,000 and 200,000 years ago, had some of the cultural and physical characteristics of

modern humans, including making tools and cooking food. If we assume that modern

language emerged sometime during the Homo sapiens era, then it would be nice to know

why it emerged then, and not before. One possibility is that a general increase in brain size

relative to body weight in Homo sapiens led to an increase in general intelligence, and this

increase in general intelligence triggered a language revolution. On this account, big brain

comes first and language emerges later. This hypothesis leaves a number of questions

unanswered, however, such as, what was that big brain doing before language emerged? If

the answer is “not that much,” then why was large brain size maintained in the species

(especially when you consider that the brain demands a huge proportion of the body’s

resources)? And if language is an optional feature of big, sapiens brains, why is it a universal

characteristic among all living humans? Also, why do some groups of humans who have

smaller sized brains nonetheless have fully developed language abilities?


Interesting that he doesn’t cite a reference for this claim, or the brain size general intelligence claim from earlier. I’m more wondering, tho, is there really no difference in language sofistication between groups with different brain sizes (and g)? I’m thinking of spoken language. Perhaps it’s time to revise that claim. We know that g has huge effects on people’s vocabulary size (vocabulary size is one of the most g-loaded subtests) which has to do with both spoken and written language. However, the grammars and morfologies of many languages found in low-g countries are indeed very sofisticated.



As a result of concerns like those raised by Pullum, as well as studies showing that

speakers of different languages perceive the world similarly, many language scientists have

viewed linguistic determinism as being dead on arrival (see, e.g., Pinker, 1994). Many of

them would argue that language serves thought, rather than dictating to it. If we ask the

question, what is language good for? one of the most obvious answers is that language

allows us to communicate our thoughts to other people. That being the case, we would

expect language to adapt to the needs of thought, rather than the other way around. If an

individual or a culture discovers something new to say, the language will expand to fit the

new idea (as opposed to preventing the new idea from being hatched, as the Whorfian

hypothesis suggests). This anti-Whorfian position does enjoy a certain degree of support

from the vocabularies of different languages, and different subcultures within individual

languages. For example, the class of words that refer to objects and events (open class)

changes rapidly in cultures where there is rapid technological or social changes (such as

most Western cultures). The word internet did not exist when I was in college, mumble

mumble years ago. The word Google did not exist 10 years ago. When it first came into the

language, it was a noun referring to a particular web-browser. Soon after, it became a verb

that meant “to search the internet for information.” In this case, technological, cultural, and

social developments caused the language to change. Thought drove language. But did

language also drive thought? Certainly. If you hear people saying “Google,” you are going to

want to know what they mean. You are likely to engage with other speakers of your language

until this new concept becomes clear to you. Members of subcultures, such as birdwatchers

or dog breeders, have many specialist terms that make their communication more efficient,

but there is no reason to believe that you need to know the names for different types of birds

before you can perceive the differences between them—a bufflehead looks different than a

pintail no matter what they’re called.


The author apparently gets his etymology wrong. Google was never a noun for a browser; that’s something computer-illiterate people think, cf.


No… Google’s Chrome is a browser. Google is a search engine. And “to google” something means to search for it using Google, the search engine. Altho now it has changed somewhat to mean “search the internet for”. Similarly to the Kleenex meaning change.



Different languages express numbers in different ways, so language could influence the

way children in a given culture acquire number concepts (Hunt & Agnoli, 1991; Miller &

Stigler, 1987). Chinese number words differ from English and some other languages (e.g.,

Russian) because the number words for 11–19 are more transparent in Chinese than in

English. In particular, Chinese number words for the teens are the equivalent of “ten-one,”

“ten-two,” “ten-three” and so forth. This makes the relationship between the teens and the

single digits more obvious than equivalent English terms, such as twelve. As a result, children

who speak Chinese learn to count through the teens faster than children who speak English.

This greater accuracy at producing number words leads to greater accuracy when children

are given sets of objects and are asked to say how many objects are in the set. Chinese-speaking children performed this task more accurately than their English-speaking peers,

largely because they made very few errors in producing number words while counting up

the objects. One way to interpret these results is to propose that the Chinese language makes

certain relationships more obvious (that numbers come in groups of ten; that there’s a

relationship between different numbers that end in the word “one”), and making those

relationships more obvious makes the counting system easier to learn.


This hypothesis is certainly plausible, but the average g difference between chinese children and american children is a confound that needs to be dealt with.


It would be interesting to see a scandinavian comparison because danish has a horrible numeral system, while swedish has a better one. They both have awkward teen numbers, but the e2 numbers are much more meaningful in swedish: e.g. fem-tio, five-ten vs. halv-tres, “half-third”, a remnant of a system based on 20s: half-third times 20 = 2.5 × 20 = 50.
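The arithmetic can be spelled out in a few lines of R. This is only a sketch: the spellings are the modern standard ones (halvtreds rather than halv-tres), and the multipliers are the traditional score-based readings of the danish tens.

```r
# Danish tens from 50 to 90 as multiples of twenty (a score)
danish <- data.frame(
  numeral    = c("halvtreds", "tres", "halvfjerds", "firs", "halvfems"),
  multiplier = c(2.5, 3, 3.5, 4, 4.5),
  stringsAsFactors = FALSE
)
danish$value <- danish$multiplier * 20

# Swedish equivalents are plain digit-times-ten compounds
swedish <- data.frame(
  numeral = c("femtio", "sextio", "sjuttio", "åttio", "nittio"),
  digit   = 5:9,
  stringsAsFactors = FALSE
)
swedish$value <- swedish$digit * 10

danish$value   # 50 60 70 80 90
swedish$value  # 50 60 70 80 90
```

Both systems name the same numbers, but only the swedish words show the digit and the ten on the surface.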



So how are word meanings (senses, that is) represented in the mental lexicon? And what

research tools are appropriate to investigating word representations? One approach to

investigating word meaning relies on introspection—thinking about word meanings and

drawing conclusions from subjective experience. It seems plausible, based on introspection,

that entries in the mental lexicon are close analogs to dictionary entries. If so, the lexical

representation of a given word would incorporate information about its grammatical

function (what category does it belong to, verb, noun, adjective, etc.), which determines how

it can combine with other words (adverbs go with verbs, adjectives with nouns). Using

words in this sense involves the assumption that individual words refer to types—that the

core meaning of a word is a pointer to a completely interchangeable set of objects in the

world (Gabora, Rosch, & Aerts, 2008). Each individual example of a category is a token. So,

team is a type, and Yankees, Twins, and Mudhens are tokens of that type. [2]


He has misunderstood the type-token terminology. I will quote the SEP (Stanford Encyclopedia of Philosophy):


1.1 What the Distinction Is

The distinction between a type and its tokens is an ontological one between a general sort of thing and its particular concrete instances (to put it in an intuitive and preliminary way). So for example consider the number of words in the Gertrude Stein line from her poem Sacred Emily on the page in front of the reader’s eyes:

Rose is a rose is a rose is a rose.

In one sense of ‘word’ we may count three different words; in another sense we may count ten different words. C. S. Peirce (1931-58, sec. 4.537) called words in the first sense “types” and words in the second sense “tokens”. Types are generally said to be abstract and unique; tokens are concrete particulars, composed of ink, pixels of light (or the suitably circumscribed lack thereof) on a computer screen, electronic strings of dots and dashes, smoke signals, hand signals, sound waves, etc. A study of the ratio of written types to spoken types found that there are twice as many word types in written Swedish as in spoken Swedish (Allwood, 1998). If a pediatrician asks how many words the toddler has uttered and is told “three hundred”, she might well enquire “word types or word tokens?” because the former answer indicates a prodigy. A headline that reads “From the Andes to Epcot, the Adventures of an 8,000 year old Bean” might elicit “Is that a bean type or a bean token?”.
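The two counts in the Stein example can be reproduced in a few lines of R. This is a minimal sketch that assumes words are compared case-insensitively and that punctuation is ignored.

```r
line <- "Rose is a rose is a rose is a rose."

# Strip punctuation, lower-case, and split on whitespace
words <- strsplit(tolower(gsub("[[:punct:]]", "", line)), "\\s+")[[1]]

n_tokens <- length(words)          # every occurrence counts: 10
n_types  <- length(unique(words))  # distinct word forms: 3 ("rose", "is", "a")
```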


He seems to be talking about members of sets.


Or maybe not, perhaps his usage is just idiosyncratic, for in the note “2” to the above, he writes:


Team itself can be a token of a more general category, like organization (team, company, army). Type and token

are used differently in the speech production literature. There, token is often used to refer to a single instance

of a spoken word; type is used to refer to the abstract representation of the word that presumably comes into

play every time an individual produces that word.




We could look at a corpus and count up every time the word dogs appears in exactly that

form. We could count up the number of times that cats appears in precisely that form. In

that case we would be measuring surface frequency—how often the exact word occurs. But

the words dogs and cats are both related to other words that share the same root morpheme.

We could decide to ignore minor differences in surface form and instead concentrate on

how often the family of related words appears. If so, we would treat dog, dogs, dog-tired, and

dogpile as being a single large class, and we would count up the number of times any member

of the class appears in the corpus. In that case, we would be measuring root frequency—how

often the shared word root appears in the language. Those two ways of counting frequency

can come up with very different estimates. For example, perhaps the exact word dog appears

very often, but dogpile appears very infrequently. If we base our frequency estimate on

surface frequency, dogpile is very infrequent. But if we use root frequency instead, dogpile is

very frequent, because it is in the class of words that share the root dog, which appears

fairly often.
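To make the two ways of counting concrete, here is a small sketch. The corpus sentence and the hand-written word-to-root table are invented for illustration; a real study would use a large corpus and a proper morphological analyser.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
corpus = ("the dog saw two dogs while a dog-tired dog slept "
          "and one dogpile formed near the cats").split()

# Hand-written root assignments (an assumption, not a real analyser).
ROOT_OF = {
    "dog": "dog", "dogs": "dog", "dog-tired": "dog", "dogpile": "dog",
    "cat": "cat", "cats": "cat",
}

# Surface frequency: how often the exact word form occurs.
surface_freq = Counter(corpus)
# Root frequency: how often any member of the word's root family occurs.
root_freq = Counter(ROOT_OF[w] for w in corpus if w in ROOT_OF)

def frequencies(word):
    """Return (surface frequency, root frequency) for a word in ROOT_OF."""
    return surface_freq[word], root_freq[ROOT_OF[word]]

print(frequencies("dogpile"))
print(frequencies("dog"))
```

In this toy corpus dogpile has a surface frequency of 1 but a root frequency of 5, which is the kind of gap the passage describes.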


If we use these different frequency estimates (surface frequency and root frequency) to

predict how long it will take people to respond on a reaction time task, root frequency

makes better predictions than surface frequency does. A word that has a low surface

frequency will be responded to quickly if its root frequency is high (Bradley, 1979; Taft, 1979,

1994). This outcome is predicted by an account like FOBS that says that word forms are

accessed via their roots, and not by models like logogen where each individual word form

has a separate entry in the mental lexicon.


Further evidence for the morphological decomposition hypothesis comes from priming

studies involving words with real and pseudo-affixes. Many polymorphemic words are

created when derivational affixes are added to a root. So, we can take the verb grow and

turn it into a noun by adding the derivational suffix -er. A grower is someone who grows

things. There are a lot of words that end in -er and have a similar syllabic structure to

grower, but that are not real polymorphemic words. For example, sister looks a bit like

grower. They both end in -er and they both have a single syllable that precedes -er.

According to the FOBS model, we have to get rid of the affixes before we can identify the

root. So, anything that looks or sounds like it has a suffix is going to be treated like it really

does have a suffix, even when it doesn’t. Even though sister is a monomorphemic word, the

lexical access process breaks it down into a pseudo- (fake) root, sist, and a pseudo-suffix, -er.

After the affix stripping process has had a turn at breaking down sister into a root and a

suffix, the lexical access system will try to find a bin that matches the pseudo-root sist. This

process will fail, because there is no root morpheme in English that matches the input sist.

In that case, the lexical access system will have to re-search the lexicon using the entire

word sister. This extra process should take extra time, therefore the affix stripping

hypothesis predicts that pseudo-suffixed words (like sister) should take longer to process

than words that have a real suffix (like grower). This prediction has been confirmed in a

number of reaction time studies—people do have a harder time recognizing pseudo-suffixed words than words with real suffixes (Lima, 1987; Smith & Sterling, 1982; Taft,

1981). People also have more trouble rejecting pseudo-words that are made up of a prefix

(e.g., de) and a real root morpheme (e.g., juvenate) than a comparable pseudo-word that

contains a prefix and a non-root (e.g., pertoire). This suggests that morphological

decomposition successfully accesses a bin in the dejuvenate case, and people are able to

rule out dejuvenate as a real word only after the entire bin has been fully searched (Taft &

Forster, 1975). Morphological structure may also play a role in word learning. When people

are exposed to novel words that are made up of real morphemes, such as genvive (related

to the morpheme vive, as in revive) they rate that stimulus as being a better English word

and they recognize it better than an equally complex stimulus that does not incorporate a

familiar root (such as gencule) (Dorfman, 1994, 1999).
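The affix-stripping prediction can be sketched as a lookup that counts search steps. This is a deliberately crude illustration, not the FOBS model itself: the root list, the whole-word list and the single suffix are all toy assumptions of mine.

```python
# Toy lexicon (assumed for illustration; not taken from any real model).
ROOTS = {"grow", "teach", "sing"}     # root "bins"
WHOLE_WORDS = {"sister", "corner"}    # monomorphemic words stored whole
SUFFIXES = ("er",)

def lexical_access(word):
    """Strip a suffix first, then fall back to the whole word.

    Returns (found, steps), where steps counts lexicon searches and
    stands in for processing time.
    """
    steps = 0
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            steps += 1                       # search for the stripped (pseudo-)root
            if word[:-len(suffix)] in ROOTS:
                return True, steps
    steps += 1                               # re-search using the entire word
    return (word in ROOTS or word in WHOLE_WORDS), steps

print(lexical_access("grower"))
print(lexical_access("sister"))
```

Here grower is found in one search, while sister needs two (the failed search for sist, then the whole word), which is the extra cost the hypothesis predicts for pseudo-suffixed words.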


In which case english is a pretty bad language as it tends not to re-use roots. E.g. “garlic” vs. danish “hvidløg” (white-onion), or “edible” vs. “eatable” (eat-able). Esperanto shud do pretty good on a comparison.



When comprehenders demonstrate sensitivity to subcategory preference information

(the fact that some structures are easier to process than others when a sentence contains a

particular verb), they are behaving in ways that are consistent with the tuning hypothesis.

The tuning hypothesis says, “that structural ambiguities are resolved on the basis of stored

records relating to the prevalence of the resolution of comparable ambiguities in the past”

(Mitchell, Cuetos, Corley, & Brysbaert, 1995, p. 470; see also Bates & MacWhinney, 1987;

Ford, Bresnan, & Kaplan, 1982; MacDonald et al., 1994). In other words, people keep track

of how often they encounter different syntactic structures, and when they are uncertain

about how a particular string of words should be structured, they use this stored information

to rank the different possibilities. In the case of subcategory preference information, the

frequencies of different structures are tied to specific words—verbs in this case. The next

section will consider the possibility that frequencies are tied to more complicated

configurations of words, rather than to individual words.


This seems like a plausible account of why practicing can boost reading speed.



The other way that proposition is defined in construction–integration theory is, “The

smallest unit of meaning that can be assigned a truth value.” Anything smaller than that is

a predicate or an argument. Anything bigger than that is a macroproposition. So, wrote is a

predicate, and wrote the company is a predicate and one of its arguments. Neither is

a proposition, because neither can be assigned a truth value. That is, it doesn’t make sense

to ask, “True or false: wrote the company?” But it does make sense to ask, “True or false: The

customer wrote the company?” To answer that question, you would consult some

representation of the real or an imaginary world, and the statement would either accurately

describe the state of affairs in that world (i.e., it would be true) or it would not (i.e., it would

be false).


Although the precise mental mechanisms that are involved in converting the surface

form to a set of propositions have not been worked out, and there is considerable debate

about the specifics of propositional representation (see, e.g., Kintsch, 1998; Perfetti & Britt,

1995), a number of experimental studies have supported the idea that propositions are a

real element of comprehenders’ mental representations of texts (van Dijk & Kintsch, 1983).

In other words, propositions are psychologically real—there really are propositions in the

head. For example, Ratcliff and McKoon (1978) used priming methods to find out how

comprehenders’ memories for texts are organized. There are a number of possibilities. It

could be that comprehenders’ memories are organized to capture pretty much the verbatim

information that the text conveyed. In that case, we would expect that information that is

nearby in the verbatim form of the text would be very tightly connected in the comprehender’s

memory of that text. So, for example, if you had a sentence like (2) (from Ratcliff & McKoon,



(2) The geese crossed the horizon as the wind shuffled the clouds.


the words horizon and wind are pretty close together, as they are separated by only two short

function words. If the comprehender’s memory of the sentence is based on remembering it

as it appeared on the page, then horizon should be a pretty good retrieval cue for wind (and

vice versa).


If we analyze sentence (2) as a set of propositions, however, we would make a different

prediction. Sentence (2) represents two connected propositions, because there are two

predicates, crossed and shuffled. If we built a propositional representation of sentence (2),

we would have a macroproposition (a proposition that is itself made up of other propositions),

and two micropropositions (propositions that combine to make up macropropositions). The

macroproposition is:


as (Proposition 1, Proposition 2)


The micropropositions are:


Proposition 1: crossed [geese, the horizon]

Proposition 2: shuffled [the wind, the clouds]


Notice that the propositional representation of sentence (2) has horizon in one proposition,

and wind in another. According to construction–integration theory, all of the elements of

that go together to make a proposition should be more tightly connected in memory to each

other than to anything else in the sentence. As a result, two words from the same proposition

should make better retrieval cues than two words from different propositions. Those

predictions can be tested by asking subjects to read sentences like (2), do a distractor task

for a while, and then write down what they can remember about the sentences later on. On

each trial, one of the words from the sentence will be used as a retrieval cue or reminder. So,

before we ask the subject to remember sentence (2), we will give her a hint. The hint

(retrieval cue) might be a word from proposition 1 (like horizon) or a word from proposition

2 (like clouds), and the dependent measure would be the likelihood that the participant will

remember a word from the second proposition (like wind). Roger Ratcliff and Gail McKoon

found that words that came from the same proposition were much better retrieval cues

(participants were more likely to remember the target word) than words from different

propositions, even when distance in the verbatim form was controlled. In other words, it

does not help that much to be close to the target word in the verbatim form of the sentence

unless the reminder word is also from the same proposition as the target word (see also

Wanner, 1975; Weisberg, 1969).
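The propositional analysis of sentence (2) maps naturally onto a small data structure. The sketch below is my own illustration; the class and function names are made up, and the articles are dropped from the arguments for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    predicate: str
    arguments: tuple

# Micropropositions of sentence (2).
p1 = Proposition("crossed", ("geese", "horizon"))
p2 = Proposition("shuffled", ("wind", "clouds"))
# The macroproposition takes the two micropropositions as its arguments.
macro = Proposition("as", (p1, p2))

def content_words(prop):
    """Predicate plus arguments of a microproposition."""
    return {prop.predicate, *prop.arguments}

def same_proposition(cue, target, micropropositions):
    """True if cue and target belong to the same microproposition."""
    return any(cue in content_words(p) and target in content_words(p)
               for p in micropropositions)

print(same_proposition("clouds", "wind", [p1, p2]))
print(same_proposition("horizon", "wind", [p1, p2]))
```

On the propositional account, clouds should be the better cue for wind because the two share a proposition, even though horizon sits closer to wind in the verbatim sentence.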


I’m surprised to see empirical evidence for this, but it is very neat when science does that: converge on the same result from two different angles (in this case metafysics and linguistics).


As for his micro-/macroproposition terminology, logicians normally call these atomic and compound (non-atomic) propositions, respectively.



How does suppression work? Is it as automatic as enhancement? There are a number of

reasons to think that suppression is not just a mirror image of enhancement. First,

suppression takes a lot longer to work than enhancement does. Second, while knowledge

activation (enhancement) occurs about the same way for everyone, not everyone is equally

good at suppressing irrelevant information, and this appears to be a major contributor to

differences in comprehension ability between different people (Gernsbacher, 1993;

Gernsbacher & Faust, 1991; Gernsbacher et al., 1990). For example, Gernsbacher and her

colleagues acquired Verbal SAT scores for a large sample of students at the University of

Oregon (similar experiments have been done on Air Force recruits in basic training, who

are about the same age as the college students). Verbal SAT scores give a pretty good

indication of how well people are able to understand texts that they read, and there are

considerable differences between the highest and lowest scoring people in the sample. This

group of students was then asked to judge whether target words like ace were semantically

related to a preceding sentence like (15), above. Figure 5.4 presents representative data from

one of these experiments. The left-hand bars show that the ace meaning was highly activated

for both good comprehenders (the dark bars) and poorer comprehenders (the light bars)

immediately after the sentence. After a delay of one second (a very long time in language

processing terms), the good comprehenders had suppressed the contextually inappropriate

“playing card” meaning of spade, but the poor comprehenders still had that meaning

activated (shown in the right-hand bars of Figure 5.4).



Very neat! Didn’t know about this, but it fits very nicely in the ECT (elementary cognitive task) tradition of Jensen. I shud probably review this evidence and publish a review in Journal of Intelligence.




To determine whether something is a cause, comprehenders apply the necessity in the

circumstances heuristic (which is based on the causal analysis of the philosopher Hegel).

The necessity in the circumstances heuristic says that “A causes B, if, in the circumstances

of the story, B would not have occurred if A had not occurred, and if A is sufficient for B to



Sounds more like a logical fallacy, i.e. denying the antecedent:

1. A→B
2. ¬A
Thus, 3. ¬B
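That this argument form is invalid can be checked by brute force over the four truth assignments. A minimal sketch (the function names are mine), with modus tollens included as a valid form for comparison:

```python
from itertools import product

def implies(p, q):
    """Material conditional."""
    return (not p) or q

def counterexamples(premises, conclusion):
    """Truth assignments (A, B) where every premise holds but the conclusion fails."""
    return [(a, b) for a, b in product([True, False], repeat=2)
            if all(p(a, b) for p in premises) and not conclusion(a, b)]

# Denying the antecedent: A→B, ¬A, therefore ¬B.
denying_antecedent = counterexamples(
    premises=[lambda a, b: implies(a, b), lambda a, b: not a],
    conclusion=lambda a, b: not b,
)

# Modus tollens, a valid form: A→B, ¬B, therefore ¬A.
modus_tollens = counterexamples(
    premises=[lambda a, b: implies(a, b), lambda a, b: not b],
    conclusion=lambda a, b: not a,
)

print(denying_antecedent)
print(modus_tollens)
```

The only counterexample to denying the antecedent is A false, B true; modus tollens has none.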


The importance of causal structure in the mental processing of texts can be demonstrated

in a variety of ways. First, the propositional structure of texts can be described as a network

of causal connections. Some of the propositions in a story will be on the central causal chain

that runs from the first proposition in the story (Once upon a time …) to the last (… and

they lived happily ever after). Other propositions will be on causal dead-ends or side-plots.

In Cinderella, her wanting to go to the ball, the arrival of the fairy godmother, the loss of the


glass slipper, and the eventual marriage to the handsome prince, are all on the central causal

chain. Many of the versions of the Cinderella story do not bother to say what happens to the

evil stepmother and stepsisters after Cinderella gets married. Those events are off the

central causal chain and, no matter how they are resolved, they do not affect the central

causal chain. As a result, if non-central events are explicitly included in the story, they are

not remembered as well as more causally central elements (Fletcher, 1986; Fletcher &

Bloom, 1988; Fletcher et al., 1990).


A nice model for memetic evolution.

Korean Air had a big problem (Kirk, 2002). Their planes were dropping out of

the sky like ducks during hunting season. They had the worst safety record of

any major airline. Worried company executives ordered a top-to-bottom

review of company policies and practices to find out what was causing all the

crashes. An obvious culprit would be faulty aircraft or bad maintenance

practices. But their review showed that Korean Air’s aircraft were well

maintained and mechanically sound. So what was the problem? It turned out

that the way members of the flight crew talked to one another was a major

contributing factor in several air disasters. As with many airlines, Korean Air

co-pilots were generally junior to the pilots they flew with. Co-pilots’

responsibilities included, among other things, helping the pilot monitor the

flight instruments and communicating with the pilot when a problem occurred,

including when the pilot might be making an error flying the plane. But in the

wider Korean culture, younger people treat older people with great deference

and respect, and this social norm influences the way younger and older people

talk to one another. Younger people tend to defer to older people and feel

uncomfortable challenging their judgment or pointing out when they are

about to fly a jet into the side of a mountain. In the air, co-pilots were waiting

too long to point out pilot errors, and when they did voice their concerns, their

communication style, influenced by a lifetime of cultural conditioning, made it

more difficult for pilots to realize when something was seriously wrong. To

correct this problem, pilots and co-pilots had to re-learn how to talk to one

another. Pilots needed to learn to pay closer attention when co-pilots voiced

their opinions, and co-pilots had to learn to be more direct and assertive when

communicating with pilots. After instituting these and other changes, Korean

Air’s safety record improved and they stopped losing planes.


This one is probably not true. The only source given is Kirk (2002). Perhaps it is just a case of statistical regression towards the mean: any airline that does badly for chance reasons will tend to recover.
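A quick simulation shows how regression towards the mean would look if chance were the only factor. Every number here (500 airlines, 200 flights per period, a 2% accident chance) is invented and unrealistic; the point is only that all airlines share the same true risk.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Invented toy parameters: every airline has the SAME true accident risk.
N_AIRLINES, N_FLIGHTS, P_ACCIDENT = 500, 200, 0.02

def accidents():
    """Number of accidents for one airline in one period (pure chance)."""
    return sum(random.random() < P_ACCIDENT for _ in range(N_FLIGHTS))

def mean(xs):
    return sum(xs) / len(xs)

period1 = [accidents() for _ in range(N_AIRLINES)]
period2 = [accidents() for _ in range(N_AIRLINES)]

# Pick the 50 airlines with the worst record in period 1.
worst = sorted(range(N_AIRLINES), key=lambda i: period1[i], reverse=True)[:50]

worst_before = mean([period1[i] for i in worst])
worst_after = mean([period2[i] for i in worst])
overall = mean(period1 + period2)

print(worst_before, worst_after, overall)
```

The worst performers of period 1 drift back towards the overall average in period 2 without any intervention, so an improvement after a policy change is not by itself evidence that the policy worked.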

To date, experiments on statistical learning in infants have been based on highly

simplified mini-languages with very rigid statistical properties. For example, transitional

probabilities between syllables are set to 1.0 for “words” in the language, and .33 for pairs of

syllables that cut across “word” boundaries.17 Natural languages have a much wider range of

transitional probabilities between syllables, the vast majority of which are far lower than

1.0. Researchers have used mathematical models to simulate learning of natural languages,

using samples of real infant-directed speech to train the simulated learner (Yang, 2004).

When the model has to rely on transitional probabilities alone, it fails to segment speech

accurately. However, when the model makes two simple assumptions about prosody—that

each word has a single stressed syllable, and that the prevailing pattern for bisyllables is

trochaic (STRONG–weak)—the model is about as accurate in its segmentation decisions as

7½-month-old infants. This result casts doubt on whether the statistical learning strategy is

sufficient for infants to learn how to segment naturally occurring speech (and if the strategy

is not sufficient, it can not be necessary either).
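For reference, a transitional probability is just the share of occurrences of one syllable that are followed by a given next syllable. A minimal sketch, using a made-up three-word mini-language (the syllables and the word order are my own invention):

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Map each adjacent pair (x, y) to count(x followed by y) / count(x)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])   # the final syllable has no successor
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Hypothetical mini-language with three trisyllabic "words".
A = ["bi", "da", "ku"]
B = ["pa", "do", "ti"]
C = ["go", "la", "bu"]
stream = [s for word in (A, B, C, A, C, B, A) for s in word]

tp = transitional_probabilities(stream)
print(tp[("bi", "da")])   # within a word
print(tp[("ku", "pa")])   # across a word boundary
```

Within-word pairs come out at 1.0 and boundary pairs lower, which is the cue the infant is supposed to exploit in these experiments.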

More logic errors? Let’s translate the talk of sufficient and necessary conditions into logic:

A is a sufficient condition for B, is the same as, A→B

A is a necessary condition for B, is the same as, B→A

The claim that if A is not sufficient for B, then A is not necessary for B, is thus the same as: ¬(A→B)→¬(B→A). Clearly not true: when A is true and B is false, the antecedent holds but the consequent fails.
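The formula can be checked mechanically with a truth table. A minimal sketch, reading → as the material conditional (the same reading used in the translation above):

```python
from itertools import product

def implies(p, q):
    """Material conditional."""
    return (not p) or q

def claim(a, b):
    """¬(A→B) → ¬(B→A): 'not sufficient' implies 'not necessary'."""
    return implies(not implies(a, b), not implies(b, a))

table = {(a, b): claim(a, b) for a, b in product([True, False], repeat=2)}
is_tautology = all(table.values())

for (a, b), value in table.items():
    print(a, b, value)
print("tautology:", is_tautology)
```

The formula is false in exactly one row, A true and B false, so it is not a logical truth.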

So, if a child already knows the name of a concept, she will reject a second label as referring

to the same concept. Children can use this principle to figure out the meanings of new

words, because applying the principle of contrast rules out possible meanings. If you

already know that gavagai means “rabbit,” and your guide points at a rabbit and says, blicket,

you will not assume that gavagai and blicket are synonyms. Instead, you will consider the

possibility that blicket refers to a salient part of the rabbit (its ears, perhaps) or a type of

rabbit or some other salient property of rabbits (that they’re cute, maybe). In the lab,

children who are taught two new names while attending to an unfamiliar object interpret

the first name as referring to the entire object and the second name as referring to a salient

part of the object. For somewhat older children (3–4 years old), parents often provide an

explicit contrast when introducing children to new words that label parts of an object

(Saylor, Sabbagh, & Baldwin, 2002). So, an adult might point to Flopsy and say, See

the bunny? These are his ears. Children do not need such explicit instruction, however,

as they appear to spontaneously apply the principle of contrast to deduce meanings for

subcomponents of objects (e.g., ears) and substances that objects are made out of (e.g.,

wood, naugahyde, duck tape).

Dogs (some of them) apparently can also do this.

When Chinese was thought of as a pictographic script, it made sense to think that

Chinese script might be processed much differently than English script. But it turns out

that there are many similarities in how the two scripts are processed. For one thing, reading

both scripts leads to the rapid and automatic activation of phonological (sound) codes.

When we read English, we use groups of letters to activate phonological codes automatically

(this is one of the sources of the inner voice that you often hear when you read). The fact

that phonological codes are automatically activated in English reading is shown by

experiments involving semantic categorization tasks where people have to judge whether a

word is a member of a category. Heterophonic (multiple pronunciations) homographs (one

spelling), such as wind, take longer to read than comparably long and frequent regular

words, because reading wind activates two phonological representations (as in the wind was

blowing vs. wind up the clock) (Folk & Morris, 1995). A related consistency effect involves

words that have spelling patterns that have multiple pronunciations. The word have contains

the letter “a,” which in this case is pronounced as a “short” /a/ sound. But most of the time

-ave is pronounced with the “long” a sound, as in cave, and save. So, the words have, cave,

and save, are said to be inconsistent because the same string of letters can have multiple

pronunciations. Words of this type take longer to read than words that have entirely

consistent letter–pronunciation patterns (Glushko, 1979), and the extra reading time

reflects the costs associated with selecting the correct phonological code from a number of

automatically activated candidates.

Some potential good reasons for a ‘shallow’ (i.e. good) orthografy here! A bad spelling system is literally causing us to take longer to read, not just to learn to read.

Phonemic awareness is an important precursor of literacy (the ability to read and write).

It is thought to play a causal role in reading success, because differences in phonemic

awareness can be measured in children who have not yet begun to read. Those prereaders’

phonemic awareness test scores then predict how successfully and how quickly they will

master reading skills two or three years down the line when they begin to read (Torgesen

et al., 1999, 2001; Wagner & Torgesen, 1987; Wagner, Torgesen, & Rashotte, 1994;

Wagner et al., 1997; see Wagner, Piasta, & Torgesen, 2006, for a review; but see Castles &

Coltheart, 2004, for a different perspective). Phonemic awareness can be assessed in a

variety of ways, including the elision, sound categorization, and blending tasks (Torgesen

et al., 1999), among others, but the best assessments of phonemic awareness involve multiple

measures. In the elision task, children are given a word such as cat and asked what it would

sound like if you got rid of the /k/ sound. Sound categorization involves listening to sets of

words, such as pin, bun, fun, and gun, and identifying the word “that does not sound like the

others” (in this case, pin; Torgesen et al., 1999, p. 76). In blending tasks, children hear an

onset (word beginning) and a rime (vowel and consonant sound at the end of a syllable),

and say what they would sound like when they are put together. Children’s composite scores

on tests of phonemic awareness are strongly correlated with the development of reading

skill at later points in time. Children who are less phonemically aware will experience

greater difficulty learning to read, but effective interventions have been developed to

enhance children’s phonemic awareness, and hence to increase the likelihood that they will

acquire reading skill within the normal time frame (Ehri, Nunes, Willows, et al., 2001).18

These shud work as early IQ tests.

And they do (even if this is a weak paper):

There are different kinds of neighborhoods, and the kind of neighborhood a word

inhabits affects how easy it is to read that word. Different orthographic neighborhoods are

described as being consistent or inconsistent, based on how the different words in the

neighborhood are pronounced. If they are all pronounced alike, then the neighborhood is

consistent. If some words in the neighborhood are pronounced one way, and others are

pronounced another way, then the neighborhood is inconsistent. The neighborhood that

made inhabits is consistent, because all of the other members of the neighborhood (wade,

fade, etc.) are pronounced with the long /a/ sound. On the other hand, hint lives in an

inconsistent neighborhood because some of the neighbors are pronounced with the short

/i/ sound (mint, lint, tint), but some are pronounced with the long /i/ sound (pint). Words

from inconsistent neighborhoods take longer to pronounce than words from consistent

neighborhoods, and this effect extends to non-words as well (Glushko, 1979; see also Jared,

McRae, & Seidenberg, 1990; Seidenberg, Plaut, Petersen, McClelland, & McRae, 1994). So,

it takes you less time to say tade than it takes you to say bint. Why would this be?
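The consistency definition amounts to asking whether all members of a neighborhood share one pronunciation of the shared letter string. A small sketch using only the words and vowel labels from the passage; the dictionary layout and function name are my own.

```python
# Vowel pronunciation of each neighbor, as described in the quoted passage.
NEIGHBORHOODS = {
    "-ade": {"made": "long a", "wade": "long a", "fade": "long a"},
    "-int": {"hint": "short i", "mint": "short i", "lint": "short i",
             "tint": "short i", "pint": "long i"},
}

def is_consistent(neighborhood):
    """A neighborhood is consistent if all its members are pronounced alike."""
    pronunciations = set(NEIGHBORHOODS[neighborhood].values())
    return len(pronunciations) == 1

for name in NEIGHBORHOODS:
    print(name, "consistent" if is_consistent(name) else "inconsistent")
```

A non-word inherits the status of its neighborhood: tade falls in the consistent -ade set and bint in the inconsistent -int set, which is why the passage expects tade to be faster to say.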

Bad spelling even makes us speak slower…

The single-route models would seem to enjoy a parsimony advantage, since they can

produce frequency and regularity effects, as well as their interaction, on the basis of a single

mechanism.25 However, recent studies have indicated that the exact position in a word that

leads to inconsistent spelling–sound mappings affects how quickly the word can be read

aloud. As noted above, it takes longer to read a word with an inconsistency at the beginning

(e.g., general, where hard /g/ as in goat is more common) than a word with an inconsistency

at the end (e.g., bomb, where the b is silent). This may be more consistent with the DRC

serial mapping of letters to sounds than the parallel activation posited by PDP-style single-route models (Coltheart & Rastle, 1994; Cortese, 1998; Rastle & Coltheart, 1999b; Roberts,

Rastle, Coltheart, & Besner, 2003).

In practical terms, this means that we shud begin with words that have problematic beginnings and endings. Words like “mnemonic” and “psychology”.

Treatment options for aphasia include pharmacological therapy (drugs) and various

forms of speech therapy.18 Let’s review pharmacological therapy before turning to speech

therapy. One of the main problems that happens following strokes is that damage to the

blood vessels in the brain reduces the blood flow to perisylvian brain regions, and

hypometabolism—less than normal activity—in those regions likely contributes to aphasic

symptoms. Therefore, some pharmacological treatments focus on increasing the blood

supply to the brain, and those treatments have been shown to be effective in some studies

(Kessler, Thiel, Karbe, & Heiss, 2001). The period immediately following the stroke appears

to be critical in terms of intervening to preserve function. For example, aphasia symptoms

can be alleviated by drugs that increase blood pressure if they are administered very rapidly

when the stroke occurs (Wise, Sutter, & Burkholder, 1972). During this period, aphasic

symptoms will reappear if blood pressure is allowed to fall, even if the patient’s blood

pressure is not abnormally low. In later stages of recovery, blood pressure can be reduced

without causing the aphasic symptoms to reappear. Other treatment options capitalize on

the fact that the brain has some ability to reorganize itself following an injury (this ability is

called neural plasticity). It turns out that stimulant drugs, including amphetamines, appear

to magnify or boost brain reorganization. When stimulants are taken in the period

immediately following a stroke, and patients are also given speech-language therapy, their

language function improves more than control patients who receive speech-language

therapy and a placebo in the six months after their strokes (Walker-Batson et al., 2001).

Very interesting application of amfetamins.

October 2, 2013

Theory of mind and reasoning complexity (paper for some linguistics class)

Filed under: Linguistics/language,Logic — Emil O. W. Kirkegaard @ 16:29

The assignment was:

Any aspect? :D I just wrote stuff about formal logic. So no more research was needed. Lucky.

SMU paper 1

April 6, 2013

Comments on Linguistic Anthropology (Laura Ahearn)

Filed under: Linguistics/language — Emil O. W. Kirkegaard @ 22:59

Consider Marx’s famous words in “The Eighteenth Brumaire of Louis

Bonaparte”: “Men make their own history, but they do not make it

just as they please; they do not make it under circumstances chosen by

themselves, but under circumstances directly found, given and transmitted

from the past” (Marx 1978[1852]:595). In place of the word

“history” in this remark, one could easily substitute “language,” “society,”

or “culture,” and the statement would remain equally insightful.

At the core of what is known as “practice theory” is this seeming

paradox: that language, culture, and society all apparently have a pre-existing

reality but at the same time are very much the products of

individual humans’ words and actions.12 Many linguistic anthropologists

explicitly or implicitly draw upon practice theory in their work.

Correct. Equally insightless.

In sum, as important as the interview is as a research method, it is

often mistakenly assumed to provide a simple, straightforward path

toward “the facts” or “the truth.” Interviews can indeed provide rich

insights, but they must be appreciated as the complex, culturally

mediated social interactions that they are.

I cringe every time I read ”the truth” and ”the facts”. Social constructivism -_-

A researcher interested in language ideologies might conduct a

matched guise test, a process that involves recording individuals as

they read a short passage in two or more languages or dialects

(“guises”). In other words, if four people are recorded, eight (or more)

readings of the same passage might be produced. For example, a

researcher interested in whether listeners judge people who speak

African American English differently from those who speak standard

American English might choose four individuals who can code-switch

fluently between these two ways of speaking. Each of these four

individuals would record two readings of the same passage, one in

African American English, the other in standard American English.

These eight readings would then be shuffled up and played back to

other people who do not know that there were only four readers

instead of eight. The listeners would be asked to rank each of the eight

readings, rating each according to how honest, intelligent, sophisticated,

likable, and so on, they thought the reader was. By comparing

the scores listeners give to the same speaker reading in African

American English vs. standard American English, it is possible to hold

a person’s other voice qualities constant and thereby determine how

much influence simply speaking one or the other of these language

variants has on listeners’ attitudes toward the speaker. In other words,

matched guise tests can provide a measure of people’s unconscious

language ideologies – which can be related to racial prejudices.6
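The logic of the design is a within-speaker comparison: each speaker serves as their own control. The sketch below shows the arithmetic only. All ratings are invented, the guise labels are neutral placeholders, and the numbers are chosen to average out to zero so that they imply no finding.

```python
from statistics import mean

# Invented listener ratings on a 1-7 scale: ratings[speaker][guise].
ratings = {
    "speaker_1": {"guise_a": [5, 6, 4], "guise_b": [4, 5, 3]},
    "speaker_2": {"guise_a": [3, 4, 5], "guise_b": [4, 4, 4]},
    "speaker_3": {"guise_a": [6, 6, 6], "guise_b": [5, 7, 6]},
    "speaker_4": {"guise_a": [2, 3, 4], "guise_b": [3, 4, 5]},
}

def guise_differences(ratings):
    """Per-speaker difference in mean rating (guise_a minus guise_b).

    Comparing within a speaker holds that speaker's other voice
    qualities constant.
    """
    return {speaker: mean(g["guise_a"]) - mean(g["guise_b"])
            for speaker, g in ratings.items()}

diffs = guise_differences(ratings)
overall_effect = mean(diffs.values())
print(diffs)
print(overall_effect)
```

A real analysis would also need many listeners per reading and a significance test on the paired differences; this only shows where the "same speaker, different guise" comparison enters.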

It is telling that the author uses ”prejudices” instead of, say, ”beliefs”. Since it is well known that american blacks ARE less intelligent, and that there is a certain dialect used mostly by black americans, this the usage of this dialect can hence be used as a diagnostic tool for identifying american blacks. This in turn makes it a useful proxy for low intelligence (white american standards). Indeed, not using the information for that purpose if one knows about these correlations, would be to ignore relevant data.

The message to scholars interested in language acquisition, therefore, is

that they should consider cultural values and social practices to be

inseparable from language and its acquisition (Slobin 1992:6). And the

message to cultural anthropologists and other social scientists interested

in processes of childhood social practices, education, apprenticeship, or

other ways of learning or entering into new social groups is that they

should look closely at linguistic practices. In other words, learning a first

language and becoming a culturally competent member of a society are

two facets of a single process. It is virtually impossible for a child to learn

a language without also becoming socialized into a particular cultural

group, and, conversely, a child cannot become a competent member of

such a group without mastering the appropriate linguistic practices.

What about learning foreign languages? Especially dead foreign languages, or constructed languages? Does one become a member of the nonexistent Klingon society if one learns that as a child? They must have some other way of thinking about this, if these obvious counter-examples do not work.

Franz Boas (1858-1942) is often considered the father of anthropology in the United States. An important part of Boas’s research agenda involved disproving racist assertions about the existence of so-called “primitive” languages, races, and cultures. At the turn of the twentieth century, when Boas was writing, some scholars were arguing that people in certain societies were incapable of complex, abstract, “scientific” thought because of the seeming lack of “logical” grammatical categories in their languages. Boas, who was keen on demonstrating the essential equality and humanity of all people despite their tremendous linguistic and cultural diversity, disputed this interpretation, proposing instead that all linguistic and cultural practices were equally complex and logical. The particular language spoken by a group of people merely tended to reflect their habitual cultural practices, Boas maintained. Language might facilitate certain types of thinking and could provide a valuable way of understanding unconscious patterns of culture and thought, Boas declared, but it would not prevent people from thinking in a way that differed from the categories presented most conveniently in their language.

I found it difficult to believe that there is nothing to this general idea. I expect there to be some correlations between population IQ and their language. And just trivial things like that indo-european and chinese languages are associated with high IQ. Something like that high IQ is associated with some measure of the advancedness of the language in question. But perhaps it’s not true. In any case, I don’t presume to know to begin with and am willing to look at the data. Apparently, this wasn’t true for Boas.

Another possible way of researching the influence of language-in-general on thought is studying children who have not yet learned a language. Clearly, it would be highly unethical to deprive a child of access to a language; furthermore, studies of abused children who have not been exposed to any language involve so many complicating factors that the causes of cognitive differences are impossible to ascertain. Researchers interested in the effects of language-in-general on human thought have therefore turned to subjects such as very young, prelinguistic infants, or deaf children who are raised in normal circumstances but who have been deprived of early exposure to language because they have hearing parents who do not use sign language. In the case of infants, as noted in chapter 3, the language socialization process begins from day one (if not before), so it is impossible to study a truly “prelinguistic” infant. […]

It does begin before, at least, so claims this TED talk I saw a while back.

Much research remains to be conducted before a definitive understanding of the potential effects of language-in-general on various dimensions of thought can be obtained. It may even turn out to be the case that there is no such general effect, since no one actually learns “language-in-general” but instead learns one (or more) particular language. In this regard, additional research is needed to explore the timing of theory of mind development in children who speak languages other than English. There are some studies of Baka- and Japanese-speaking children, among others, indicating that they are able to pass the standard false-belief tasks at the same age as English-speaking children, but other children, such as those who speak Junin Quechua, seem not to be able to pass the classic false-belief tasks until much later, perhaps because of the specific grammatical structures of Junin Quechua or a very different cultural context (Villiers and Villiers 2003:372–373). Many linguistic anthropologists question whether standard experiments devised in the United States can be exported, either in their original form or in “culturally appropriate” versions, to be used with children (or even adults) from very different linguistic and cultural backgrounds. At the very least, what little research there is of this sort must be closely scrutinized for cultural and linguistic bias.

Knowing that the japanese are similar to whites in intelligence, and not knowing the intelligence of the people speaking the mentioned language, this immediately gives one the idea that it might be an intelligence thing. The crucial test for that is whether false-belief tests correlate with intelligence.

Nothing useful on Wikipedia.

Did a brief search on GScholar, with terms: false-belief task, IQ. Result? IQ does predict better scores on false-belief tests. Cites:

  • Hughes, Claire, et al. “Good test‐retest reliability for standard and advanced false‐belief tasks across a wide range of abilities.” Journal of Child Psychology and Psychiatry 41.4 (2000): 483-490.
  • Brüne, Martin. “Theory of mind and the role of IQ in chronic disorganized schizophrenia.” Schizophrenia Research 60.1 (2003): 57-64.
  • Happé, Francesca GE. “Wechsler IQ profile and theory of mind in autism: a research note.” Journal of Child Psychology and Psychiatry 35.8 (1994): 1461-1471.

The group seems to be this one:

Lynn lists Peru’s population IQ at 90. So, this explanation might fit. Or it might not. Difficult to say about some specific subgroup of that population. Presumably, the indegenious peoples have lower IQ due to lesser admixture of white genes.

Think of all the taken-for-granted ways in which reading and writing saturate our daily lives. Even if we put aside schooling, the most obvious realm in which literacy plays a central role, an average day in the life of a person living in the United States or any number of other countries in the twenty-first century will most likely involve more interactions with written texts than can be counted. “[M]ost social interactions in contemporary society,” David Barton and Mary Hamilton proclaim, “are textually mediated” (Barton and Hamilton 2005:14). From cereal boxes, billboards, and newspapers to the internet and words written on clothing, many people engage more frequently with the written word than they realize. And even when people are alone while reading and writing, they are engaged in social activities because reading and writing are enacted and interpreted in culturally and socially specific ways. Moreover, these activities are also bound up with social differences and inequalities. Patricia Baquedano-Lopez writes: “Literacy is less a set of acquired skills and more an activity that affords the acquisition and negotiation of new ways of thinking and acting in the world” (2004:246). And since the social world is not composed of neutral, power-free interactions, James Gee notes that we should therefore not expect this to be true of literacy practices: “The traditional meaning of the word ‘literacy’ – the ‘ability to read and write’ – appears ‘innocent’ and ‘obvious.’ But, it is no such thing. Literacy as ‘the ability to read and write’ situates literacy in the individual person, rather than in society. As such, it obscures the multiple ways in which literacy interrelates with the workings of power” (Gee 2008:31).

Garbage like this is found consistently throughout the book.

Junigau women’s literacy practices did not just facilitate a shift away from arranged marriage toward elopement, therefore, but also reflected and helped to shape the new ways in which villagers thought of themselves. Along with these changes, however, came some reinforcement of pre-existing norms, especially in the area of gender relations. While it might seem to readers used to having the right to choose their own spouse that acquiring such a right would inevitably improve someone’s life, in fact, the opposite was true for some Junigau women who eloped after love-letter correspondences. In cases where their husbands or in-laws turned out to be abusive, the women found that they had no recourse and no support from their own parents. If they had encountered these kinds of problems after an arranged marriage, most could have returned to their parents’ home or expected their parents to intervene on their behalf. Such was not the case for most women who had eloped. Indeed, because most of these women ended up moving into their husbands’ extended households as low-status daughters-in-law, their social positioning and daily lives were virtually identical to those of women whose marriages had been arranged – except that they did not have the same recourse if things went poorly. In some respects, therefore, the women’s new literacy practices created new and different opportunities and identities, but in other respects, long-standing gender inequalities remained or were even exacerbated.

Interesting, even if sad.

An alternative source of theoretical illumination for literacy researchers, according to James Collins and Richard Blot (2003), is French post-structuralist thought. Pierre Bourdieu, Michel de Certeau, Jacques Derrida, and Michel Foucault all provide important analyses of the workings of power in society in ways that are especially apt for scholars interested in studying reading and writing. Drawing on these theorists, Collins and Blot attempt to provide something they argue has been lacking in NLS: “an account of power-in-literacy which captures the intricate ways in which power, knowledge, and forms of subjectivity are interconnected with ‘uses of literacy’ in modern national, colonial, and postcolonial settings” (2003:66). Lewis et al. (2007) draw upon some of these post-structuralist theorists as well as others to create a “critical sociocultural theory” by focusing on concepts such as “activity,” “history” and “communities of practice,” which they claim help literacy scholars to incorporate a better understanding of identity, agency, and power into their research.

Oh no. Not more of this garbage.

The challenge of identifying the many possible interpretations and emergent possibilities of any given performance – or, indeed, any social interaction – has been a central issue in some of my own research. In particular, I became intrigued by a specific woman’s festival in Nepal known as Tij. From my first experiences of the yearly festival in the early 1980s when I was a Peace Corps volunteer in the Nepali village of Junigau through my subsequent stints of research there once I became an anthropologist, Tij has always been of interest. The festival is based on Hindu rituals for married women that require them to pray for the long lives of their husbands (and even pray that they die before their husbands). The rituals also require women to atone for having possibly caused men to become ritually polluted by touching them while the women were menstruating or recovering from childbirth. In Junigau, however, the celebration of Tij goes far beyond these rituals, extending weeks in advance and involving feasts for female relatives and many formal and informal songfests at which women sing, men play the drums, and both women and men dance, sometimes even together.


Mehl and his colleagues conducted a study of almost 400 college students – the study mentioned at the outset of this chapter – in order to measure gender differences in the average number of words spoken over the course of the research subjects’ waking hours (Mehl et al. 2007). The college students (divided roughly equally between women and men) were rigged up with digital recorders that were programmed to record for 30 seconds every 12.5 minutes. The students could not tell when they were being recorded. The researchers then transcribed all the words spoken by the participants and extrapolated from these figures to estimate the total number of words spoken over the course of an average day for these individuals. The findings showed that female college students spoke an average of 16,215 words per day, while men spoke an average of 15,669 words per day – but this difference was not statistically significant. “Thus,” write Mehl and his co-authors, “the data fail to reveal a reliable sex difference in daily word use. Women and men both use on average about 16,000 words per day, with very large individual differences around this mean . . . We therefore conclude, on the basis of available empirical evidence, that the widespread and highly publicized stereotype about female talkativeness is unfounded” (Mehl et al. 2007:82).
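To make the extrapolation step concrete, here is a minimal sketch of how one might scale words counted in the 30-second snippets up to a full day. The function names are mine, and the 17 waking hours per day is an assumption for illustration, not something stated in the passage above; only the 30 seconds, the 12.5 minutes and the two daily averages come from the quote.

```python
# Sketch only: scale words heard in short recorded snippets up to a day.
# ASSUMPTION: 17 waking hours per day (not given in the quoted passage).

def samples_per_day(interval_minutes: float = 12.5,
                    waking_hours: float = 17.0) -> float:
    """Number of recording windows that fit into the waking day."""
    return waking_hours * 60.0 / interval_minutes


def estimate_daily_words(sampled_words: int,
                         n_samples: int,
                         sample_seconds: float = 30.0,
                         waking_hours: float = 17.0) -> float:
    """Extrapolate: words per recorded second times waking seconds."""
    recorded_seconds = n_samples * sample_seconds
    if recorded_seconds <= 0:
        raise ValueError("need at least one recorded sample")
    waking_seconds = waking_hours * 3600.0
    return sampled_words * waking_seconds / recorded_seconds


def relative_difference(a: float, b: float) -> float:
    """How much larger a is than b, as a fraction of b."""
    return (a - b) / b


# Hypothetical participant: 650 words transcribed across 80 snippets.
example_estimate = estimate_daily_words(sampled_words=650, n_samples=80)

# The two group means quoted above.
gap = relative_difference(16215, 15669)
```

Only 30 seconds out of every 750 are recorded, i.e. 4% of the time, so each transcribed word gets multiplied by a large factor and the per-person estimates will be noisy. The quoted group means differ by roughly 3.5%.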

The source referenced just before this passage, Language Log, mentions a study on the talkativeness of the sexes which found that females used 45% more words.

I tried to find some more recent studies on Google Scholar, but didn’t find anything useful. Wrong key words?

If the realities of language and gender are really so complex and varied, however, why are the language ideologies concerning female talkativeness or male verbal competitiveness that can be found in the vignettes presented by Tannen (1990) and others so recognizable to us? Cameron (2007b) explains that it happens because of the tendency of all people to rely at least in part on stereotyping when processing information. It is not just ignorant or prejudiced people who stereotype, Cameron states, but everyone because stereotyping provides us with convenient shortcuts in determining what people are like and how we should treat them. The downside, however, is that such stereotypes “can reinforce unjust prejudices, and make us prone to seeing only what we expect or want to see” (Cameron 2007b:14). When we see someone who fits our preconceptions – say, a woman who is extremely talkative, for example – we easily “supply the cultural script that makes them meaningful and ‘typical’” (Cameron 1997:48). When we encounter someone who does not fit a particular stereotype, however, we tend either not to notice or to explain the case away as an aberration.

Why should we care if one or more of our gendered language ideologies might be inaccurate or at least overly simplistic? There are many real-world implications of inaccurate language ideologies – in the workplace, in family life, in court cases, and in interpersonal relationships. Women, men, and children all suffer when gendered assumptions regarding communicative styles and identities are inaccurate or overly rigid. What the research described in this chapter clearly demonstrates is that complexity and variability best characterize the relationship of language to gender. We will come to a similar conclusion in the next chapter after exploring the ways in which language relates to race and ethnicity.

They are also useful in remembering base rates and making correct judgments. Cf. Jussim, Lee, et al. “10 The Unbearable Accuracy of Stereotypes.” Handbook of prejudice, stereotyping, and discrimination (2009): 199.

Defining Race and Ethnicity

Many misconceptions surround the concept of race. Jane Hill, a well-known linguistic anthropologist and the former President of the American Anthropological Association, maintains that most white Americans share a largely inaccurate “folk theory” of race and racism, one of the main components of which is a belief in “race” as a basic category of human biological variation, combined with a belief that each human being can be assigned to a race, or sometimes to a mixture of races (Hill 2008:6–7). Hill argues that this folk theory is widespread and taken for granted – but mistaken in most respects, according to the vast majority of anthropologists and other social scientists. Indeed, the official statement on race of the American Anthropological Association begins with these two


In the United States both scholars and the general public have been conditioned to viewing human races as natural and separate divisions within the human species based on visible physical differences. With the vast expansion of scientific knowledge in this century, however, it has become clear that human populations are not unambiguous, clearly demarcated, biologically distinct groups. Evidence from the analysis of genetics (e.g., DNA) indicates that most physical variation, about 94%, lies within so-called racial groups. Conventional geographic “racial” groupings differ from one another only in about 6% of their genes. This means that there is greater variation within “racial” groups than between them. In neighboring populations there is much overlapping of genes and their phenotypic (physical) expressions. Throughout history whenever different groups have come into contact, they have interbred. The continued sharing of genetic materials has maintained all of humankind as a single species.

Physical variations in any given trait tend to occur gradually rather than abruptly over geographic areas. And because physical traits are inherited independently of one another, knowing the range of one trait does not predict the presence of others. For example, skin color varies largely from light in the temperate areas in the north to dark in the tropical areas in the south; its intensity is not related to nose shape or hair texture. Dark skin may be associated with frizzy or kinky hair or curly or wavy or straight hair, all of which are found among different indigenous peoples in tropical regions. These facts render any attempt to establish lines of division among biological populations both arbitrary and subjective.

As definitive as the AAA’s statement is about the lack of a consistent biological basis for the concept of race, it should not be read as arguing that race does not exist. Race is clearly an important social category that influences people’s life trajectories and identities. Many scholars in fact view it as a, or even the, central organizing principle in the United States. But the social fact of race does not support the folk theory described by Hill above.2 Reflect for a moment upon the following paradox: because of the so-called “one-drop rule,” a white woman in the United States can give birth to a black child, but a black woman cannot give birth to a white child. Such reflection should lead to an appreciation for the social foundations of the concept of race (Ignatiev 1995:1).

This one was bound to happen. The usual social constructivism.

I refer to

Edwards, Anthony WF. “Human genetic diversity: Lewontin’s fallacy.” BioEssays 25.8 (2003): 798-801.’s-fallacy.pdf

As usual, these social constructivists attack strawman accounts of race. Who believes in an essentialist, clearly separate account of human races? No one. It’s biology; clear boundaries are a rare find. :)

At one point in the history of the United States, for example, many groups now unquestioningly considered “white” were initially not included in this privileged category.3 Benjamin Franklin, for example, wrote in the eighteenth century that Swedes and Germans were “swarthy,” and he did not include them among the “white people,” who consisted, according to Franklin, solely of the English and the Saxons. “This example,” Jane Hill comments, “shows how what seem to us today like fundamental perceptions may be of very recent historical origin . . . Contemporary White Americans can no longer see ‘swarthiness’ among Swedes, and find it astonishing that anyone ever did so” (Hill 2008:14).

Never heard of this one. But it seems true.

24. Which leads me to add one Remark: That the Number of purely white People in the World is proportionably very small. All Africa is black or tawny. Asia chiefly tawny. America (exclusive of the new Comers) wholly so. And in Europe, the Spaniards, Italians, French, Russians and Swedes, are generally of what we call a swarthy Complexion; as are the Germans also, the Saxons only excepted, who with the English, make the principal Body of White People on the Face of the Earth. I could wish their Numbers were increased. And while we are, as I may call it, Scouring our Planet, by clearing America of Woods, and so making this Side of our Globe reflect a brighter Light to the Eyes of Inhabitants in Mars or Venus, why should we in the Sight of Superior Beings, darken its People? why increase the Sons of Africa, by Planting them in America, where we have so fair an Opportunity, by excluding all Blacks and Tawneys, of increasing the lovely White and Red? But perhaps I am partial to the Complexion of my Country, for such Kind of Partiality is natural to Mankind.

Good old racism. In reality the Swedes are very white, and the British are partly Swedes due to Viking settlements…

Gene tests can surely confirm this, if they haven’t already done so.

The parameters and nuances of racial classifications in countries other than the United States have been studied by anthropologists and other social scientists for many years. In Brazil, for example, scholarly debates have focused on the meanings of multiple Brazilian racial categories that intersect in complicated ways with class, gender, and sexuality.4 In Nepal, the country I know best ethnographically, there is nothing like the black–white binary commonly attributed to the United States, and until recently, the concept of “race” was not mentioned in public debates at all. Instead, caste, ethnicity, and religion have been the most salient forms of social differentiation for Nepalis. During the 1990s, however, a group of activists from various Tibeto-Burman ethnic groups drew upon outdated social science research from the last century to posit three main races in the world (Hangen 2005, 2009). Susan Hangen, an anthropologist who has conducted fieldwork on this topic in Nepal, reports that a politician in eastern Nepal stated the following during one of his speeches in 1997:

We are a Mongol community, we are not a caste either; we are Mongol. For example, in this world there are three types of people. One is white with white skin like Americans, for example like sister here [referring to me] . . . The other has black skin and is called Negro. The other is called the red race like us: short like us; stocky like us; with small eyes and flat noses like us. (2005:49)

By invoking this outdated tripartite racial classification, the politician was attempting to unite a number of linguistically and culturally diverse ethnic groups, such as Rais, Magars, Limbus, Gurungs, and Sherpas, under the umbrella of one political party, the Mongol National Organization (MNO). The hope was that unifying these disparate but similarly disadvantaged groups would help them oppose Nepal’s high-caste Hindu ruling groups. One person told Hangen, “We didn’t know that we were Mongols until the MNO came here” (2005:49). Hangen’s research is a fascinating example of the complexities, contradictions, and cross-cultural differences involved in the concept of race.

Actually those three are the three superclusters found using modern methods and not a all wrong. They are however less informative than are the lesser clusters, say, the 10 clusters identified by Sforza (1994). Depending on how much data one has, and how much detail one wants, one can find a larger number of clusters, aka. races.

Bonnie Urciuoli approaches the process of ethnicization differently in her research on Puerto Ricans in New York City, contrasting ethnicization with racialization and situating both within the context of class and gender identities in the United States. According to Urciuoli (1996), racial discourses “frame group origin in natural terms.” Ethnic discourses, in contrast, “frame group origin in cultural terms” (1996:15). Racialized people, Urciuoli writes, are considered out of place; they are dirty, dangerous, and unwilling or unable to participate constructively in the nation-state. In contrast, the cultural differences said to be characteristic of ethnicized people are considered safe, ordered, and “a contribution to the nation-state offered by striving immigrants making their way up the ladder of class mobility” (1996:16). Within this landscape of social inequality and exclusion, Urciuoli states that language differences are often racialized. That is, an inability to speak English, or an inability to speak English “without an accent” (cf. Lippi-Green 1997), marks someone as disorderly and unlikely to experience social mobility – as someone, in other words, who does not fully belong in the United States.

But the asians are doing just fine and speak with an accent. Likewise with other high IQ immigrants.

Some people argue that using two negatives is “illogical” because two negatives is a positive according to formal logic or mathematical principles. But if this were so, then the use of three negatives, as in the sentence, “I can’t get nothin’ from nobody,” would go back to being a negative and would no longer “violate” these principles. Clearly, this sentence would be as objectionable as ones with only two negatives to the prescriptivists who want to impose the grammatical rules of one dialect of English (the standard one) on all other dialects. While there may be many good reasons for preferring standard English over other dialects of English in certain instances, nevertheless, as Labov (1972a) famously demonstrated decades ago in his classic article, “The Logic of Nonstandard English,” logic and grammaticality are not among them. The preference of one dialect over another is one based on social, political, or economic factors – it cannot be based on linguistic factors because all dialects are equally logical and grammatical.

Nonsense. Some languages are more logical than others. The obvious case is Lojban, which is directly translatable to predicate logic.

In any case, the author seems to have no good understanding of formal logic, as she uses confusing simplistic terms. The sentence she uses as an example: I can’t get nothin’ from nobody.

I can’t get nothing from nobody.

I can get something from nobody.

I can’t get anything from anybody.

These are all equivalent in standard predicate logic.

¬(∃x)¬(∃y)¬CanGetFrom(I, x, y)

Substitute (∀x)¬ for ¬(∃x)

(∀x)¬¬(∃y)¬CanGetFrom(I, x, y)

Double negation elimination

(∀x)(∃y)¬CanGetFrom(I, x, y)

For any x, there is a y such that it is not the case that I can get x from y.

In other words, for every thing, there is someone from whom I can’t get it. Strictly, this is weaker than “I can’t get anything from anybody”, which would be (∀x)(∀y)¬CanGetFrom(I, x, y).

That’s using the internal negation interpretation. Using external negation, the situation is easier, and that is left for the reader as an exercise in logic. :)
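The rewriting steps above can be sanity-checked by brute force over a small finite domain. This is only a sketch: the domain (two things, two people) and the function names are made up for illustration, and a finite check is not a proof. It confirms that ¬(∃x)¬(∃y)¬Can(x, y) and (∀x)(∃y)¬Can(x, y) agree in every model, and it counts the models where that formula holds but the stronger (∀x)(∀y)¬Can(x, y) does not.

```python
from itertools import product

# Hypothetical tiny domain, just for the check.
THINGS = ("a", "b")
PEOPLE = ("p", "q")


def original(can):
    # ¬(∃x)¬(∃y)¬Can(x, y)
    return not any(
        not any(not can[(x, y)] for y in PEOPLE) for x in THINGS
    )


def rewritten(can):
    # (∀x)(∃y)¬Can(x, y)
    return all(any(not can[(x, y)] for y in PEOPLE) for x in THINGS)


def nothing_from_anybody(can):
    # (∀x)(∀y)¬Can(x, y)
    return all(not can[(x, y)] for x in THINGS for y in PEOPLE)


def all_models():
    """Every truth assignment to Can(x, y) over the domain."""
    pairs = list(product(THINGS, PEOPLE))
    for values in product([False, True], repeat=len(pairs)):
        yield dict(zip(pairs, values))


models = list(all_models())

# Do the quantifier swap and double negation preserve truth everywhere?
steps_agree = all(original(m) == rewritten(m) for m in models)

# Models where the derived formula holds but the universal one fails.
weaker_only = [
    m for m in models if rewritten(m) and not nothing_from_anybody(m)
]
```

So the derivation itself is fine on this domain; what the final formula says in plain English is a separate question, and the list of models in `weaker_only` shows it is not the same claim as the doubly universal one.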

Turning to the second question about how or whether AAE should be used in schools to facilitate the acquisition among AAE-speaking students of the standard dialect of English, it is important to note the serious educational crisis that the Oakland Board of Education was trying to address (however ineffectively or controversially) in its December 1996 resolution. As John Rickford (2005) reminds us, the Oakland school district was not alone in experiencing extremely high rates of failure and drop-out among its African American population. Other school districts throughout the United States faced similar disparities in school performance at the time – and still do today. The question remains how to address these educational disparities. Although this issue is far beyond the scope of this book, involving as it does complex issues of poverty, racial discrimination, and residential segregation, among other possible contributing factors, the extent to which speaking a nonstandard, stigmatized linguistic variant such as AAE contributes to school problems deserves to be studied further (cf. Labov 2010; Rickford 2005).

It is called intelligence.

Aside from the obvious racist slurs, what constitutes racist language? Jane Hill (2008) argues that the language ideologies that are dominant in the United States, combined with a widespread American folk theory of race, combine to ensure that the everyday talk produced by average white, middle-class Americans and distributed in respected media “continues to produce and reproduce White racism” (2008:47). Far from being an element of the past, Hill maintains, racism “is a vital and formative presence in American lives, resulting in hurt and pain to individuals, to glaring injustice, in the grossly unequal distribution of resources along racially stratified lines, and in strange and damaging errors and omissions in public policy both domestic and foreign” (2008:47-48). And this racism, Hill suggests, is largely produced in and through everyday talk – not through the obvious racist slurs that most people today condemn (though these of course contribute), but through unintentional, indirect uses of language that reinforce racist


Ah, the racism theory of blacks problems. Obviously doesn’t work due to the fact that blacks in African countries perform likewise badly. And they have done so for the last 100 years, so far back as we have data.

Cf. Jensen’s discussion in The g Factor.

In a similar set of experiments, Rubin (1992) and Rubin and Smith (1990) conducted matched guise tests with undergraduates (Hill 2008:12). All their research participants heard the same four-minute tape-recorded lecture featuring a woman who was a native speaker of English, but half of the students were shown a slide of a white woman while they listened to the lecture and were told that this was the speaker, while the other half were shown a slide of an East Asian woman. The students in the latter group tended to report that the speaker had a foreign accent, and they even did significantly worse on a comprehension quiz on the material in the lecture – even though these students had heard exactly the same lecture as the students who were shown the photo of a white woman while they listened to the lecture! Clearly, racial categories and racialized language ideologies can influence perceptions even without our being aware of the process.

That sounds interesting. Inb4 small sample size and publication bias.

The cites are:

Rubin, D.L. (1992) Nonlanguage factors affecting undergraduates’ judgments of nonnative English-speaking teaching assistants. Research in Higher Education 33:511–531.

Rubin, D.L. and Smith, K.A. (1990) Effects of accent, ethnicity, and lecture topic on undergraduates’ perceptions of non-native English-speaking teaching assistants. International Journal of Intercultural Relations 14:


I looked into the newest one, from 1992. It had a sample size of 62 (with apparently, self-selection before that). And it reported non-significant results for the things the author of the book claims. Color me not impressed, although interesting study. The results did tend to go in the direction the author claims, but they had a huge variance.’-judgments-of-nonnative-English-speaking-teaching-assistants..pdf
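For a rough sense of what a sample of 62 buys, here is a back-of-the-envelope power calculation. It assumes a plain two-group comparison with 31 per group and uses a normal approximation to the two-sample test; the actual design in Rubin (1992) may well differ, and the effect sizes (Cohen's d) are placeholders, not values from the paper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test.

    Normal approximation: treats the test statistic as N(ncp, 1),
    which slightly overstates power for small samples.
    d is the assumed standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    ncp = d * sqrt(n_per_group / 2.0)  # noncentrality for equal groups
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# ASSUMPTION: 62 participants split evenly into two groups.
N_PER_GROUP = 31

power_small = approx_power(0.2, N_PER_GROUP)   # "small" effect
power_medium = approx_power(0.5, N_PER_GROUP)  # "medium" effect
power_large = approx_power(0.8, N_PER_GROUP)   # "large" effect
```

Under these assumptions a medium-sized effect would be detected only about half the time, so a non-significant result from a study of this size does not tell us much either way.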

What are the problematic assumptions underlying the desire to count the number of endangered languages, and the number of speakers each endangered language has? Jane Hill (2002:127-128; cf. Duchene and Heller 2007) names several. First, although she acknowledges that numbers can be powerful “calls to action” that have been used to mobilize activists to reverse the trend toward language death, and although Hill herself has been involved in such efforts, she warns that journalists and the mass media are soundbite oriented and cannot or will not devote enough time or space to explaining the difficulties or subtleties involved in quantifying languages or speakers. Second, Hill warns that numbers and statistics that are meant for one kind of audience – speakers of dominant languages, perhaps, who have the power to do something about the extinction of smaller languages – can have very negative effects when heard by a very different kind of audience – the speakers of endangered languages themselves. Hill reminds her readers that numbers have often been used by colonial powers in the past as one means of control, what Foucault would call governmentality through enumeration. Speakers of endangered languages are often fearful, she warns, that numbers can be (and have been) held against them, and they can therefore become fearful or resentful.


K. David Harrison, another linguist who works on endangered languages all over the world, lists three areas of loss if we fail to safeguard and document languages at risk of extinction: (1) the erosion of the human knowledge base, especially local ecological knowledge; (2) the loss of cultural heritage; and (3) failure to acquire a full understanding of human cognitive capacities (2007:15-19). With regard to the first area of loss, Harrison notes that an estimated 87 percent of the world’s plants and animals have not yet been identified or studied by modern scientists. If we are to hope that a cure to cancer or other horrible diseases might be found in the Amazon, or in Papua New Guinea, or if we want to learn about more sustainable forms of agriculture from people who have been living in harmony in their environments for many hundreds of years, then we should recognize, Harrison writes, that “most of what humankind knows about the natural world lies completely outside of science textbooks, libraries, and databases, existing only in unwritten languages in people’s memories” – that is, mostly in unwritten endangered languages (2007:15). Of course, some of this knowledge can be communicated in a different language, assuming the person speaking the endangered language is bilingual, but oftentimes there is a “massive disruption of the transfer of traditional knowledge across generations” when a group switches from an endangered language to a dominant language (2007:16). Particular languages are often especially rich in certain areas of the lexicon, such as reindeer herding, botany, or fishing, that are the most important to the speakers of those languages, and a great deal of ecologically specific knowledge is encoded in that language that goes along with those particular cultural practices. It is not surprising, then, that much of that knowledge is not passed on when the language (and often the way of life as well) dies.

I thought the point about loss of local knowledge was good, although it is only relevant for useful local knowledge. Map knowledge: not useful, we have satellites. Properties of local plants: might be very useful for medicine.

The third area of loss Harrison identifies is the ability to acquire a full understanding of the capabilities of the human mind. Linguists and cognitive scientists make assumptions about what the human brain can and cannot do based on experiments and existing data. One source of such data is the group of languages that have been studied by linguists. Whenever a language is analyzed for the first time, scholars look to see what patterns it shares grammatically with other languages in the world and which features it has that might be unique. The more languages that die, the more likely it is that the conclusions scholars draw about the limits of human cognition might be mistaken. For example, the language of Urarina, which is spoken by only 3,000 people in the Amazon rainforest of Peru, has a very unusual word order for its sentences. Unlike English, which generally uses the Subject–Verb–Object (S-V-O) word order, as in sentences such as, “The girl rode the bike,” Urarina uses the Object–Verb–Subject (O-V-S) word order, which would have a literal translation for this sentence as, “The bike rode the girl.” O-V-S word order is extremely rare among the world’s languages. “Were it not for Urarina and a few other Amazonian languages,” Harrison writes, “scientists might not even suspect it were possible. They would be free to hypothesize – falsely – that O-V-S word order was cognitively impossible, that the human brain could not process it” (2007:19).

Eh. It is obviously ‘cognitively possible’ since we just understood an English example with OVS order… Another route is to construct a language to test it with. Similarly for other candidates for impossibility.

Still useful, sure, but not that useful.

As a language is in the process of dying out, it often undergoes simplification in its grammar and lexicon. Speakers have fewer opportunities to use the language and so either forget or do not acquire a large vocabulary. Grammatical structures can also be lost or simplified. For example, in Dyirbal, an endangered Aboriginal language in Australia, there used to be a four-part classification of nouns. (See chapter 4 for a discussion of the four categories.) Nowadays, however, young people are less familiar with the ancestral myths and cultural practices that motivated the four-part classification, and they are less fluent in Dyirbal, having attended school mostly in English, and so they have replaced the four-part system of noun classification with a two-part one. It is still different from English and retains some of the features of the older system, but it has become much simpler to use (Nettle and Romaine 2000:66-69).

Now, if only all other languages would get rid of noun classes/genders… :)

The chapter on language extinction is really lacking in content. They don’t discuss the overall cause of the huge diversity of languages to begin with, or why there is a lot of diversity in some places and not in others. And they fail to mention one very good reason to have fewer languages, which is indeed the primary reason to use a language at all: it makes communicating easier! The cause of language diversity is a lack of long-distance communication between groups of people. Consider it a process similar to genetic drift. The places with lots of language diversity are exactly the kind of backward places with no decent technology to facilitate long-distance communication. When we introduce it, they need to use a different language to talk with other people, and hence switch from their now not very useful language to a more useful one. Nothing mysterious here.


One of the most useful terms for our purposes in understanding how power intersects with language is hegemony. According to Raymond Williams, a cultural Marxist who builds on the work of Antonio Gramsci, hegemony refers to a dynamic system of domination based not so much on violence or the threat of violence, or merely on the economic control of the means of production, but rather on political, cultural, and institutional influence. “That is to say,” Williams writes, “it is not limited to matters of direct political control but seeks to describe a more general predominance which includes, as one of its key features, a particular way of seeing the world and human nature and relationships” (1983:145). Having military power or economic wealth can certainly lead to power, but social status and cultural dominance can also come from other sources, and hegemony is a term that helps us understand this process. Hegemony is saturated with the specific forms of inequality belonging to particular societies at particular historical moments, according to Williams, and is “. . . in the strongest sense a ‘culture’, but a culture which has also to be seen as the lived dominance and subordination of particular classes” (1977:110). Emphasizing the dynamic nature of any “lived hegemony,” Williams reminds us that “it does not just passively exist as a form of dominance. It has continually to be renewed, recreated, defended, and modified. It is also continually resisted, limited, altered, challenged by pressures not all its own” (1977:112). In other words, Williams concludes, while any lived hegemony is always by definition dominant, it is never total or exclusive (1977:113).

Oh boy here we go…

Antonio Gramsci (Italian: [anˈtɔːnjo ˈɡramʃi]; 22 January 1891 – 27 April 1937) was an Italian writer, politician, political theorist, philosopher, sociologist, and linguist. He was a founding member and onetime leader of the Communist Party of Italy and was imprisoned by Benito Mussolini’s Fascist regime.

Gramsci was one of the most important Marxist thinkers in the 20th century. His writings are heavily concerned with the analysis of culture and political leadership and he is notable as a highly original thinker within modern European thought. He is renowned for his concept of cultural hegemony as a means of maintaining the state in a capitalist society.

In a contribution that ties in nicely with one of this book’s key concepts, that of language ideologies, Bourdieu describes how different levels of symbolic capital can turn into symbolic dominance and even symbolic violence. When individuals in a society are not proficient in the most highly valued ways of speaking (such as English in the United States, especially Standard American English), they do not benefit from the access such proficiency often provides to prestigious schools, professions, or social groups (cf. Lippi-Green 1997). And yet, speakers of stigmatized variants (for example, in the United States these might include speakers of nonstandard varieties of English such as African American English or Appalachian English) frequently buy into the system of evaluation that ranks Standard American English as superior. These people’s own language ideologies, in other words, stigmatize the ways in which they themselves speak. This acceptance of differing social values accorded various ways of speaking is in actuality a misrecognition, according to Bourdieu, because the differential levels of prestige constitute an arbitrary ranking. Every language or dialect is as good linguistically, even though not socially, as every other.

It just isn’t true. Languages differ in many relevant linguistic properties. Good luck discussing advanced physics in some Amerindian language with no words for the relevant physics terms. This is even the case for a large language such as Danish. This is one of the reasons we see what is called domain loss – a domain of life is spoken about in a different language because no suitable terms exist in the standard language. Cf. ex.

And some are easier to learn than others, due to grammar or phonology (ex. English <th> sounds are difficult to learn).

And so on.

Why such a change in the understanding of these languages? Irvine and Gal argue that it was not so much because of better scholarship or improved data but instead because, “There have also been changes in what observers expected to see and how they interpreted what they saw” (2000:48). Nineteenth-century linguists and ethnographers assumed that linguistic classifications could be used to judge evolutionary rankings of groups. (White Europeans were of course at the top of this ranking, and various African groups clustered toward the bottom.) They also assumed that ethnic groups were monolingual and that a “primordial relationship” existed that linked languages with territories, nations, tribes, and peoples. In the case of Fula, Wolof, and Sereer, racial and linguistic ideologies led nineteenth-century linguists to consider the Fula language and its speakers (who were often lighter skinned than the others and who tended to espouse a more orthodox Islam) to be of higher status and intelligence. The Wolof language was deemed “less supple, less handy” than Fula, and its speakers less intelligent. The Sereer language, nineteenth-century linguists claimed, was “the language of primitive simplicity” (Irvine and Gal 2000:55).

Never heard of them, but lightness of skin does correlate well with population intelligence world wide.

They might be smarter than their neighbours. At least, there is a list of prominent Fula people.

Googling “fule people intelligent” yields 13.1e6 results.

April 2, 2013

Review + comments: Analyzing Grammar, An Introduction (Paul R. Kroeger, 2005)

Filed under: Language,Linguistics/language — Emil O. W. Kirkegaard @ 03:51

Cambridge.University.Press.Analyzing.Grammar.An.Introduction.Jun.2005 free pdf download


Overall, there is nothing much to say about this book. It covers most stuff. Neither particularly good or interesting, nor particularly bad or uninteresting, IMO.

For example, what is the meaning of the word hello? What information

does it convey? It is a very difficult word to define, but every speaker of

English knows how to use it: for greeting an acquaintance, answering the

telephone, etc. We might say that hello conveys the information that the

speaker wishes to acknowledge the presence of, or initiate a conversation

with, the hearer. But it would be very strange to answer the phone or greet

your best friend by saying “I wish to acknowledge your presence” or “I

wish to initiate a conversation with you.” What is important about the word

hello is not its information content (if any) but its use in social interaction.

In the Teochew language (a “dialect” of Chinese), there is no word for

‘hello’. The normal way for one friend to greet another is to ask: “Have you

already eaten or not?” The expected reply is: “I have eaten,” even if this is

not in fact true.

In our comparison of English with Teochew, we saw that both languages

employ a special form of sentence for expressing Yes–No questions. In fact,

most, if not all, languages have a special sentence pattern which is used for

asking such questions. This shows that the linguistic form of an utterance

is often closely related to its meaning and its function. On the other hand,

we noted that the grammatical features of a Yes–No question in English

are not the same as in Teochew. Different languages may use very different

grammatical devices to express the same basic concept. So understanding

the meaning and function of an utterance will not tell us everything we need

to know about its form.

interesting for me becus of my work on a logic of questions and answers.

Both of the hypotheses we have reached so far about Lotuko words are

based on the assumption that the meaning of a sentence is composed in some

regular way from the meanings of the individual words. That is, we have

been assuming that sentence meanings are compositional. Of course,

every language includes numerous expressions where this is not the case.

Idioms are one common example. The English phrase kick the bucket can

mean ‘die,’ even though none of the individual words has this meaning.

Nevertheless, the compositionality of meaning is an important aspect of the

structure of all human languages.
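A toy sketch of the difference between a compositional reading and an idiom. Everything here is made up for illustration: the capitalized "meanings" are just labels, and the names `WORD_MEANINGS`, `IDIOMS` and `readings` are my own. The compositional reading is built from the word meanings alone; the idiomatic reading has to be listed separately because no amount of composing gets you there.

```python
# Hypothetical word meanings: plain labels standing in for real semantics.
WORD_MEANINGS = {
    "kick": "KICK",
    "the": "THE",
    "bucket": "BUCKET",
    "ball": "BALL",
}

# Idioms are stored whole, since their meaning is not built from the parts.
IDIOMS = {
    ("kick", "the", "bucket"): "DIE",
}


def compositional_reading(words):
    """Build a phrase meaning purely from the meanings of its words, in order."""
    return tuple(WORD_MEANINGS[w] for w in words)


def readings(words):
    """Return every available reading: the compositional one, plus any idiom."""
    result = [compositional_reading(words)]
    idiom = IDIOMS.get(tuple(words))
    if idiom is not None:
        result.append(idiom)
    return result


print(readings(["kick", "the", "ball"]))
print(readings(["kick", "the", "bucket"]))
```

The second phrase gets two readings, which matches the book's wording that kick the bucket can mean ‘die’: the literal reading is still there.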

for more on compositionality see:

We have discussed three types of reasoning that can be used to

identify the meaningful elements of an utterance (whether parts of a word

or words in a sentence): minimal contrast, recurring partials, and pattern-

matching. In practice, when working on a new body of data, we often use

all three at once, without stopping to think which method we use for which

element. Sometimes, however, it is important to be able to state explicitly

the pattern of reasoning which we use to arrive at certain conclusions. For

example, suppose that one of our early hypotheses about the language is

contradicted by further data. We need to be able to go back and determine

what evidence that hypothesis was based on so that we can re-evaluate

that evidence in the light of additional information. This will help us to

decide whether the hypothesis can be modified to account for all the facts,

or whether it needs to be abandoned entirely. Grammatical analysis involves

an endless process of “guess and check” – forming hypotheses, testing them

against further data, and modifying or abandoning those which do not work.

quite a lot of science works like that. conjecture and refutation, pretty much (Popper)

What do we mean when we say that a certain form, such as Zapotec ka–, is a “morpheme?” Charles Hockett (1958) gave a definition of this term which is often quoted:

Morphemes are the smallest individually meaningful elements in the utterances of a language.

There are two crucial aspects of this definition. First, a morpheme is meaningful. A morpheme normally involves a consistent association of phonological form with some aspect of meaning, as seen in (7) where the form ñee was consistently associated with the concept ‘foot.’ However, this association of form with meaning can be somewhat flexible. We will see various ways in which the phonological shape of a morpheme may be altered to some extent in particular environments, and there are some morphemes whose meaning may depend partly on context.

obviously does not work for

what is the solution to this inconsistency in terminology?

In point (c) above we noted that a word which contains no plural marker

is always singular. The chart in (17) shows that the plural prefix is optional,

and that when it is present it indicates plurality; but it doesn’t say anything

about the significance of the lack of a prefix. One way to tidy up this loose

end is to assume that the grammar of the language includes a default

rule which says something like the following: “a countable noun which

contains no plural prefix is interpreted as being singular.”

Another possible way to account for the same fact is to assume that singular

nouns carry an “invisible” (or null) prefix which indicates singular

number. That would mean that the number prefix is actually obligatory for

this class of noun. Under this approach, our chart would look something

like (18):

the default theory is more plausible than positing invisible morphemes.
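a small sketch of the two analyses side by side. it assumes a plural prefix ka- and a stem ñee ‘foot’, loosely following the Zapotec forms mentioned in the quotes; the function names and the one-word stem list are made up for illustration. the point is only that both analyses assign the same number to every word, so the data cannot decide between them and the choice comes down to which theory posits less.

```python
PLURAL_PREFIX = "ka"

# Analysis 2 needs an explicit inventory, including a silent prefix.
# The empty string stands in for the null (invisible) singular prefix.
NUMBER_PREFIXES = {"ka": "plural", "": "singular"}


def number_by_default_rule(word, stems):
    """Analysis 1: only ka- is a morpheme; no prefix means singular by default."""
    if word.startswith(PLURAL_PREFIX) and word[len(PLURAL_PREFIX):] in stems:
        return "plural"
    return "singular"  # the default rule


def number_by_null_prefix(word, stems):
    """Analysis 2: every noun has a number prefix, one of which is silent."""
    for prefix, number in NUMBER_PREFIXES.items():
        if word.startswith(prefix) and word[len(prefix):] in stems:
            return number
    raise ValueError(f"cannot parse {word!r}")


STEMS = {"ñee"}  # illustrative stem list
for word in ("ñee", "kañee"):
    print(word, number_by_default_rule(word, STEMS), number_by_null_prefix(word, STEMS))
```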

since the book continues to use Malay as an ex. including the word <orang> i’m compelled to mention that it is not a coincidence that it is similar to <orangutan>.

The name “orangutan” (also written orang-utan, orang utan, orangutang, and ourang-outang) is derived from the Malay and Indonesian words orang meaning “person” and hutan meaning “forest”, thus “person of the forest”. Orang Hutan was originally not used to refer to apes, but to forest-dwelling humans. The Malay words used to refer specifically to the ape are maias and mawas, but it is unclear if those words refer to just orangutans, or to all apes in general. The first attestation of the word to name the Asian ape is in Jacobus Bontius’ 1631 Historiae naturalis et medicae Indiae orientalis – he described that Malaysians had informed him the ape was able to talk, but preferred not to “lest he be compelled to labour”. The word appeared in several German-language descriptions of Indonesian zoology in the 17th century. The likely origin of the word comes specifically from the Banjarese variety of Malay.

The word was first attested in English in 1691 in the form orang-outang, and variants with -ng instead of -n as in the Malay original are found in many languages. This spelling (and pronunciation) has remained in use in English up to the present, but has come to be regarded as incorrect. The loss of “h” in Utan and the shift from n to -ng has been taken to suggest that the term entered English through Portuguese. In 1869, British naturalist Alfred Russel Wallace, co-creator of modern evolutionary theory, published his account of Malaysia’s wildlife: The Malay Archipelago: The Land of the Orang-Utan and the Bird of Paradise.

Traditional definitions for parts of speech are based on “notional”

(i.e. semantic) properties such as the following:

(17) A noun is a word that names a person, place, or thing.

A verb is a word that names an action or event.

An adjective is a word that describes a state.

However, these characterizations fail to identify nouns like destruction,

theft, beauty, heaviness. They cannot distinguish between the verb love and

the adjective fond (of), or between the noun fool and the adjective foolish.

Note that there is very little semantic difference between the two sentences

in (18).

(18) They are fools.

They are foolish.

it is easy to fix 17a to include abstractions. all his counter-examples are abstractions.

<love> is both a noun and a verb by the definitions in 17, which is right.

the 18 ex. seems weak too. what about the possibility of interpreting 18b as claiming that they are foolish. this does not mean that they are fools. it may be a temporary situation (drunk perhaps), or isolated to specific areas of reality (ex. religion).

not that i’m especially happy about semantic definitions, it’s just that the argumentation above is not convincing.

Third, the head is more likely to be obligatory than the modifiers or other

non-head elements. For example, all of the elements of the subject noun

phrase in (22a) can be omitted except the head word pigs. If this word is

deleted, as in (22e), the result is ungrammatical.

(22) a [The three little pigs] eat truffles.

b [The three pigs] eat truffles.

c [The pigs] eat truffles.

d [Pigs] eat truffles.

e *[The three little] eat truffles.

not so quick. if the context makes it clear that they are speaking about pigs, or children, or whatever, 22e is perfectly understandable, since context ‘fills out’ the missing information, grammatically speaking. but the author is right in that it is incomplete and without context to fill in, one would be forced to ask ”three little what?”. but still, that one will actually respond like this shows that the utterance was understood, at least in part.

Of course, English noun phrases do not always contain a head noun. In

certain contexts a previously mentioned head may be omitted because it is

“understood,” as in (23a). This process is called ellipsis. Moreover, in

English, and in many other languages, adjectives can sometimes be used

without any head noun to name classes of people, as in (23b,c). But, aside

from a few fairly restricted patterns like these, heads of phrases in English

tend to be obligatory.

(23) a [The third little pig] was smarter than [the second ].

b [the good], [the bad] and [the ugly]

c [The rich] get richer and [the poor] get children.

i was going to write that the author doesn’t seem to understand the word ”obligatory”, but then another interpretation dawned upon me. i think he means that under most conditions, one cannot leave out the noun in a noun phrase (NP), but sometimes one can. confusing wording.

As we can already see from example (5), different predicates require

different numbers of arguments: hungry and snores require just one, loves

and slapping require two. Some predicates may not require any arguments

at all. For example, in many languages comments about the weather (e.g. It

is raining, or It is dark, or It is hot) could be expressed by a single word, a

bare predicate with no arguments.

it is worth mentioning that there is a name for this:

It is important to remember that arguments can also be optional. For example,

many transitive verbs allow an optional beneficiary argument (18a), and

most transitive verbs of the agent–patient type allow an optional instrument

argument (18b). The crucial fact is that adjuncts are always optional. So

the inference “if obligatory then argument” is valid; but the inference “if

optional then adjunct” is not.

strictly speaking, this is using the terminology incorrectly. conditionals are not inferences. the author should have written ex ”the inference “obligatory, therefore, argument” is valid.”, or alternatively ”the conditional “if obligatory, then argument” is true.”.

confusing inferences with conditionals leads to all kinds of confusions in logic.
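to make the conditional reading concrete, here is a small check of the two conditionals against a handful of cases. the three example constituents and their labels are my own illustrations of the book's claims (arguments may be optional, adjuncts are always optional), not data from the book. a conditional is true of a set of cases when no case has a true antecedent and a false consequent.

```python
def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


# (constituent, is_obligatory, is_argument): hypothetical labels
EXAMPLES = [
    ("subject of 'snores'", True, True),
    ("beneficiary in 'baked a cake for Mary'", False, True),
    ("'yesterday' in 'John snored yesterday'", False, False),
]

# "if obligatory, then argument"
obligatory_implies_argument = all(
    implies(obligatory, argument) for _, obligatory, argument in EXAMPLES
)

# "if optional, then adjunct" (adjunct = not an argument)
optional_implies_adjunct = all(
    implies(not obligatory, not argument) for _, obligatory, argument in EXAMPLES
)

print(obligatory_implies_argument)  # True
print(optional_implies_adjunct)     # False: the optional beneficiary is an argument
```

the first conditional holds of every case, while the second is falsified by the optional argument, which is the book's point restated as a claim about true and false conditionals.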

Another way of specifying the transitivity of a verb is to ask, how many

term (subject or object) arguments does it take? The number of terms, or

direct arguments, is sometimes referred to as the valence of the verb.

Since most verbs can be said to have a subject, the valence of a verb is

normally one greater than the number of objects it takes: an intransitive

verb has a valence of one, a transitive verb has a valence of two, and a

ditransitive verb has a valence of three.

the author is just talking about how many operands the expressed predicate has. there are also verbs which can express predicates with four operands. consider <transfer>. ex. ”Peter transfers 5USD from Mike to Jim.”. there we have Peter (subject, agent); 5USD (object, theme); Jim (?, recipient); Mike (?, anti-recipient?).

The distinctions between OBJ2 and OBL make little to no sense to me.

It is important to notice that the valence of the verb (in this sense) is not

the same as the number of arguments it takes. For example, the verb donate

takes three semantic arguments, as illustrated in (8). However, donate has

a valence of two because it takes only two term arguments, SUBJ and

OBJ. With this predicate, the recipient is always expressed as an oblique argument.


(8) a Michael Jackson donated his sunglasses to the National Museum.

b donate < agent, theme, recipient >
             |      |        |
            SUBJ   OBJ      OBL

Some linguists use the term “semantic valence” to refer to the number of

semantic arguments which a predicate takes, and “syntactic valence” to

specify the number of terms which a verb requires. In this book we will use

the term “valence” primarily in the latter (syntactic) sense.

doesn’t help.

We have already seen that some verbs can be used in more than

one way. In chapter 4, for example, we saw that the verb give occurs in

two different clause patterns, as illustrated in (10). We can now see that

these two uses of the verb involve the same semantic roles but a different

assignment of Grammatical Relations, i.e. different subcategorization. This

difference is represented in (11). The lexical entry for give must allow for

both of these configurations.

(10) a John gave Mary his old radio.

b John gave his old radio to Mary.

(11) a give < agent, theme, recipient >
                |      |        |
               SUBJ   OBJ2     OBJ

b give < agent, theme, recipient >
           |      |        |
          SUBJ   OBJ      OBL

it seems to me that there is something wholly wrong with a theory that treats 10a-b so differently. those two sentences mean the same thing, and their structure is similar, and only one word makes the difference. this word seems to just have the function of allowing for another order of the operands of the verb.
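the bookkeeping behind the two notions of valence can be written down in a few lines. this is a sketch, assuming the linkings given in (8) and (11) and the book's convention that only SUBJ, OBJ and OBJ2 count as terms; the class name `Frame` and its fields are my own invention. semantic valence counts the operands; syntactic valence counts only those operands linked to a term.

```python
from dataclasses import dataclass

# Direct arguments ("terms"); an oblique (OBL) is not a term.
TERMS = {"SUBJ", "OBJ", "OBJ2"}


@dataclass(frozen=True)
class Frame:
    verb: str
    linking: tuple  # pairs of (semantic role, grammatical relation)

    @property
    def semantic_valence(self):
        """Number of semantic arguments (operands) of the predicate."""
        return len(self.linking)

    @property
    def syntactic_valence(self):
        """Number of arguments expressed as terms."""
        return sum(1 for _, relation in self.linking if relation in TERMS)


DONATE = Frame("donate", (("agent", "SUBJ"), ("theme", "OBJ"), ("recipient", "OBL")))
GIVE_A = Frame("give", (("agent", "SUBJ"), ("theme", "OBJ2"), ("recipient", "OBJ")))
GIVE_B = Frame("give", (("agent", "SUBJ"), ("theme", "OBJ"), ("recipient", "OBL")))

for frame in (DONATE, GIVE_A, GIVE_B):
    print(frame.verb, frame.semantic_valence, frame.syntactic_valence)
```

both frames of give have the same three operands; only the count of terms differs between 11a and 11b, which is the part of the theory i find odd.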

A number of languages have grammatical processes which, in effect,

“change” an oblique argument into an object. The result is a change in

the valence of the verb. This can be illustrated by the sentences in (19).

In (19a), the beneficiary argument is expressed as an OBL, but in (19b)

the beneficiary is expressed as an OBJ. So (19b) contains one more term

than (19a), and the valence of the verb has increased from two to three;

but there is no change in the number of semantic arguments. Grammatical

operations which increase or decrease the valence of a verb are a topic of

great interest to syntacticians. We will discuss a few of these operations in

chapter 14.

(19) a John baked a cake for Mary.

b John baked Mary a cake.

IMO, these two have the exact same number of operands: both have 3. the word <for> allows for a different ordering, i.e., it is a syntax-modifier.

at least, that’s one reading. 19a seems to be a less clear case of my alternative theory. one reading of 19a is that Mary was tasked with baking a cake, but John baked it for her. another reading has the same meaning as 19b.

(20) a #The young sausage likes the white dog.

b #Mary sings a white cake.

c #A small dog gives Mary to the young tree.

(21) a *John likes.

b *Mary gives the young boy.

c *The girl yawns Mary.

The examples in (20) are grammatical but semantically ill-formed –

they don’t make sense.[4]

the footnote is: One reason for saying that examples like (20) and (22) are grammatical, even though

they sound so odd, is that it would often be possible to invent a context (e.g. in a fairy

tale or a piece of science fiction) in which these sentences would be quite acceptable.

This is not possible for ungrammatical sentences like those in (21).

i can think of several contexts where 21b makes sense. think of a situation where everybody is required to give something/someone to someone. after it is mentioned that several other people give this and that, 21b follows. in that context it makes sense just fine. however, it is because the recipient is implicit, since it is unnecessary (economic principle) to mention the recipient in every single sentence or clause.

21c is interpretable if one considers ”the girl” an utterance that Mary utters while yawning.

21a is almost common on Facebook. ”John likes this”, shortened to ”John likes”.

not that i think the author is wrong, i’m just being creative. :)

The famous example in (23) was used by Chomsky (1957) to show how

a sentence can be grammatical without being meaningful. What makes this

sentence so interesting is that it contains so many collocational clashes:

something which is green cannot be colorless; ideas cannot be green, or

any other color, but we cannot call them colorless either; ideas cannot sleep;

sleeping is not the kind of thing one can do furiously; etc.

(23) #Colorless green ideas sleep furiously.

it is writings such as this that result in so much confusion. clearly the different <cannot>’s in the above are not about the same kind of impossibility. let’s consider them:

<something which is green cannot be colorless> this is logical impossibility. these two predicates are logically incompatible, that is, each implies the lack of the other: ∀x(Green(x)→¬Colorless(x)). but actually this predicate has an internal negation. we can make it more explicit like this: ∀x(Green(x)→Colorful(x)), and ∀x(Colorful(x)↔¬Colorless(x)).

<ideas cannot be green, or any other color, but we cannot call them colorless either; ideas cannot sleep; sleeping is not the kind of thing one can do furiously> this is semantic impossibility. it concerns the meaning of the sentence. there is no meaning, and hence nothing expressed that can be true or false. from that it follows that there is nothing that can be impossible, since impossibility implies falsity. hence, if there is something connected with that sentence that is impossible, it has to be something else.
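the logical case can be checked mechanically. the sketch below takes the two explicit premises (green things are colorful; colorful is the negation of colorless) and brute-forces every truth assignment for one arbitrary individual, which is enough here since all three predicates apply to the same x. the function names are my own.

```python
from itertools import product


def implies(p, q):
    """Material conditional."""
    return (not p) or q


def premises(green, colorful, colorless):
    """Green(x) -> Colorful(x), and Colorful(x) <-> not Colorless(x)."""
    return implies(green, colorful) and (colorful == (not colorless))


def green_excludes_colorless():
    """True if the premises entail Green(x) -> not Colorless(x)."""
    for green, colorful, colorless in product([False, True], repeat=3):
        if premises(green, colorful, colorless) and not implies(green, not colorless):
            return False  # found a counterexample
    return True


def green_and_colorless_possible():
    """True if some assignment satisfies the premises with x green and colorless."""
    return any(
        premises(green, colorful, colorless) and green and colorless
        for green, colorful, colorless in product([False, True], repeat=3)
    )


print(green_excludes_colorless())      # True
print(green_and_colorless_possible())  # False
```

no such check is available for the semantic cases, which fits the point above: there is no proposition there to evaluate.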

This kind of annotated tree diagram allows us to see at once what is wrong

with the ungrammatical examples in (21) above: (21b) is incomplete, as

demonstrated in (34a), while (21c) is incoherent, as demonstrated in (34b).

a better set of terms are perhaps <undersaturated> and <oversaturated>.

there is nothing inconsistent about the second that isn’t also inconsistent in the first, and hence using that term is misleading. <incomplete> does capture an essential feature, which is that something is missing. the other ex. has something too much. one could go for <incomplete> and <overcomplete> but it sounds odd. hence my choice of different terms.

The pro-form one can be used to refer to the head noun when it is followed

by an adjunct PP, as in (6a), but not when it is followed by a complement

PP as in (6b).

(6) a The [student] with short hair is dating the one with long hair.

b ∗The [student] of Chemistry was older than the one of Physics.

6b seems fine to me.

There is no fixed limit on how many modifiers can appear in such a sequence. But in order to represent an arbitrarily long string of alternating adjectives and intensifiers, it is necessary to treat each such pair as a single unit. The “star” notation used in (15) is one way of representing arbitrarily long sequences of the same category. For any category X, the symbol “X∗” stands for “a sequence of any number (zero or more) of Xs.” So the symbol “AP∗” stands for “a sequence of zero or more APs.” It is easy to modify the rule in (12b) to account for examples like (14b); this analysis is shown in (15b). Under the analysis in (12a), we would need to write a more complex rule something like (15a). Because simplicity tends to be favored in grammatical systems, (12b) and (15b) provide a better analysis for this


(15) a NP → Det ((Adv) A)∗ N (PP)

b NP → Det AP∗ N (PP)
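the star in (15) behaves like the repetition operator in regular expressions, and the parentheses like optionality. a rough sketch, assuming we represent a phrase as a list of category labels (the regex encoding and the function names are my own, not the book's):

```python
import re

# Rules (15a) and (15b) written as regexes over space-separated category labels.
# "( X)*" means zero or more Xs, "( X)?" means an optional X.
RULE_15A = re.compile(r"Det(?: (?:Adv )?A)* N(?: PP)?")
RULE_15B = re.compile(r"Det(?: AP)* N(?: PP)?")


def matches_15a(categories):
    """True if the label sequence fits NP -> Det ((Adv) A)* N (PP)."""
    return RULE_15A.fullmatch(" ".join(categories)) is not None


def matches_15b(categories):
    """True if the label sequence fits NP -> Det AP* N (PP)."""
    return RULE_15B.fullmatch(" ".join(categories)) is not None


print(matches_15b(["Det", "AP", "AP", "N", "PP"]))  # any number of APs is fine
print(matches_15a(["Det", "Adv", "A", "A", "N"]))   # intensifier + adjective pairs
print(matches_15b(["Det", "N", "PP", "PP"]))        # only one optional PP allowed
```

rule (15b) is shorter because the (Adv) A pair has been packed into the single unit AP, which is the book's simplicity argument.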

for those that are wondering where this use of asterisk comes from, it is from here:

In English, a possessor phrase functions as a kind of determiner. We

can see this because possessor phrases do not normally occur together with

other determiners in the same NP:

(19) a the new motorcycle

b Mary’s new motorcycle

c ∗Mary’s the new motorcycle

d ∗the Mary’s new motorcycle

looks more like it is because they are using proper nouns in their example. if one used a common noun, then it works just fine:

19e: The dog’s new bone.

Another kind of evidence comes from the fact that predicate complement NPs cannot appear in certain constructions where direct objects can. For example, an object NP can become the subject of a passive sentence (44b) or of certain adjectives (like hard, easy, etc.) which require a verbal or clausal complement (44c). However, predicate complement NPs never occur in these positions, as illustrated in (45).

(44) a Mary tickled an elephant.

b An elephant was tickled (by Mary).

c An elephant is hard (for Mary) to tickle.

(45) a Mary became an actress.

b *An actress was become (by Mary).

c *An actress is hard (for Mary) to become.

45c is grammatical with the optional element in place: An actress is hard for Mary to become. Altho it is ofc archaic in syntax.

mi amamas. ‘I am happy.’

yu amamas. ‘You (sg) are happy.’

em i amamas. ‘He/she is happy.’

yumi amamas. ‘We (incl.) are happy.’

mipela i amamas. ‘We (excl.) are happy.’

yupela i amamas. ‘You (pl) are happy.’

ol i amamas. ‘They are happy.’

it is difficult not to like this system, except for the arbitrary requirement of ”i” in some places and not others. it's clearly english-inspired. inclusive ”we” is interesting: ”youme” :D

This constituent is normally labeled S’ or S̄ (pronounced “S-bar”). It contains two daughters: COMP (for “complementizer”) and S (the complement clause itself). This structure is illustrated in the tree diagram in (15), which represents a sentence containing a finite clausal complement.

how to make this fit perfectly with the other use of N-bar terminology? in the case of noun phrases, we have NP on top, then N’ (with DET and adj) and then N at the bottom. it seems that we need to introduce some analogue to NP with S. the only level left is the entire sentence. SP sounds like a contradiction in terms or oxymoron though, ”sentence phrase”.

February 8, 2013

The usefulness of sentences, a hierarchical view?

Filed under: Linguistics/language,Metaphilosophy — Tags: — Emil O. W. Kirkegaard @ 06:26

(from A natural history of negation)

i had been thinking about a similar idea. but these work fine as a beginning. a good hierarchy needs a lvl for approximate truth as well (like Newton’s laws), as well as actual truths. but also perhaps a dimension for the relevance of the information conveyed. a sentence can express a true proposition without that proposition being relevant to the making of just about any real life decision. for instance, the true proposition expressed by “42489054329479823423 is larger than 37828234747” will in all likelihood never, ever be relevant for any decision. also one can note that the relevance dimension only begins when there is actually some information conveyed, that is, it only applies at level 2 and beyond, as those below are meaningless pieces of language.

and things that are inconsistent can also be very useful, so it's not clear how falseness, approximate truth, and truth relate to usefulness. but i think that the closer something is to the truth, the more likely it is to be useful. naive set theory is fine for working with many proofs, even if it is an inconsistent system.

January 31, 2013

Paper: Social Structure and Language Structure: the New Nomothetic Approach

Filed under: Linguistics/language — Emil O. W. Kirkegaard @ 23:56

Social Structure and Language Structure the New Nomothetic Approach

found via Gene Expression,

Recent studies have taken advantage of newly available, large-scale, cross-linguistic data and new statistical techniques to look at the relationship between language structure and social structure. These ‘nomothetic’ approaches contrast with more traditional approaches and a tension is observed between proponents of each method. We review some nomothetic studies and point out some challenges that must be overcome. However, we argue that nomothetic approaches can contribute to our understanding of the links between social structure and language structure if they address these challenges and are taken as part of a body of mutually supporting evidence. Nomothetic studies are a powerful tool for generating hypotheses that can go on to be corroborated and tested with experimental and theoretical approaches. These studies are highlighting the effect of interaction on language.

Key words: nomothetic, social structure, complex adaptive systems, linguistic niche hypothesis, cultural evolution
