- S. Modgil & C. Modgil (eds.). (1984). Arthur Jensen: Consensus and Controversy. Lewes, Sussex: Falmer Press.
In a book that’s not widely read but should be, Thomas Bouchard notes in his chapter, ‘The Hereditarian Research Program: Triumphs and Tribulations’:
A principal feature of the many critiques of hereditarian research is an excessive concern for purity, both in terms of meeting every last assumption of the models being tested and in terms of eliminating all possible errors. The various assumptions and potential errors that may, or may not, be of concern are enumerated and discussed at great length. The longer the discussion of potential biasing factors, the more likely the critic is to conclude that they are actual sources of bias. By the time a chapter summary or conclusion section is reached, the critic asserts that it is impossible to learn anything using the design under discussion. There is often, however, a considerable amount known about the possible effect of the violation of assumptions. As my colleague Paul Meehl has observed, ‘Why these constraints are regularly treated as “assumptions” instead of refutable conjectures is itself a deep and fascinating question…’ (Meehl, 1978, p. 810). In addition, potential systematic errors sometimes have testable consequences that can be estimated. They are, unfortunately, seldom evaluated. In other instances the data themselves are simply abused. As I have pointed out elsewhere:
The data are subgrouped using a variety of criteria that, although plausible on their face, yield the smallest genetic estimates that can be squeezed out. Statistical significance tests are liberally applied and those favorable to the investigator’s prior position are emphasized. Lack of statistical significance is overlooked when it is convenient to do so, and multiple measurements of the same construct (constructive replication within a study) are ignored. There is repeated use of significance tests on data chosen post hoc. The sample sizes are often very small, and the problem of sampling error is entirely ignored. (Bouchard, 1982a, p. 190)
This fallacious line of reasoning is so endemic that I have given it a name, ‘pseudo-analysis’ (Bouchard, 1982a, 1982b). Pseudo-analysis has been very widely utilized in the critiques and reanalyses of data gathered on monozygotic twins reared apart (cf. Heath, 1982; Fulker, 1975). I will look closely at this particular kinship, but warn the reader that the general conclusion applies equally to most other kinships.
Perhaps the most disagreeable criticism of all is the consistent claim that IQ tests are systematically flawed (each test in a different way) and, consequently, are poor measures of anything. These claims are seldom supported by reasonable evidence. If this class of argument were true, one certainly would not expect the various types of IQ tests (some remarkably different in content) to correlate as highly with each other as they do, nor, given the small samples used, would we expect them to produce such consistent results from study to study. Different critics launch this argument to different degrees, but they are of a common class. [Continued in the piece]
In modern language, we might say that critics engage in motivated p-hacking of the data (goal: minimize genetic effects), followed by selective reporting and interpretation of the results: comparisons and their p-values are reported when they favor the goal and left out otherwise.
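To make the subgrouping point concrete, here is a minimal simulation sketch. It is not taken from Bouchard's chapter. Every number in it (40 pairs per study, a true pair correlation of 0.75, ten arbitrary splits) is an assumption chosen for illustration, and it uses a plain Pearson correlation as a stand-in for the intraclass correlation normally used with twin data. The hypothetical "critic" tries several arbitrary half-sample splits and reports only the lowest subgroup correlation.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation (a simplification of the intraclass r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_pairs(n_pairs, true_r, rng):
    """Draw pairs whose scores share a common factor, giving correlation true_r."""
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)
        a = sqrt(true_r) * shared + sqrt(1 - true_r) * rng.gauss(0, 1)
        b = sqrt(true_r) * shared + sqrt(1 - true_r) * rng.gauss(0, 1)
        pairs.append((a, b))
    return pairs


def smallest_subgroup_r(pairs, n_splits, rng):
    """Try n_splits arbitrary half-sample splits; keep the lowest subgroup r."""
    lowest = 1.0
    for _ in range(n_splits):
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for group in (shuffled[:half], shuffled[half:]):
            xs, ys = zip(*group)
            lowest = min(lowest, pearson(xs, ys))
    return lowest


def run(n_studies=200, n_pairs=40, true_r=0.75, n_splits=10, seed=0):
    """Return (mean full-sample r, mean cherry-picked subgroup r) over studies."""
    rng = random.Random(seed)
    full, cherry = [], []
    for _ in range(n_studies):
        pairs = simulate_pairs(n_pairs, true_r, rng)
        xs, ys = zip(*pairs)
        full.append(pearson(xs, ys))
        cherry.append(smallest_subgroup_r(pairs, n_splits, rng))
    return sum(full) / n_studies, sum(cherry) / n_studies


if __name__ == "__main__":
    mean_full, mean_cherry = run()
    print(f"mean full-sample r:        {mean_full:.2f}")
    print(f"mean smallest-subgroup r:  {mean_cherry:.2f}")
```

Under these assumptions the full-sample estimate stays close to the true value, while the cherry-picked subgroup estimate comes out systematically lower. The splits here are purely random, so the shortfall is produced by sampling error in small subgroups and by picking the minimum, not by any real moderator.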