So why trust science? I would say I am a proud supporter of scientism, crude versions aside. Naomi Oreskes is a prominent science historian who mostly writes history books attacking climate skepticism and Big Capitalism. But now she has a new brief book defending science. Actually, much of the main content concerns cases where science did not work as intended. She presents five case studies:

  1. The Limited Energy Theory
  2. The Rejection of Continental Drift
  3. Eugenics
  4. Hormonal Birth Control and Depression
  5. Dental Floss

These case studies discuss things going awry, though not necessarily cases of moving from one consensus to another. Before going into those, she discusses the need for diversity in science:

But feminist philosophers of science, most notably Sandra Harding and Helen Longino, turned that argument on its head, suggesting that objectivity could be reenvisaged as a social accomplishment, something that is collectively achieved.89 Harding mobilized the concept of standpoint epistemology—the idea that how we view matters depends to a great extent on our social position (or, colloquially, that where we stand depends on where we sit)—to argue that greater diversity could make science stronger. Our personal experiences—of wealth or poverty, privilege or disadvantage, maleness or femaleness, heteronormativity or queerness, disability or able-bodiedness—cannot but influence our perspectives on and interpretations of the world. Therefore, ceteris paribus, a more diverse group will bring to bear more perspectives on an issue than a less diverse one.90

This perspective reinforces Harding’s position that objectivity is not a matter of either/or, but of degree. The greater the diversity and openness of a community and the stronger its protocols for supporting free and open debate, the greater the degree of objectivity it may be able to achieve as individual biases and background assumptions are “outed,” as it were, by the community. Put another way: objectivity is likely to be maximized when there are recognized and robust avenues for criticism, such as peer review, when the community is open, non-defensive, and responsive to criticism, and when the community is sufficiently diverse that a broad range of views can be developed, heard, and appropriately considered. On this view, it is not surprising that when scientists were almost exclusively white men, they developed theories about women and African Americans that were at best incomplete and at times pernicious—theories that have now been rejected. Nor is it surprising that many of the logical and empirical flaws of these earlier theories were pointed out by women and people of color.97 (This point is addressed further in chapter 2.)

How do we determine if a scientific community is sufficiently diverse, self-critical, and open to alternatives, particularly in the early stages of investigations when it is important not to close off avenues prematurely? How do we evaluate the quality of its institutional forms? We must examine each case on an individual basis. Many scientists were wrong about continental drift, but that does not mean that a different group of scientists are wrong today about climate change. They may be or they may not be. We cannot assert either position a priori.

If we can establish that there is a consensus among the community of qualified experts, then we may also want to ask

  • Do the individuals in the community bring to bear different perspectives? Do they represent a range of perspectives in terms of ideas, theoretical commitments, methodological preferences, and personal values?
  • Have different methods been applied and diverse lines of evidence considered?
  • Has there been ample opportunity for dissenting views to be heard, considered, and weighed?
  • Is the community open to new information and able to be self-critical?
  • Is the community demographically diverse: in terms of age, gender, race, ethnicity, sexuality, country of origin, and the like?

This latter point needs further explication. Scientific training is intended to eliminate personal bias, but all the available evidence suggests that it does not and probably cannot. Diversity is a means to correct for the inevitability of personal bias. But what is the argument for demographic diversity? Isn’t the point really the need for perspectival diversity?

The best answer to this question is that demographic diversity is a proxy for perspectival diversity, or, better, a means to that end. A group of white, middle-aged, heterosexual men may have diverse views on many issues, but they may also have blind spots, for example, with respect to gender or sexuality. Adding women or queer individuals to the group can be a way of introducing perspectives that would otherwise be missed.

I agree entirely with the take, but the problem here is of course her assumption that non-whites and women actually contribute diversity of opinion to science, and critically, the right kind of diversity. As a matter of fact, they seem not to. Opinion is less diverse among these groups when it comes to many socio-political matters (they have a much stronger left-wing bias). If diversity of opinion were the goal, one would probably maximize it by recruiting via strict meritocracy, and the resulting pool of researchers would be mostly male and of European ancestry. How’s that for feminist philosophy? A later header of hers reads In Diversity There Is Epistemic Strength. That’s right. She has this model:

Demographic diversity -> ideological and cultural diversity -> less blindspot bias -> better science -> more truth

However, she somehow neglects the actual mediator of ideological and cultural diversity that has been under discussion for the last decade or so. The guy who wrote the introduction, however, did not entirely miss it:

Like all excellent books, this one addresses many questions and also raises some. While Professor Oreskes argues that progress and reliability in science depends more on the qualities of scientific communities than on the character of individual scientists, she also argues that scientists inevitably have values and that they should be honest about them. Do not well-working scientific communities depend on the predominance of good values—of intellectual honesty and truth seeking—among scientists? And if diversity is important in scientific communities, of what kinds? The inclusion of women and members of racial, ethnic, religious, and other minority populations has obviously been very good for all of the sciences, and scholarship generally. Are there social sciences (and perhaps other fields of inquiry) in which greater ideological diversity would be helpful?

Clown world obviously.

Example 1: The Limited Energy Theory

You may have heard of this in feminist circles, but probably not:

In 1873, Edward H. Clarke (1820–77), an American physician and Harvard Medical School professor, argued against the higher education of women on the grounds that it would adversely affect their fertility.23 Specifically, he argued that the demands of higher education would cause their ovaries and uteri to shrink. In the words of Victorian scholars Elaine and English Showalter, “Higher education,” Clarke believed, “was destroying the reproductive functions of American women by overworking them at a critical time in their physiological development.”24

Clarke presented his conclusion as a hypothetico-deductive consequence of the theory of thermodynamics, specifically the first law: conservation of energy. Developed in the 1850s particularly by Rudolf Clausius, the first law of thermodynamics states that energy can be transformed or transferred but it cannot be created or destroyed. Therefore, the total amount of energy available in any closed system is constant. It stood to reason, Clarke argued, that activities that directed energy toward one organ or physiological system, such as the brain or nervous system, necessarily diverted it from another, such as the uterus or endocrine system. Clarke labeled his concept “The Limited Energy Theory.”25

Scientists were inspired to consider the implications of thermodynamics in diverse domains, and Clarke’s title might suggest he was applying energy conservation to a range of biological or medical questions.26 But not so. For Clarke, the problem of limited energy was specifically female, i.e., female capacity. In his 1873 book, Sex in Education; or, a Fair Chance for Girls, Clarke applied the first law to argue that the body contained a finite amount of energy and therefore “energy consumed by one organ would be necessarily taken away from another.”27 But his was not a general theory of biology, it was a specific theory of reproduction. Reproduction, he (and others) believed, was unique, an “extraordinary task” requiring a “rapid expenditure of force.”28 The key claim, then, was that energy spent on studies would damage women’s reproductive capacities. “A girl cannot spend more than four, or in occasional instances, five hours of force daily upon her studies” without risking damage, and once every four weeks she should have a complete rest from studies of any kind.29 One might suppose that, on this theory, too much time or effort spent on any activity, including perhaps housework or child-rearing, might similarly affect women’s fertility, but Dr. Clarke did not pursue that question. His concern was the potential effects of strenuous higher education.

So, a fairly typical feminist example. The curious part about this one is that the prediction is somewhat true. Women who spend time in higher education are exactly those who have too few kids. The reason is that they lack the time and energy to pursue both. Essentially, one needs to substitute a single word in his theory to make it sort of right.

Example 3: Eugenics

The history of eugenics is far more complex than the two examples we have just examined, in part because it involved a wide range of participants, many of whom were not scientists (including US president Teddy Roosevelt), and the values and motivations that informed it were extremely diverse. Perhaps for this reason some historians have been reluctant to draw conclusions from what nearly all agree is a troubling chapter in the history of science. But it has been used explicitly by climate change deniers to claim that because scientists were once wrong about eugenics, they may be wrong now about climate change.57 For this reason, I think the subject cannot be ignored, and because of its complexity I grant it more space than the two examples we have just considered.

As is widely known, many scientists in the early twentieth century believed that genes controlled a wide range of phenotypic traits, including a long list of undesirable or questionable behaviors and afflictions, including prostitution, alcoholism, unemployment, mental illness, “feeble-mindedness,” shiftlessness, the tendency toward criminality, and even thalassophilia (love of the sea) as indicated by the tendency to join the US Navy or Merchant Marine. This viewpoint was the basis for the social movement eugenics: a variety of social practices intended to improve the quality of the American (or English, German, Scandinavian, or New Zealand) people, practices that in hindsight most of us view with dismay, outrage, even horror. These practices were discussed either under the affirmative rubrics of “race betterment” and “improvement,” or the negative rubrics of preventing “racial degeneration” and “race suicide.”58 The ultimate expression of these views in Nazi Germany is well known. Less well known is that in the United States, eugenic practices included the forced sterilization of tens of thousands of US citizens (and principally targeting the disabled), a practice upheld in the Buck v. Bell decision, wherein Supreme Court justice Oliver Wendell Holmes, Jr., upheld the rights of states to “protect” themselves from “vicious protoplasm.”59

The presentation of this topic is not as dumb and one-sided as in many other sources. One curious element is where she highlights having socialists in science as a good thing, because they brought some diversity of views on the eugenics matter. This is relevant because negative eugenics at the time essentially came down to advocating ways to reduce the fertility of lower-class people, as these were (correctly, we now know) seen as having worse genetics. This, however, conflicts with the working-class-as-saviors thinking in socialism/communism:

Mead’s discussion of Italian immigrants is an important reminder that, while the language of eugenics was that of “racial degeneration,” eugenics in America was concerned both with issues of race (as we understand the term today) and with gradations of European ethnicity, both of which were tied to class.85 The threat was understood to be to the “Nordic race”—the peoples of northern European descent—from both European and non-European sources, and so a major focus of eugenic study and target of eugenic practice was poor whites. In the United States, that largely meant immigrants, but in the United Kingdom it meant the working class. For this reason, it is perhaps not surprising that another group of scientists who objected to eugenics were socialists, including the British geneticists J.B.S. Haldane, J. D. Bernal, and Julian Huxley, and the American socialist Herman Muller.86

Professor of genetics and biometry at University College London, J.B.S. Haldane was the son of the famed Oxford physiologist John Scott Haldane, a socialist who pioneered the study of occupational hazards and originated the practice of bringing canaries into coalmines to monitor air quality.87 Initially Haldane sympathized with aspects of eugenics—in college he joined the Oxford Eugenics Society—but he was soon offended by its evident sociopolitical prejudices, particularly its class bias.

I don’t really disagree. Eugenics was at this time an odd mix of philosophy, science, and politics. The various socialists were right to object to some of the very crude but, ultimately, largely correct conclusions that were being drawn. Back then, Mendelism was the only game in town, so people proposed Mendelian inheritance patterns for all kinds of clearly polygenic traits, and this was pretty terrible science. Only after Fisher reconciled Mendelism with the biometric school did things start making more sense.

Example 4: Hormonal Birth Control and Depression

Now we get into Naomi’s own typical female-irrationality biases. She reveals them in the first chapter too:

We have a considerable literature on indigenous expertise: the knowledge that both lay people and experts may have about plants, animals, geography, climate, or other aspects of their natural environments and communities. In recent decades we have come to understand more fully the empirical knowledge systems that have developed outside of what we conventionally call “Western science”—what anthropologist Susantha Goonatilake has called “civilizational knowledge.” These systems may involve highly developed expertise, and may be quite effective in their realms.120 For example, Traditional Chinese Medicine (TCM), acupuncture, and Ayurvedic medicine can be efficacious in treating certain diseases and conditions for which Western medicine has little to offer.121 Civilizational knowledge traditions have authority in their regions of origin by virtue of track records of success, and in some cases (e.g., acupuncture) have demonstrated efficacy beyond those regions as well. Moreover, the study of civilizational knowledge has highlighted the values embedded in Western science that often go unrecognized or are even denied by its practitioners.122

Since Chinese medicine is actually pseudoscience, and doesn’t work for anything except by luck, bringing more Chinese voices into science probably only makes it worse. Of course, that is ignoring the other fact that science depends in large part on trust, and the Chinese are notorious for their dishonesty (see also the wallet study, which finds China to be the most dishonest of the 40 countries studied, below even African countries like Ghana). That aside, what does she say about hormonal pills?

Recently, there was a flurry of media attention about a new study demonstrating that the Pill can cause depression.110 Physicians lauded the study, and the media presented the result as a novel finding.111 My own daughter, however, asked me on the day the coverage hit the media: How is this news? She knew that the Pill could cause depression, because I had told her so.

I have no history of depression—no family history of depression or mental illness of any sort—but when I was in my midtwenties, I experienced a sudden and peculiar bout of extreme melancholy. I lost my energy for daily tasks, lost interest in my work, and, after about six weeks, found myself having trouble getting out of bed. And yet, in other respects my life was going well. I was in my second year of graduate school, had done very well in my first year, was working on an exciting project for which I had adequate funding, and had met a very nice man who would soon become my husband (and to whom I’ve now been married for more than thirty years).

I went to counseling at a campus health center, and I was lucky. The female counselor asked me straight away: Are you on the Pill? The answer was yes. I explained that I had recently returned from Australia, and because Australia at that time had free health insurance, including prescription drugs, I had bought a year’s worth before I left. But the particular formulation that I had been prescribed in Australia was not available in the United States, so when the year was up I had to switch to another type. That had occurred two months before. The onset of my depression began shortly after I had started this new form of the Pill. The therapist told me that the type of pill I was now on—a combination formulation—was well known to be more likely to cause depression than some other options. I stopped the drug immediately and my recovery began nearly as immediately. Within a few weeks I was back to my normal self, I thanked the therapist, and went on to a successful academic career and life.

My experience can be dismissed as “just an anecdote,” but I prefer to view it as a clinical study in which n = 1. The more important point is that many women have had such experiences and reported them to their physicians and therapists. The website Healthline.com, which claims to be the “fastest growing consumer health information site,” notes that “depression is the most common reason women stop using birth control pills.”112 Moreover, like me, many women have bounced back to normal when they stopped taking the Pill or switched to other formulations. And these case reports have spurred numerous scientific studies. As one physician recently wrote, “decades of reports of mood changes associated with these hormone medications have spurred multiple research studies.” So my daughter was correct to ask: how was this new study news?

It is hard to argue with a study of over one million women. It is also hard to argue with any study done in Denmark, which has a national health care database covering every Danish citizen and thus allows researchers to correct for sampling biases and other confounding effects. It is thanks to Denmark that we can say with confidence that children who are fully vaccinated according to prevailing public health recommendations do not suffer autism at greater rates than those who are not.114 So, three cheers for Denmark. Three cheers, as well, for this big, new convincing study. But note the explanation of why it took so long to come to this point: the lack of “hard data like diagnosis codes and prescription records.” Previous studies, we are told, relied on “iffy methods like self-reporting, recall, and insufficient numbers of subjects.”115

The term “hard data” should be a red flag, because the history and sociology of science show that there are no hard data. Facts are “hardened” through persuasion and their use. Moreover, remarks of this type raise the question of why some forms of data are considered hard and others are not. Just look at what is being considered hard data here: diagnosis codes and prescription records. Many people would say hard data are quantitative data, but neither of these constitutes a measurement: they are the subjective judgments of practitioners and the drugs they choose to prescribe in response to those judgments.116 Moreover, there is a substantial literature on misdiagnosis in medicine, and on the distorting effects of pharmaceutical industry advertising and marketing on prescribing practices.117 Given what we know about medical practice and its history, the idea that diagnosis codes and prescription records should be taken as hard facts seems almost satirical.

But it gets worse: the study authors accepted the reports of doctors—their diagnosis codes and prescription records—as facts, whereas the reports of female patients were dismissed as unreliable—in Tello’s words: “iffy.” Bias—either against women or against patients—is clearly on display. But here is the key point: the conclusion of the Denmark study is the same as all those iffy, self-reports from female patients. If the new study is correct, then the allegedly iffy self-reports were correct all along.

We have known for fifty years that the Pill can cause mood disorders in women. We know that drugs that treat mood disorders can affect hormones involved with libido, and scientists know at least one mechanism by which this occurs. And a recent study was stopped because hormonal contraceptive caused mood disorders in male subjects. A reasonable person might therefore ask: what was left to be established? Or as my daughter put it, why was the finding that the Pill causes depression in women viewed as news?

Let us return to the Denmark study. It did not find that previous studies of oral contraceptives had shown that hormonal contraception did not cause mood changes. Rather, it concluded that “inconsistent research methods and lack of uniform assessments [made] it difficult to make strong conclusions about which … users are at risk for adverse mood effects.”134 In other words, it suggested that until now, we didn’t know enough to draw a firm conclusion.

These researchers took the conventional approach of assuming no effect and requiring statistical proof at a specific significance level to say that an effect had been detected—and was therefore known. So did the various studies that preceded them. There’s nothing particularly shocking about this; it is common statistical practice. But it says, in effect, that if evidence is not available that meets that standard, we must conclude that our results are inconclusive—or in lay terms, that we just don’t know.
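The convention she describes (assume no effect, and call the question open unless a significance threshold is met) is easy to illustrate: the same true effect goes "undetected" in a small study and "detected" in a registry-sized one. A minimal sketch with made-up incidence numbers (not from any of the actual studies), using a two-proportion z-test:

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """Two-proportion z-test (normal approximation); returns two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # p = 2 * (1 - Phi(|z|))

# Hypothetical true effect: depression incidence 2.0% off the pill, 2.5% on it.
# A small study cannot "detect" it at alpha = 0.05; a registry-sized one can.
p_small = two_prop_z(0.025, 300, 0.020, 300)
p_large = two_prop_z(0.025, 200_000, 0.020, 200_000)
print(p_small > 0.05, p_large < 0.05)  # True True
```

Under this convention, all the earlier small studies were doomed to return "inconclusive" regardless of whether the effect was real.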

Her main point is that doctors should have taken women’s complaints more seriously. We know of course why they don’t: if one takes such things seriously, one can justify absolutely any treatment, and that’s why we do clinical trials in the first place. This stuff is straight out of Bad Science. Doctors were of course right to question this received wisdom. I looked up the study she mentions as being particularly convincing:

Importance  Millions of women worldwide use hormonal contraception. Despite the clinical evidence of an influence of hormonal contraception on some women’s mood, associations between the use of hormonal contraception and mood disturbances remain inadequately addressed.

Objective  To investigate whether the use of hormonal contraception is positively associated with subsequent use of antidepressants and a diagnosis of depression at a psychiatric hospital.

Design, Setting, and Participants  This nationwide prospective cohort study combined data from the National Prescription Register and the Psychiatric Central Research Register in Denmark. All women and adolescents aged 15 to 34 years who were living in Denmark were followed up from January 1, 2000, to December 2013, if they had no prior depression diagnosis, redeemed prescription for antidepressants, other major psychiatric diagnosis, cancer, venous thrombosis, or infertility treatment. Data were collected from January 1, 1995, to December 31, 2013, and analyzed from January 1, 2015, through April 1, 2016.

Exposures  Use of different types of hormonal contraception.

Main Outcomes and Measures  With time-varying covariates, adjusted incidence rate ratios (RRs) were calculated for first use of an antidepressant and first diagnosis of depression at a psychiatric hospital.

Results  A total of 1 061 997 women (mean [SD] age, 24.4 [0.001] years; mean [SD] follow-up, 6.4 [0.004] years) were included in the analysis. Compared with nonusers, users of combined oral contraceptives had an RR of first use of an antidepressant of 1.23 (95% CI, 1.22-1.25). Users of progestogen-only pills had an RR for first use of an antidepressant of 1.34 (95% CI, 1.27-1.40); users of a patch (norgestrolmin), 2.0 (95% CI, 1.76-2.18); users of a vaginal ring (etonogestrel), 1.6 (95% CI, 1.55-1.69); and users of a levonorgestrel intrauterine system, 1.4 (95% CI, 1.31-1.42). For depression diagnoses, similar or slightly lower estimates were found. The relative risks generally decreased with increasing age. Adolescents (age range, 15-19 years) using combined oral contraceptives had an RR of a first use of an antidepressant of 1.8 (95% CI, 1.75-1.84) and those using progestin-only pills, 2.2 (95% CI, 1.99-2.52). Six months after starting use of hormonal contraceptives, the RR of antidepressant use peaked at 1.4 (95% CI, 1.34-1.46). When the reference group was changed to those who never used hormonal contraception, the RR estimates for users of combined oral contraceptives increased to 1.7 (95% CI, 1.66-1.71).

Conclusions and Relevance  Use of hormonal contraception, especially among adolescents, was associated with subsequent use of antidepressants and a first diagnosis of depression, suggesting depression as a potential adverse effect of hormonal contraceptive use.

So, yes, it does have 1 million women. But it is still just an observational cohort study. Users of these pills have more issues, but mainly the younger women (something to do with growth?). The main problem here is that we don’t know whether women who start taking the pill are those who are more likely to have depression to begin with. We might think so, since taking these is probably related to political leftism, and that is itself related to depression. That’s just speculation. Here it would be nice if I had a slam-dunk study, something like a big clinical trial, or a sibling comparison, or a twin control (better yet), or a within-individual study (awesome), but alas, my searching produced none of these. I did, however, find a large American study using the Add Health dataset which found… the opposite result. The age interaction was confirmed in a Swedish study of 700k women. In fact, it finds no association in older women at all!
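The confounding worry can be made concrete with a toy simulation: if depression-prone women are more likely to start the pill, a naive comparison of users and non-users yields an elevated rate ratio even when the pill has zero causal effect. All numbers here are made up for illustration:

```python
import random

random.seed(0)

# Hypothetical setup: the pill has NO causal effect on depression, but
# depression-prone women are more likely to start it (selection/confounding).
N = 200_000
users_dep = users_tot = nonusers_dep = nonusers_tot = 0
for _ in range(N):
    prone = random.random() < 0.20                       # 20% latent depression-prone
    on_pill = random.random() < (0.6 if prone else 0.4)  # prone women select into use
    depressed = random.random() < (0.10 if prone else 0.02)  # note: no pill term!
    if on_pill:
        users_tot += 1
        users_dep += depressed
    else:
        nonusers_tot += 1
        nonusers_dep += depressed

rr = (users_dep / users_tot) / (nonusers_dep / nonusers_tot)
print(round(rr, 2))  # noticeably above 1.0 despite zero causal effect
```

With these made-up selection strengths, the spurious rate ratio lands around 1.3, in the same range as the RRs the Danish study reports; that is why a design controlling for the latent trait (sibling, twin, or within-individual comparison) would be so much more informative.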

I conclude that there is a lot we don’t know about this relationship. If one were to be cautious, it may be a good idea to avoid the pill for very young teenage girls, but for the typical use case of women 18 and over, the additional risk seems quite small, even if it were causal.

Example 5: Dental Floss

This chapter is even odder. Here she relies mainly on media reports of dental floss not working, contrasting them with a Cochrane review:

My final case involves a very grave public health issue: dental floss.

Many people have recently heard that flossing your teeth doesn’t do you any good. In August 2016 there was a flurry of coverage saying so. The New York Times asked, “Feeling Guilty about Not Flossing? Maybe There Is No Need.”138 The Los Angeles Times reassured its readers that if they didn’t floss, they needn’t feel bad because it probably doesn’t work anyway.139 So did Mother Jones, which ran the headline, “Guilty No More: Flossing Doesn’t Work.”140 Newsweek asked, “Has the Flossing Myth Been Shattered?”141

These various reports were based on an article by the Associated Press (AP) that claimed that there is “little proof that flossing works.” The AP quoted National Institutes of Health dentist Tim Iafolla, acknowledging “that if the highest standards of science were applied in keeping with the flossing reviews of the past decade, ‘then it would be appropriate to drop the floss guidelines.’ ”142 The Chicago Tribune linked this latest reversal in scientific fortune to previous (alleged) reversals on salt and fat.143 Evidently, we can add dental floss to the list of issues on which scientists have “got it wrong.”

The most well-known and respected source of information on the state of the art in biomedicine is the Cochrane group, a nonprofit collaboration that bills itself as “representing an international gold standard for high quality, trusted information.” The collaboration claims thirty-seven thousand participants from more than 130 countries who “work together to produce credible, accessible health information that is free from commercial sponsorship and other conflicts of interest.”158 As the New York Times correctly reported, in 2011, the collaboration issued a report from its oral health group reviewing existing clinical trials examining the benefits of regular use of dental floss.159

The report was based on a review of twelve trials, with 582 subjects in flossing-plus-toothbrushing groups and 501 participants in toothbrushing-alone groups. The report summary reads as follows:

There is some evidence from twelve studies that flossing in addition to tooth-brushing reduces gingivitis compared to tooth-brushing alone. There is weak, very unreliable evidence from 10 studies that flossing plus tooth-brushing may be associated with a small reduction in plaque at 1 and 3 months. No studies reported the effectiveness of flossing plus tooth-brushing for preventing dental caries [tooth decay].160

That part of the summary was reported, wholly or in part, in many of the media reports. But the report also said:

Flossing plus tooth-brushing showed a statistically significant benefit compared to tooth-brushing in reducing gingivitis at the three time points studied [although the effect size was small].161 The 1-month estimate translates to a 0.13 point reduction on a 0 to 3 point scale for … gingivitis … and the 3 and 6 month results translate to 0.20 and 0.09 reductions on the same scale.162

So, what to do? Basically, the Cochrane review finds that we don’t know whether dental flossing works. Strangely, the review has since been withdrawn, though no reason is given. I will take a guess that it has to do with attacks on the authors following the media attention.

So the evidence base is now even worse. She continues:

But is it so surprising? Perhaps not. What we learned in 2016 was that we didn’t have the long-term, randomized clinical trials that would be necessary to prove the benefits of dental floss according to prevailing medical standards. It’s not that hard to understand why, in a world of cancer, heart disease, opioid abuse, and the continued use of tobacco products, such studies have not been done. It’s not egregious that researchers have focused their attention on matters that appear to be more serious. What is egregious is that in the absence of evidence that meets the “gold” standard of the randomized clinical trial, people have concluded that there is no evidence at all. That is both false and illogical.166

Moreover, the gold standard of clinical trials is not just the randomized trial, but the double-blind randomized trial, and it is impossible to do a double-blind trial of dental floss. (This difficulty also plagues studies of nutrition, exercise, yoga, meditation, acupuncture, surgery, and any number of interventions of which the subject is necessarily aware.) Any study of floss usage will also require self-reporting, which, as we have seen, is disparaged. Moreover, if you believe that long-term flossing can prevent tooth loss in old age, it would be unethical to ask a control group to refrain from flossing for what would have to be the better part of their lives. The sort of study that would be required to convince those who subscribe to the “gold standard” is both impossible and arguably unethical to perform.167

She is right that we don’t really have good evidence that flossing doesn’t work. But absent good evidence that it does, why bother with an uncomfortable procedure? It sounds like bloodletting.

What would be the right tool to investigate dental floss? One might be a different sort of clinical trial. The American Dental Association notes that disappointing results might be the result of poor flossing, which, they noted, is a “technique sensitive intervention.”174 The New York Times concluded: “So maybe perfect flossing is effective. But scientists would be hard put to find anyone to test that theory.”175 With due respect, that is an ill-informed remark, because scientists have tested that theory. The clinical trials reviewed by the Cochranes did not examine the impact of flossing technique, but a review of six trials in which professionals flossed the teeth of children on school days for almost two years, saw a 40% reduction in the risk of cavities.176

That is a huge effect. So consider this alternative headline: “A New Job Opportunity: Science Shows the Need for Professional Flossers.” Imagine the social change that might have ensued and the employment opportunities created. On our way to work, instead of stopping at Peets or Starbucks for a quick latte or a Drybar for a blow-out, we could stop at a flossing bar for a five-minute professional floss.

So, she finally comes down on the pro-flossing side based on flimsy evidence. The citation here, 176, is ibid., and refers back to 175, which is actually… a New York Times article. They don’t link to the study (this is 2016!), but I found it:

Our aim was to assess, systematically, the effect of flossing on interproximal caries risk. Six trials involving 808 subjects, ages 4 to 13 years, were identified. There were significant study-to-study differences and a moderate to large potential for bias. Professional flossing performed on school days for 1.7 years on predominantly primary teeth in children was associated with a 40% caries risk reduction (relative risk, 0.60; 95% confidence interval, 0.48-0.76; p-value, < 0.001). Both three-monthly professional flossing for 3 years (relative risk, 0.93; 95% confidence interval, 0.73-1.19; p-value, 0.32) and self-performed flossing in young adolescents for 2 years (relative risk, 1.01; 95% confidence interval, 0.85-1.20; p-value, 0.93) did not reduce caries risk. No flossing trials in adults or under unsupervised conditions could be identified. Professional flossing in children with low fluoride exposures is highly effective in reducing interproximal caries risk. These findings should be extrapolated to more typical floss-users with care, since self-flossing has failed to show an effect.

So it gets more terribler. The confidence interval goes up to 0.76, so maybe flossing only reduces caries by 24%, not 40%. But worse, neither self-flossing nor intermittent (three-monthly) professional flossing showed any benefit! Those confidence intervals are again quite wide, so maybe self-flossing works; we don’t really know. There were apparently no studies of adults, and of course the big discussion in the NYT etc. is about whether adults should self-floss. The available evidence does not favor this. Feynman spoke with wisdom. For good measure, I checked whether there was a newer Cochrane review, and there was, from 2019:
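The arithmetic behind the 40%-vs-24% point is just converting a relative risk (and its confidence bounds) into a percent risk reduction. A minimal sketch, using the RR point estimate and 95% CI quoted from the flossing review above:

```python
# Sketch: a relative risk (RR) converts to a percent risk
# reduction as (1 - RR) * 100. Values below are the professional-
# flossing figures quoted from the review above.
def risk_reduction_pct(rr):
    """Relative risk -> rounded percent reduction in risk."""
    return round((1 - rr) * 100)

print(risk_reduction_pct(0.60))  # 40  (the headline point estimate)
print(risk_reduction_pct(0.76))  # 24  (upper CI bound: effect could be this small)
print(risk_reduction_pct(0.48))  # 52  (lower CI bound: or this large)
```

Reporting only the point estimate (40%) hides that the data are compatible with anything from a 24% to a 52% reduction.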

Primary objective: comparisons against toothbrushing alone

Low‐certainty evidence suggested that flossing, in addition to toothbrushing, may reduce gingivitis (measured by gingival index (GI)) at one month (SMD ‐0.58, 95% confidence interval (CI) ‐1.12 to ‐0.04; 8 trials, 585 participants), three months or six months. The results for proportion of bleeding sites and plaque were inconsistent (very low‐certainty evidence).

Very low‐certainty evidence suggested that using an interdental brush, plus toothbrushing, may reduce gingivitis (measured by GI) at one month (MD ‐0.53, 95% CI ‐0.83 to ‐0.23; 1 trial, 62 participants), though there was no clear difference in bleeding sites (MD ‐0.05, 95% CI ‐0.13 to 0.03; 1 trial, 31 participants). Low‐certainty evidence suggested interdental brushes may reduce plaque more than toothbrushing alone (SMD ‐1.07, 95% CI ‐1.51 to ‐0.63; 2 trials, 93 participants).

Very low‐certainty evidence suggested that using wooden cleaning sticks, plus toothbrushing, may reduce bleeding sites at three months (MD ‐0.25, 95% CI ‐0.37 to ‐0.13; 1 trial, 24 participants), but not plaque (MD ‐0.03, 95% CI ‐0.13 to 0.07).

Very low‐certainty evidence suggested that using rubber/elastomeric interdental cleaning sticks, plus toothbrushing, may reduce plaque at one month (MD ‐0.22, 95% CI ‐0.41 to ‐0.03), but this was not found for gingivitis (GI MD ‐0.01, 95% CI ‐0.19 to 0.21; 1 trial, 12 participants; bleeding MD 0.07, 95% CI ‐0.15 to 0.01; 1 trial, 30 participants).

Very‐low certainty evidence suggested oral irrigators may reduce gingivitis measured by GI at one month (SMD ‐0.48, 95% CI ‐0.89 to ‐0.06; 4 trials, 380 participants), but not at three or six months. Low‐certainty evidence suggested that oral irrigators did not reduce bleeding sites at one month (MD ‐0.00, 95% CI ‐0.07 to 0.06; 2 trials, 126 participants) or three months, or plaque at one month (SMD ‐0.16, 95% CI ‐0.41 to 0.10; 3 trials, 235 participants), three months or six months, more than toothbrushing alone.

Nothing changed.
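The pattern in the 2019 comparisons quoted above can be read off mechanically: an effect is statistically distinguishable from zero only when its 95% CI excludes zero. A minimal sketch (estimates copied from the quoted review; the short labels are mine):

```python
# Sketch: classify a few of the quoted Cochrane 2019 comparisons by
# whether the 95% CI excludes zero. Tuples are (estimate, CI low, CI high).
results = {
    "floss, gingivitis, 1 mo (SMD)":           (-0.58, -1.12, -0.04),
    "interdental brush, gingivitis, 1 mo (MD)": (-0.53, -0.83, -0.23),
    "oral irrigator, plaque, 1 mo (SMD)":      (-0.16, -0.41, 0.10),
}

def excludes_zero(lo, hi):
    """True iff the interval [lo, hi] does not contain zero."""
    return lo > 0 or hi < 0

for name, (est, lo, hi) in results.items():
    tag = "significant" if excludes_zero(lo, hi) else "not significant"
    print(f"{name}: {est} [{lo}, {hi}] -> {tag}")
```

Note also that the floss interval, while excluding zero, runs almost to it (-0.04), and all of these are low- or very-low-certainty evidence to begin with.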

Her main criterion is scientific consensus. The term is mentioned 130 times in this book! Strangely, she never actually defines it:

Consensus is essential to our argument for the simple reason that we have no way to know for sure if any particular scientific claim is true. As philosophers going back to Plato (and perhaps before) have long recognized, we do not have independent, unmediated access to reality and therefore have no independent, unmediated means to judge the truth content of scientific claims. We can never be entirely positive. Expert consensus serves as a proxy. We cannot know if scientists have settled on the truth, but we can know if they have settled. In some cases where it is alleged in hindsight that scientists “got it wrong,” we find on closer examination that there was, in fact, no consensus among scientists on the matter at hand. Eugenics is a case in point.

Diversity is crucial because, ceteris paribus, it increases the odds that any particular claim has been examined from many angles and potential shortcomings revealed. Homogenous groups often fail to recognize their shared biases. In chapter 2, we saw not only how the Limited Energy Theory instantiated prevailing late nineteenth-century American gender bias, but also how Dr. Mary Putnam Jacobi shone a light on those biases and in doing so revealed serious flaws in both the theory and its evidentiary basis. We also saw how socialist geneticists were particularly articulate in their opposition to eugenics, drawing on their politics to question the obvious class bias in many eugenic theories and proposals. One did not have to be a socialist to question eugenics, but socialist class consciousness played a role in a substantial line of dissent.

Methodological openness and flexibility are necessary because when scientists become rigid about method, they may miss, discount, or reject theories and data that do not meet their standards. We saw this at play in the history of continental drift theory, as American scientists rejected a theory that did not follow their preferred inductive approach; in the history of the contraceptive pill, where gynecologists rejected case reports from patients because they were viewed as subjective and therefore unreliable; and in the evaluation of dental floss, where double-blind trials were simply not possible.

The one thing she is missing here is the measurement of consensus. So far she has relied on published works. This works, but only if there is no bias in peer review. Strangely, she never discusses this topic in depth. Relying on published works is also problematic if there is social pressure against publishing an opinion on some topic. We of course know such pressures exist:

Or this survey:

On these topics:

Who are the experts?

So scientific expertise is about majority expert opinion, but who are the experts?

How do we judge if non-experts have relevant, useful, and accurate information? This is not an easy question to answer. We have clear markers of scientific training and expertise: higher education, membership in scientific and learned societies, records of publication and research grants, H-indices, awards and prizes, and the like. Scientists know who their scientific colleagues are and what their track records look like. Scientists (for the most part) know which journals have rigorous peer review and which do not.

Judging information from outside the expert world, however, is a different and trickier matter.

Scholars have identified several categories worthy of attention. One is other professionals who have relevant information. This could include nurses and midwives, for example, who have direct contact with patients and may differ from physicians on questions such as pain management.179 A second category is people who may not have professional training, but whose daily experiences may lead them to relevant knowledge and understandings, such as farmers and fishermen.180 We might say that these people have daily “on the ground” experience, and therefore may see things that scientific experts, for whatever reason, have missed. (Earth scientists call this “ground truth,” in this case referring to what geologists on the ground see and therefore know about, as compared with evidence, for example, from satellite remote sensing.) As Brian Wynne has stressed, the non-expert world is not “epistemically vacuous.”181

A third category is what Marjorie Garber has called “amateur professionals.”182 These are people—perhaps independent scholars or scholars from other fields—who have educated themselves on a particular subject. Developing expertise outside of conventional avenues of credentialism is certainly possible (although if a scholar from one field moves into another, they can establish credentials by publishing). A fourth category is citizen scientists: people who earn their living in other ways, but participate in science out of love or interest. In some domains—astronomy, entomology, ornithology, and the search for extraterrestrial life—citizen scientists have played significant roles in observing things that professionals do not have the time, money, or human resources to track.

The trouble with relying on publications in “journals [which] have rigorous peer review” is that these journals themselves reflect the prevailing bias. Thus, defining experts this way leads to circularity: the experts are those who are able to publish in outlets controlled by those with the dominant biases. That is not how it is supposed to work, but that is, to some extent, how it does work. Here, of course, I will take the opportunity to repeat the finding about the scientific credibility of journals versus their prestige: there is no positive relationship, as far as we can objectively measure:

In which journal a scientist publishes is considered one of the most crucial factors determining their career. The underlying common assumption is that only the best scientists manage to publish in a highly selective tier of the most prestigious journals. However, data from several lines of evidence suggest that the methodological quality of scientific experiments does not increase with increasing rank of the journal. On the contrary, an accumulating body of evidence suggests the inverse: methodological quality and, consequently, reliability of published research works in several fields may be decreasing with increasing journal rank. The data supporting these conclusions circumvent confounding factors such as increased readership and scrutiny for these journals, focusing instead on quantifiable indicators of methodological soundness in the published literature, relying on, in part, semi-automated data extraction from often thousands of publications at a time. With the accumulating evidence over the last decade grew the realization that the very existence of scholarly journals, due to their inherent hierarchy, constitutes one of the major threats to publicly funded science: hiring, promoting and funding scientists who publish unreliable science eventually erodes public trust in science.

So, which journals to include in the set that experts publish in? It’s not an easy question, and I shall not attempt an answer here. Probably we should just train an AI to recognize scientific papers, and then use that as a measure of the scientificness of a given journal. Just don’t publish the workings of the bot, lest someone find a way to game it…
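To make the suggestion concrete, here is a toy sketch of the idea, with a crude keyword rule standing in for a real trained classifier. Everything below (the marker words, the threshold, the scoring rule) is hypothetical and for illustration only:

```python
# Hypothetical sketch: score a journal's "scientificness" as the share
# of its papers that a classifier marks as scientific. A keyword rule
# stands in here for a real trained model.
SCIENCE_MARKERS = {"randomized", "confidence interval", "sample",
                   "replication", "preregistered"}

def looks_scientific(abstract: str) -> bool:
    """Toy classifier: at least two methodological marker phrases."""
    text = abstract.lower()
    return sum(marker in text for marker in SCIENCE_MARKERS) >= 2

def journal_score(abstracts):
    """Fraction of a journal's abstracts flagged as scientific."""
    return sum(looks_scientific(a) for a in abstracts) / len(abstracts)

papers = [
    "A randomized trial with a 95% confidence interval around the effect.",
    "An essay on the cultural meaning of teeth.",
]
print(journal_score(papers))  # 0.5
```

A real version would, of course, need a trained model and a held-out evaluation, and, as noted, keeping its internals unpublished to resist gaming.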