# Clear Language, Clear Mind

## February 25, 2015

### Two very annoying statistical fallacies with p-values

Filed under: Math/Statistics — Emil O. W. Kirkegaard @ 14:48

Some time ago, I wrote on Reddit:

There are two errors that I see quite frequently:

1. Concluding from the fact that a statistically significant difference was found that a socially, scientifically or otherwise meaningful difference was found. The reason this doesn't work is that any minute difference will be statistically significant if N is large enough. Some datasets have N=1e6, so very small differences between groups can be detected reliably. This does not mean they are worth any attention. The general problem is the lack of focus on effect sizes.
2. Concluding from the fact that a difference was not statistically significant that there was no difference in that trait. The error is ignoring the possibility of a false negative: there is a difference, but the sample size is too small to detect it reliably, or sampling fluctuation made it smaller than usual in the present sample. Together with the misuse of p values, one often sees things like "men and women differed in trait1 (p<0.04) but did not differ in trait2 (p>0.05)", as if the p value difference of .01 has some magical significance.
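The first error is easy to demonstrate by simulation; a minimal sketch in R (the group names and the size of the true difference are mine, chosen for illustration):

```r
#with a large enough N, even a trivial difference is statistically significant
set.seed(1)
n = 1e6 #one million cases per group
group1 = rnorm(n, mean = 0.02) #true difference of only 0.02 sd
group2 = rnorm(n, mean = 0)
t.test(group1, group2)$p.value #far below .05, hence "significant"
#but the standardized effect size is negligible
d = (mean(group1) - mean(group2)) / sqrt((var(group1) + var(group2)) / 2)
d #about 0.02
```

The p value only says the difference is reliably nonzero; the d of about 0.02 says it is of no practical interest.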

These are rather obvious (to me), so I don’t know why I keep reading papers (Wassell et al, 2015) that go like this:

### 2.1. Experiment 1

In experiment 1 participants filled in the VVIQ2 and reported their current menstrual phase by counting backward the appropriate number of days from the next onset of menstruation. We grouped female participants according to these reports. Fig. 2A shows the mean VVIQ2 score for males and females in the follicular and mid luteal phases (males: M = 56.60, SD = 10.39, follicular women: M = 60.11, SD = 8.84, mid luteal women: M = 69.38, SD = 8.52). VVIQ2 scores varied between menstrual groups, as confirmed by a significant one-way ANOVA, F(2, 52) = 8.63, p < .001, η2 = .25. Tukey post hoc comparisons revealed that mid luteal females reported more vivid imagery than males, p < .001, d = 1.34, and follicular females, p < .05, d = 1.07, while males and follicular females did not differ, p = .48, d = 0.37. These data suggest a possible link between sex hormone concentration and the vividness of mental imagery.

A normal reading of the above has the authors making the fallacy. It is even contradictory: an effect size of d=0.37 is a small-to-medium effect, but in the same sentence they state that there is no effect (i.e. d=0).

However, later on they write:

VVIQ2 scores were found to significantly correlate with imagery strength from the binocular rivalry task, r = .37, p < .01. As is evident in Fig. 3A, imagery strength measured by the binocular rivalry task varied significantly between menstrual groups, F(2, 55) = 8.58, p < .001, η2 = .24, with mid luteal females showing stronger imagery than both males, p < .05, d = 1.03, and late follicular females, p < .001, d = 1.26. These latter two groups’ scores did not differ significantly, p = .51, d = 0.34. Together, these findings support the questionnaire data, and the proposal that imagery differences are influenced by menstrual phase and sex hormone concentration.

Now the authors are back to phrasing it in a way that cannot be taken as the fallacy. Sometimes it gets sillier. One paper, Kleisner et al (2014), which received quite a lot of attention in the media, is based on this kind of subgroup analysis, where the effect had p<.05 for one gender but not the other. The typical source of this silliness is the relatively small sample size of most studies combined with exploratory subgroup analysis (which the authors present as if it were hypothesis-driven). Gender, age, and race are the typical subgroups explored, also in combination.
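How easily this subgroup pattern arises can itself be simulated. In the sketch below (setup, cell size, and names are mine), both subgroups have exactly the same true effect (d = 0.3), yet with n = 50 per cell, one subgroup "works" and the other doesn't a large share of the time:

```r
#identical true effect (d = 0.3) in both subgroups, modest cell size
set.seed(42)
n = 50 #per cell
runs = 5000
p.sub1 = replicate(runs, t.test(rnorm(n, 0.3), rnorm(n))$p.value)
p.sub2 = replicate(runs, t.test(rnorm(n, 0.3), rnorm(n))$p.value)
#proportion of simulated studies where exactly one subgroup reaches p < .05
asym = mean(xor(p.sub1 < .05, p.sub2 < .05))
asym #roughly .4 with this setup
```

So with realistic power, "significant in men but not women" is close to a coin flip even when the true effect is identical in both groups.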

It would probably be best if scientists stopped using "significant" to talk about lowish p values. There is a very large probability that the public will misunderstand it. (There was a good study recently about this, but I can't find it again; help!)

References

Kleisner, K., Chvátalová, V., & Flegr, J. (2014). Perceived intelligence is associated with measured intelligence in men but not women. PloS one, 9(3), e81237.

Wassell, J., Rogers, S. L., Felmingam, K. L., Bryant, R. A., & Pearson, J. (2015). Sex hormones predict the sensory strength and vividness of mental imagery. Biological Psychology.

## February 12, 2015

### Simpler way to correct for restriction of range?

Filed under: Math/Statistics — Emil O. W. Kirkegaard @ 22:34

Restriction of range is when the variance in some variable is reduced compared to the true population variance. This lowers the correlation between this variable and other variables. It is a common problem in research on students, who are selected for general intelligence (GI) and hence have lower variance in it. This means that correlations between GI and whatever else found in student samples are too low.

There are some complicated ways to correct for restriction of range. The usual formula is this:

r̂_XY = r_xy (S_X / s_x) / √(1 − r_xy² + r_xy² (S_X² / s_x²))

which is also known as Thorndike's case 2, or Pearson's 1903 formula. Capital X and Y denote the unrestricted variables, lowercase x and y the restricted ones; S_X is the unrestricted and s_x the restricted standard deviation of the selection variable. The hat on r means estimated.

However, in a paper under review I used a much simpler formula, namely: corrected r = uncorrected r / (SD_restricted / SD_unrestricted), which seemed to give about the right results. But I wasn't sure this was legit, so I did some simulations.
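Both corrections can be written as one-line functions and compared on a single example before running the full grid (the function names and the example are mine; u denotes the ratio of restricted to unrestricted SD of the selection variable):

```r
library(MASS) #for mvrnorm

correct.simple = function(r, u) r / u #simple formula
correct.case2 = function(r, u) r / sqrt(r^2 + u^2 - u^2 * r^2) #Thorndike's case 2

#example: true r = .5, keep only cases above the mean on variable 1
set.seed(1)
data = mvrnorm(1e6, mu = c(0, 0),
               Sigma = matrix(c(1, .5, .5, 1), ncol = 2), empirical = TRUE)
rdata = data[data[, 1] > 0, ] #restriction: top half on variable 1
r = cor(rdata[, 1], rdata[, 2]) #restricted correlation, about .33
u = sd(rdata[, 1]) #restricted sd, about .60
correct.simple(r, u) #overcorrects somewhat, about .54
correct.case2(r, u) #recovers the true value, about .50
```

The overcorrection of the simple formula at higher true correlations is exactly what the tables below show.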

First, I selected a large range of true population correlations (.1 to .8) and a large range of selectivity (.1 to .9). Then I generated a very large dataset for each population correlation. Then, for each restriction, I cut off the data points where the first variable was below the cutoff point and calculated the correlation in the restricted dataset. Then I calculated the corrected correlation and saved both pieces of information.

This gives us the following correlations in the restricted samples (N=1,000,000):

| cor/restriction | R 0.1 | R 0.2 | R 0.3 | R 0.4 | R 0.5 | R 0.6 | R 0.7 | R 0.8 | R 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| r 0.1 | 0.09 | 0.08 | 0.07 | 0.07 | 0.06 | 0.06 | 0.05 | 0.05 | 0.04 |
| r 0.2 | 0.17 | 0.15 | 0.14 | 0.13 | 0.12 | 0.11 | 0.10 | 0.09 | 0.08 |
| r 0.3 | 0.26 | 0.23 | 0.22 | 0.20 | 0.19 | 0.17 | 0.16 | 0.14 | 0.12 |
| r 0.4 | 0.35 | 0.32 | 0.29 | 0.27 | 0.26 | 0.24 | 0.22 | 0.20 | 0.17 |
| r 0.5 | 0.44 | 0.40 | 0.37 | 0.35 | 0.33 | 0.31 | 0.28 | 0.26 | 0.23 |
| r 0.6 | 0.53 | 0.50 | 0.47 | 0.44 | 0.41 | 0.38 | 0.36 | 0.33 | 0.29 |
| r 0.7 | 0.64 | 0.60 | 0.57 | 0.54 | 0.51 | 0.48 | 0.45 | 0.42 | 0.37 |
| r 0.8 | 0.75 | 0.71 | 0.68 | 0.65 | 0.63 | 0.60 | 0.56 | 0.53 | 0.48 |

The true population correlation is in the left margin, the amount of restriction in the column headers. So we see the effect of restricting the range.

Now, here’s the corrected correlations by my method:

| cor/restriction | R 0.1 | R 0.2 | R 0.3 | R 0.4 | R 0.5 | R 0.6 | R 0.7 | R 0.8 | R 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| r 0.1 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.09 |
| r 0.2 | 0.20 | 0.20 | 0.20 | 0.20 | 0.21 | 0.21 | 0.20 | 0.20 | 0.20 |
| r 0.3 | 0.30 | 0.31 | 0.31 | 0.31 | 0.31 | 0.31 | 0.30 | 0.30 | 0.29 |
| r 0.4 | 0.41 | 0.41 | 0.42 | 0.42 | 0.42 | 0.42 | 0.42 | 0.42 | 0.42 |
| r 0.5 | 0.52 | 0.53 | 0.53 | 0.54 | 0.54 | 0.55 | 0.55 | 0.56 | 0.56 |
| r 0.6 | 0.63 | 0.65 | 0.66 | 0.67 | 0.68 | 0.69 | 0.70 | 0.70 | 0.72 |
| r 0.7 | 0.76 | 0.79 | 0.81 | 0.83 | 0.84 | 0.86 | 0.87 | 0.89 | 0.90 |
| r 0.8 | 0.89 | 0.93 | 0.97 | 1.01 | 1.04 | 1.07 | 1.10 | 1.13 | 1.17 |

Now, the first 3 rows are fairly close, deviating by at most .01, but the rest deviate progressively more. The discrepancies are these:

| cor/restriction | R 0.1 | R 0.2 | R 0.3 | R 0.4 | R 0.5 | R 0.6 | R 0.7 | R 0.8 | R 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| r 0.1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.01 |
| r 0.2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| r 0.3 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | -0.01 |
| r 0.4 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
| r 0.5 | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 | 0.05 | 0.05 | 0.06 | 0.06 |
| r 0.6 | 0.03 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 | 0.10 | 0.10 | 0.12 |
| r 0.7 | 0.06 | 0.09 | 0.11 | 0.13 | 0.14 | 0.16 | 0.17 | 0.19 | 0.20 |
| r 0.8 | 0.09 | 0.13 | 0.17 | 0.21 | 0.24 | 0.27 | 0.30 | 0.33 | 0.37 |

So, if one could figure out how to predict the values in these cells from the row and column values, one could make a simpler way to correct for restriction.

Or, we can just use the correct formula, and then we get:

| cor/restriction | R 0.1 | R 0.2 | R 0.3 | R 0.4 | R 0.5 | R 0.6 | R 0.7 | R 0.8 | R 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| r 0.1 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.09 |
| r 0.2 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.21 | 0.20 |
| r 0.3 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 |
| r 0.4 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.39 | 0.39 |
| r 0.5 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.49 |
| r 0.6 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 |
| r 0.7 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.71 |
| r 0.8 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 |

With discrepancies:

| cor/restriction | R 0.1 | R 0.2 | R 0.3 | R 0.4 | R 0.5 | R 0.6 | R 0.7 | R 0.8 | R 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| r 0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.01 |
| r 0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0 |
| r 0.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| r 0.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.01 | -0.01 |
| r 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.01 |
| r 0.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| r 0.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 |
| r 0.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Pretty good!

Also, I need to re-do my paper.

R code:

```
library(MASS) #for mvrnorm
library(Hmisc) #for rcorr
library(psych) #for describe

pop.cors = seq(.1, .8, .1) #population correlations to test
restrictions = seq(.1, .9, .1) #restriction of range in centiles
sample.size = 1000000 #sample size

#empty dataframes for results
results = data.frame(matrix(nrow = length(pop.cors), ncol = length(restrictions)))
colnames(results) = paste("R", restrictions)
rownames(results) = paste("r", pop.cors)
results.c = results #simple correction
results.c2 = results #Thorndike's case 2 correction

for (pop.cor in pop.cors){ #loop over population cors
  data = mvrnorm(sample.size, mu = c(0,0),
                 Sigma = matrix(c(1, pop.cor, pop.cor, 1), ncol = 2),
                 empirical = TRUE) #generate data
  rowname = paste("r", pop.cor) #current row name
  for (restriction in restrictions){ #loop over restrictions
    colname = paste("R", restriction) #current col name
    z.cutoff = qnorm(restriction) #find cut-off
    rdata = data[data[,1] > z.cutoff, ] #keep rows above the cut-off
    cor = rcorr(rdata)$r[1,2] #restricted correlation
    results[rowname, colname] = cor
    sd = describe(rdata)$sd[1] #restricted sd of the selection variable
    results.c[rowname, colname] = cor/sd #simple formula
    results.c2[rowname, colname] = cor/sqrt(cor^2 + sd^2 - sd^2*cor^2) #correct formula
  }
}

#how much are they off by?
discre = results.c
discre2 = results.c2
for (num in 1:length(pop.cors)){
  discre[num,] = discre[num,] - pop.cors[num]
  discre2[num,] = discre2[num,] - pop.cors[num]
}
```

## December 24, 2014

### Correlations and likert scales: What is the bias?

Filed under: Math/Statistics — Emil O. W. Kirkegaard @ 16:27

How can I correlate ordinal variables (attitude Likert scale) with continuous ratio data (years of experience)?
Currently, I am working on my dissertation, which explores learning organisation characteristics at HEIs. One of the predictor demographic variables is years of experience. Respondents were asked to fill in the number of years. Should I categorise the responses instead? For example:
1. from 1 to 4 years
2. from 4 to 10
and so on?
Or is there a better choice/analysis I could apply?

My answer may also be of interest to others, so I post it here as well.

Normal practice is to treat Likert scales as continuous variables even though they are not. As long as there are >=5 options, the bias from discreteness is not large.

I simulated the situation for you. I generated two variables with continuous random data from two normal distributions with a correlation of .50, N=1000. Then I created Likert scales with varying numbers of levels from the second variable. Then I correlated all these variables with each other.

Correlations of continuous variable 1 with:

continuous2 0.5
likert10 0.482
likert7 0.472
likert5 0.469
likert4 0.432
likert3 0.442
likert2 0.395

So you see, introducing discreteness biases correlations toward zero, but not by much as long as the Likert scale has at least 5 levels. You can correct for the bias by multiplying by the correction factor if desired:

Correction factor:

continuous2 1
likert10 1.037
likert7 1.059
likert5 1.066
likert4 1.157
likert3 1.131
likert2 1.266
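The correction factors are simply the true correlation divided by the attenuated one, so they can be verified directly from the numbers above:

```r
#correction factor = true r / attenuated r
r.true = 0.5
r.obs = c(likert10 = 0.482, likert7 = 0.472, likert5 = 0.469,
          likert4 = 0.432, likert3 = 0.442, likert2 = 0.395)
round(r.true / r.obs, 3)
#> likert10  likert7  likert5  likert4  likert3  likert2
#>    1.037    1.059    1.066    1.157    1.131    1.266
```

Multiplying an observed Likert-attenuated correlation by its factor recovers the continuous-scale estimate.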

Psychologically, if your data do not make sense as an interval scale, i.e. if the difference between options 1 and 2 is not the same as between options 3 and 4, then you should use Spearman's correlation instead of Pearson's. However, it will rarely make much of a difference.

Here’s the R code.

```
library(MASS)
#simulate dataset of 2 variables with correlation of .50, N=1000
simul.data = mvrnorm(1000, mu = c(0,0), Sigma = matrix(c(1,0.50,0.50,1), ncol = 2), empirical = TRUE)
simul.data = as.data.frame(simul.data)
colnames(simul.data) = c("continuous1","continuous2")
#divide into bins of equal length
simul.data["likert10"] = as.numeric(cut(unlist(simul.data[2]), breaks=10))
simul.data["likert7"] = as.numeric(cut(unlist(simul.data[2]), breaks=7))
simul.data["likert5"] = as.numeric(cut(unlist(simul.data[2]), breaks=5))
simul.data["likert4"] = as.numeric(cut(unlist(simul.data[2]), breaks=4))
simul.data["likert3"] = as.numeric(cut(unlist(simul.data[2]), breaks=3))
simul.data["likert2"] = as.numeric(cut(unlist(simul.data[2]), breaks=2))
#correlations
round(cor(simul.data),3)
```
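As a quick check of the Spearman point, the same kind of simulated data can be reused (the seed is mine); the two coefficients come out close:

```r
library(MASS)
set.seed(1)
d = mvrnorm(1000, mu = c(0, 0),
            Sigma = matrix(c(1, .5, .5, 1), ncol = 2), empirical = TRUE)
likert5 = as.numeric(cut(d[, 2], breaks = 5)) #5-level likert version
r.pearson = cor(d[, 1], likert5, method = "pearson")
r.spearman = cor(d[, 1], likert5, method = "spearman")
c(r.pearson, r.spearman) #both in the same neighborhood, differing only slightly
```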

## December 8, 2014

### Workgroup: Doing Bayesian Data Analysis: A Tutorial Introduction with R and BUGS

Filed under: Math/Statistics — Emil O. W. Kirkegaard @ 14:28

I started a workgroup for going through the book Doing Bayesian Data Analysis: A Tutorial Introduction with R and BUGS. See the thread on the forum.

## November 2, 2014

### W values from the Shapiro-Wilk test visualized with different datasets

Filed under: Uncategorized — Emil O. W. Kirkegaard @ 21:35

For a mathematical explanation of the test, see e.g. here. However, such an explanation is not very useful for using the test in practice. Just what does a W value of .95 mean? What about .90 or .99? One way to get a feel for it, is to simulate datasets, plot them and calculate the W values. Additionally, one can check the sensitivity of the test, i.e. the p value.

All the code is in R.

```
#random numbers from normal distribution
set.seed(42) #for reproducible numbers
x = rnorm(5000) #generate random numbers from normal dist
hist(x, breaks=50, main="Normal distribution, N=5000") #plot
shapiro.test(x) #SW test
#> W = 0.9997, p-value = 0.744
```

So, as expected, W was very close to 1, and p was large. In other words, SW did not reject a normal distribution just because N is large. But maybe it was a freak accident. What if we were to repeat this experiment 1000 times?

```
#repeat sampling + test 1000 times
Ws = numeric(); Ps = numeric() #empty vectors
for (n in 1:1000){ #number of simulations
  x = rnorm(5000) #generate random numbers from normal dist
  sw = shapiro.test(x)
  Ws = c(Ws, sw$statistic)
  Ps = c(Ps, sw$p.value)
}
hist(Ws, breaks=50) #plot W distribution
hist(Ps, breaks=50) #plot P distribution
sum(Ps < .05) #how many Ps below .05?
```

The number of Ps below .05 was in fact 43, or 4.3%. I ran the code with 100,000 simulations too, which takes about 10 minutes. The value was 4389, i.e. 4.4%. So it seems the method used to estimate the p value is slightly off, in that the false positive rate is lower than expected.
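Whether these rejection counts really differ from the nominal 5% rate can itself be tested, with an exact binomial test on the two counts reported above:

```r
#43 rejections in 1,000 runs: consistent with a true 5% rate
binom.test(43, 1000, p = 0.05)$p.value #well above .05
#4389 rejections in 100,000 runs: the shortfall is too large to be chance
binom.test(4389, 100000, p = 0.05)$p.value #far below .05
```

So the 1,000-run result alone is within sampling error; it is the 100,000-run result that shows the false positive rate really is a little below 5%.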

What about the W statistic? Is it sensitive to fairly small deviations from normality?

```
#random numbers from normal distribution, slight deviation
x = c(rnorm(4900), rnorm(100, 2))
hist(x, breaks=50, main="Normal distribution N=4900 + normal distribution N=100, mean=2")
shapiro.test(x)
#> W = 0.9965, p-value = 1.484e-09
```

Here I started with a very large normal distribution and added a small one with a different mean. The difference is hardly visible to the eye, but the p value is very small. The reason is that the large sample size makes it possible to detect even very small deviations from normality. W was again very close to 1, indicating that the distribution was close to normal.

What about a decidedly non-normal distribution?

```
#random numbers between -10 and 10
x = runif(5000, min=-10, max=10)
hist(x, breaks=50, main="Evenly distributed numbers [-10;10], N=5000")
shapiro.test(x)
#> W = 0.9541, p-value < 2.2e-16
```

SW wisely rejects normality here with great certainty. However, W is still near 1 (.95). This tells us that the W value does not vary very much even when the distribution is decidedly non-normal. For interpretation, then, we should probably bark when W drops just under .99 or so.

As a further test of the W values, here are two equal-sized distributions plotted together.

```
#normal distributions, 2 sd apart (unimodal fat normal distribution)
x = c(rnorm(2500, -1, 1), rnorm(2500, 1, 1))
hist(x, breaks=50, main="Normal distributions, 2 sd apart")
shapiro.test(x)
#> W = 0.9957, p-value = 6.816e-11
sd(x)
#> 1.436026
```

It still looks fairly normal, although too fat. The standard deviation is in fact 1.44, or 44% larger than it is supposed to be. The W value is still fairly close to 1, however, and only a little less than for the distribution that was only slightly non-normal (Ws = .9957 and .9965). What about clearly bimodal distributions?

```
#bimodal normal distributions, 4 sd apart
x = c(rnorm(2500, -2, 1), rnorm(2500, 2, 1))
hist(x, breaks=50, main="Normal distributions, 4 sd apart")
shapiro.test(x)
#> W = 0.9464, p-value < 2.2e-16
```

This clearly looks non-normal. SW rightly rejects it, and W is about .95 (W = 0.9464). This is a bit lower than for the evenly distributed numbers (W = 0.9541).

What about an extreme case of nonnormality?

```
#bimodal normal distributions, 20 sd apart
x = c(rnorm(2500, -10, 1), rnorm(2500, 10, 1))
hist(x, breaks=50, main="Normal distributions, 20 sd apart")
shapiro.test(x)
#> W = 0.7248, p-value < 2.2e-16
```

Finally we make a big reduction in the W value.

What about some more moderate deviations from normality?

```
#random numbers from normal distribution, moderate deviation
x = c(rnorm(4500), rnorm(500, 2))
hist(x, breaks=50, main="Normal distribution N=4500 + normal distribution N=500, mean=2")
shapiro.test(x)
#> W = 0.9934, p-value = 1.646e-14
```

This one has a longer tail on the right side, but it still looks fairly normal. W=.9934.

```
#random numbers from normal distribution, large deviation
x = c(rnorm(4000), rnorm(1000, 2))
hist(x, breaks=50, main="Normal distribution N=4000 + normal distribution N=1000, mean=2")
shapiro.test(x)
#> W = 0.991, p-value < 2.2e-16
```

This one has a very long right tail. W=.991.

In conclusion

Generally, we see that given a large sample, SW is sensitive to departures from normality. If the departure is very small, however, it is not very important.

We also see that it is hard to reduce the W value even when one deliberately tries. One needs to test an extremely non-normal distribution for W to fall appreciably below .99.

## August 31, 2014

### Comments on Learning Statistics with R

Filed under: Differential psychology/psychometrics, Psychology — Emil O. W. Kirkegaard @ 23:15

So I found a textbook for learning both elementary statistics (much of which I knew but hadn't read a textbook about) and R.

Numbers refer to page numbers in the book. The book is in an early version ("0.4"), so many of these are small errors I stumbled upon while going through virtually all the commands in the book in my own R session.

120:

The modeOf() and maxFreq() functions do not work here. This is because afl.finalists is a factor and they demand a vector. One can use as.vector() to make them work.

131:

Worth noting that summary() is the same as quantile() except that it also includes the mean.

151:

Actually, the output of describe() does not tell us the number of NAs. It is only because the author assumes there are 100 total cases that he can compute 100−n and get the number of NAs for each variable.

220:

240:

as.logical() also converts numeric 0 and 1 to FALSE and TRUE. However, oddly, it does not understand "0" and "1".
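This is easy to verify at the console:

```r
as.logical(1) #TRUE
as.logical(0) #FALSE
as.logical("TRUE") #TRUE - strings spelling out a logical are recognized
as.logical("1") #NA - but the character "1" is not
```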

271:

Actually, P(0) is not equivalent to impossible. See: http://en.wikipedia.org/wiki/Almost_surely

278:

Actually, 100 simulations with N=20 will generally not result in a histogram like the one above. Perhaps it is better to change the command to K=1000. And why not add hist() so the result can be visually compared to the theoretical one?

```
hist(rbinom(n = 1000, size = 20, prob = 1/6))
```

298:

It would be nice if the code for making these simulations was shown.

299:

“This is just bizarre: σ ˆ 2 is and unbiased estimate of the population variance”

Typo.

327:

Typo in Figure 11.6 text. “Notice that when θ actually is equal to .05 (plotted as a black dot)”

344:

Typo.

“That is, what values of X2 would lead is to reject the null hypothesis.”

379:

It is most annoying that the author doesn't provide the code for reproducing his plots. I spent 15 minutes trying to find a function to create histograms by group.

385:

Typo.

“It works for t-tests, but it wouldn’t be meaningful for chi-square testsm F -tests or indeed for most of the tests I talk about in this book.”

391:

“we see that it is 95% certain that the true (population-wide) average improvement would lie between 0.95% and 1.86%.”

This wording is dangerous because there are two interpretations of the percent sign. In the relative sense, the numbers are wrong; the author means absolute percentage points.

400:

The code has +'s in it, which means it cannot just be copied and run. This usually isn't the case, but it happens a few times in the book.

408+410:

In the description of the test, we are told to tick when the values are larger than the reference value. However, in the one-sample version, the author ticks when the value is equal to it. I guess this means we tick when it is equal to or larger than.

442:

This command doesn’t work because the dataframe isn’t attached as the author assumes.

> mood.gain <- list( placebo, joyzepam, anxifree)

457:

First the author says he wants to use the non-adjusted R^2, but then in the text he uses the adjusted value.

464:

Typo with “Unless” capitalized.

493:

“(3.45 for drug and 0.92 for therapy),”

He must mean .47 for therapy. .92 is the number for residuals.

497:

In the alternative hypothesis, the author uses "u_ij" instead of "u_rc", which is used in the null hypothesis. I'm guessing the null hypothesis is right.

514:

As earlier, it is ambiguous when the author talks about increases in percent: it could be relative or absolute. Again, in this case it is absolute. The author should use "%-point" or something similar to avoid confusion.

538:

Quoting

“I find it amusing to note that the default in R is Type I and the default in SPSS is Type III (with Helmert contrasts). Neither of these appeals to me all that much. Relatedly, I find it depressing that almost nobody in the psychological literature ever bothers to report which Type of tests they ran, much less the order of variables (for Type I) or the contrasts used (for Type III). Often they don’t report what software they used either. The only way I can ever make any sense of what people typically report is to try to guess from auxiliary cues which software they were using, and to assume that they never changed the default settings. Please don’t do this… now that you know about these issues, make sure you indicate what software you used, and if you’re reporting ANOVA results for unbalanced data, then specify what Type of tests you ran, specify order information if you’ve done Type I tests and specify contrasts if you’ve done Type III tests. Or, even better, do hypotheses tests that correspond to things you really care about, and then report those!”

An example of the necessity of open methods along with open data. Science must be reproducible. The best is to simply share the exact source code for the analyses in a paper.

## January 18, 2012

### Statistics for 2011

Filed under: Uncategorized — Emil O. W. Kirkegaard @ 01:06

## August 19, 2010

### Analysis of top 200 (199) data from the Starcraft 2 European ladder

Filed under: Uncategorized — Emil O. W. Kirkegaard @ 17:26

Two days ago, Blizzard posted some statistical data about the top 200 (really 199) players in Europe. I did a bit of statistical analysis on that data and made some nice illustrations, as seen below. The data are self-explanatory, but the explanation for the pattern is unknown to me. It need not be the case that Terran is overpowered; it may be that players simply like playing it better than the other races.

## June 25, 2008

### Statistics – Dialog

Filed under: Multilogues, Philosophy — Emil O. W. Kirkegaard @ 04:00

Malintent:

78% of all statistics are completely made up.

Wyz_sub10:

…but only 31% of all people know that.

Smullyan-esque:

And 10 out of 9 people just can’t understand statistics at all.

Deleet:

Priceless!