Actually I'm busy doing an exam paper for linguistics class, but it turned out not to be so difficult, so I spent some time on Khan Academy doing probability and statistics courses. I want to master that stuff, especially the parts I don't currently know the details of, like regression.

Anyway, I stumbled into a comment asking about the way the standard deviation is calculated. Why not just use the absolute value instead of squaring stuff and taking the square root after? I actually tried that once, and it gives different results! I tried it out because the teacher's notes said that it would give the same results. Pretty neat discovery IMO.
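To see the difference concretely, here's a quick Python sketch (my own toy data, not from the course) comparing the population standard deviation with the mean absolute deviation of the same small sample:

```python
# Compare standard deviation with mean absolute deviation (MAD)
# on a toy dataset; any non-degenerate sample shows the gap.
data = [1, 2, 3, 4]
n = len(data)
mean = sum(data) / n  # 2.5

# population standard deviation: square root of the mean squared deviation
sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

# mean absolute deviation: mean of |x - mean|
mad = sum(abs(x - mean) for x in data) / n

print(sd)   # ≈ 1.118 (sqrt of 1.25)
print(mad)  # 1.0
```

The two only coincide in degenerate cases; in general the SD is at least as large as the MAD, since squaring weights large deviations more heavily.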

anyway, the other one has a name as well: en.wikipedia.org/wiki/Absolute_deviation

Here's a paper that argues that we should really return to the MD (mean deviation). I didn't understand all the math, but it sure is easier to calculate, and its meaning is easier to grasp, although it's probably too difficult to switch now that most of statistics is based on the SD. Still cool, though.

—

Revisiting a 90-year-old debate: the advantages of the mean deviation

ABSTRACT: This paper discusses the reliance of numerical analysis on the concept of the standard deviation, and its close relative the variance. It suggests that the original reasons why the standard deviation concept has permeated traditional statistics are no longer clearly valid, if they ever were. The absolute mean deviation, it is argued here, has many advantages over the standard deviation. It is more efficient as an estimate of a population parameter in the real-life situation where the data contain tiny errors, or do not form a completely perfect normal distribution. It is easier to use, and more tolerant of extreme values, in the majority of real-life situations where population parameters are not required. It is easier for new researchers to learn about and understand, and also closely linked to a number of arithmetic techniques already used in the sociology of education and elsewhere. We could continue to use the standard deviation instead, as we do presently, because so much of the rest of traditional statistics is based upon it (effect sizes, and the F-test, for example). However, we should weigh the convenience of this solution for some against the possibility of creating a much simpler and more widespread form of numeric analysis for many.

Keywords: variance, measuring variation, political arithmetic, mean deviation, standard deviation, social construction of statistics

—

It also has an odd new use of "social construction", which annoyed me while reading it.

Normal practice is to treat Likert scales as continuous variables even though they are not. As long as there are >=5 options, the bias from discreteness is not large.

I simulated the situation for you. I generated two continuous variables from a bivariate normal distribution with a correlation of .50, N=1000. Then I created Likert scales with varying numbers of levels from the second variable. Then I correlated all these variables with each other.

Correlations of continuous variable 1 with:

continuous2 0.5

likert10 0.482

likert7 0.472

likert5 0.469

likert4 0.432

likert3 0.442

likert2 0.395

So you see, introducing discreteness biases correlations towards zero, but not by much as long as the Likert scale has >=5 levels. You can correct for the bias by multiplying by the correction factor if desired:

Correction factor:

continuous2 1

likert10 1.037

likert7 1.059

likert5 1.066

likert4 1.157

likert3 1.131

likert2 1.266
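The correction factor is just the true correlation divided by the attenuated one. A quick Python check (using the correlations from my simulation output above) reproduces the numbers:

```python
# Correction factor = true r / observed (attenuated) r,
# with the observed correlations taken from the simulation above.
true_r = 0.50
observed = {"likert10": 0.482, "likert7": 0.472, "likert5": 0.469,
            "likert4": 0.432, "likert3": 0.442, "likert2": 0.395}
factors = {k: round(true_r / r, 3) for k, r in observed.items()}
print(factors)
# {'likert10': 1.037, 'likert7': 1.059, 'likert5': 1.066,
#  'likert4': 1.157, 'likert3': 1.131, 'likert2': 1.266}
```

Multiplying an observed Likert-attenuated correlation by its factor recovers the continuous-scale correlation, at least in this idealized simulation.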

Psychologically, if your data do not make sense as an interval scale (i.e. if the difference between options 1 and 2 is not the same as that between options 3 and 4), then you should use Spearman's correlation instead of Pearson's. However, it will rarely make much of a difference.

Here’s the R code.

#load library
library(MASS)

#simulate dataset of 2 variables with correlation of .50, N=1000
simul.data = mvrnorm(1000, mu = c(0,0), Sigma = matrix(c(1,0.50,0.50,1), ncol = 2), empirical = TRUE)
simul.data = as.data.frame(simul.data)
colnames(simul.data) = c("continuous1","continuous2")

#divide into bins of equal length
simul.data["likert10"] = as.numeric(cut(unlist(simul.data[2]), breaks=10))
simul.data["likert7"]  = as.numeric(cut(unlist(simul.data[2]), breaks=7))
simul.data["likert5"]  = as.numeric(cut(unlist(simul.data[2]), breaks=5))
simul.data["likert4"]  = as.numeric(cut(unlist(simul.data[2]), breaks=4))
simul.data["likert3"]  = as.numeric(cut(unlist(simul.data[2]), breaks=3))
simul.data["likert2"]  = as.numeric(cut(unlist(simul.data[2]), breaks=2))

#correlations
round(cor(simul.data), 3)