Normal practice is to treat Likert scales as continuous variables even though they are not. As long as there are at least 5 response options, the bias from discreteness is not large.
I simulated the situation for you. I generated two continuous variables from a bivariate normal distribution with a correlation of .50, N = 1000. Then I created Likert scales with varying numbers of levels from the second variable and correlated all the variables with each other.
Correlations of continuous variable 1 with:
continuous2 0.5
likert10 0.482
likert7 0.472
likert5 0.469
likert4 0.432
likert3 0.442
likert2 0.395
So you see, introducing discreteness biases correlations towards zero, but not by much as long as the Likert scale has at least 5 levels. You can correct for the bias by multiplying by a correction factor if desired (a short sketch of how to compute it follows the list):
Correction factor:
continuous2 1
likert10 1.037
likert7 1.059
likert5 1.066
likert4 1.157
likert3 1.131
likert2 1.266
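As a minimal sketch (assuming the simul.data data frame from the R code further down has already been created), the correction factors above are simply the continuous correlation divided by each discretized correlation:
#correction factors: continuous correlation divided by each discretized correlation
#assumes simul.data from the code below already exists
cors = cor(simul.data)["continuous1", -1]
round(cors["continuous2"] / cors, 3)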
If your data do not make sense as an interval scale psychologically, i.e. if the difference between options 1 and 2 is not the same as between options 3 and 4, then you should use Spearman’s correlation instead of Pearson’s. However, it will rarely make much of a difference (a sketch comparing the two follows the code below).
Here’s the R code.
#load library
library(MASS)
#simulate dataset of 2 variables with correlation of .50, N=1000
simul.data = mvrnorm(1000, mu = c(0,0), Sigma = matrix(c(1,0.50,0.50,1), ncol = 2), empirical = TRUE)
simul.data = as.data.frame(simul.data); colnames(simul.data) = c("continuous1", "continuous2")
#divide into bins of equal length
simul.data["likert10"] = as.numeric(cut(unlist(simul.data[2]), breaks=10))
simul.data["likert7"] = as.numeric(cut(unlist(simul.data[2]), breaks=7))
simul.data["likert5"] = as.numeric(cut(unlist(simul.data[2]), breaks=5))
simul.data["likert4"] = as.numeric(cut(unlist(simul.data[2]), breaks=4))
simul.data["likert3"] = as.numeric(cut(unlist(simul.data[2]), breaks=3))
simul.data["likert2"] = as.numeric(cut(unlist(simul.data[2]), breaks=2))
#correlations
round(cor(simul.data),3)
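To check the Spearman point above, a minimal sketch reusing the same simul.data:
#Spearman correlations on the same data, for comparison with the Pearson matrix above
round(cor(simul.data, method = "spearman"), 3)
#differences from the Pearson correlations are typically small
round(cor(simul.data, method = "spearman") - cor(simul.data), 3)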