Two very annoying statistical fallacies with p-values

Some time ago, I wrote on Reddit:

There are two errors that I see quite frequently:

  1. Concluding from a statistically significant difference that a socially, scientifically or otherwise important difference was found. This doesn’t work because any minute difference will be statistically significant if N is large enough. Some datasets have N = 1e6, so very small differences between groups can be detected reliably. That does not mean they are worth any attention. The general problem is the lack of focus on effect sizes.
  2. Concluding from a non-significant difference that there is no difference in that trait. This ignores the possibility of a false negative: there is a difference, but the sample is too small to detect it reliably, or sampling fluctuation made it smaller than usual in the present sample. Combined with the misuse of p values, one often sees statements like “men and women differed in trait1 (p < 0.04) but did not differ in trait2 (p > 0.05)”, as if the p value difference of .01 had some magical significance.
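Both fallacies can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not taken from any study: it uses a large-sample z-test for the difference between two equal-sized groups, with the effect expressed as a standardized difference (Cohen’s d), and the helper names two_sample_p and two_sample_power are made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def two_sample_p(d, n_per_group):
    """Two-sided p value of a large-sample z-test for a standardized mean
    difference d between two groups of equal size (assumed setup)."""
    z = d / (2 / n_per_group) ** 0.5
    return 2 * STD_NORMAL.cdf(-abs(z))


def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate probability that the same test is significant at level
    alpha when the true standardized difference is d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d / (2 / n_per_group) ** 0.5
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Fallacy 1: a trivial difference (d = 0.01) with small and with huge groups.
print(two_sample_p(0.01, 100))
print(two_sample_p(0.01, 1_000_000))

# Fallacy 2: a real, non-trivial difference (d = 0.37) with 20 per group.
print(two_sample_power(0.37, 20))
```

With d = 0.01 the p value is about 0.94 with 100 people per group but far below .001 with a million per group, so “significant” says nothing about importance. With d = 0.37 and 20 per group the test comes out significant only about one time in five, so “not significant” is the expected result even though the difference is real.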

These seem rather obvious (to me), so I don’t know why I keep reading papers (e.g. Wassell et al., 2015) that go like this:

2.1. Experiment 1

In experiment 1 participants filled in the VVIQ2 and reported their current menstrual phase by counting backward the appropriate number of days from the next onset of menstruation. We grouped female participants according to these reports. Fig. 2A shows the mean VVIQ2 score for males and females in the follicular and mid luteal phases (males: M = 56.60, SD = 10.39, follicular women: M = 60.11, SD = 8.84, mid luteal women: M = 69.38, SD = 8.52). VVIQ2 scores varied between menstrual groups, as confirmed by a significant one-way ANOVA, F(2, 52) = 8.63, p < .001, η2 = .25. Tukey post hoc comparisons revealed that mid luteal females reported more vivid imagery than males, p < .001, d = 1.34, and follicular females, p < .05, d = 1.07, while males and follicular females did not differ, p = .48, d = 0.37. These data suggest a possible link between sex hormone concentration and the vividness of mental imagery.

Read in the normal way, the passage commits the second fallacy. It is even self-contradictory: d = .37 is a small-to-medium effect, yet in the same sentence they state that there is no difference (i.e. d = 0).
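One way to see why “did not differ” overstates the result is to put a confidence interval around d = 0.37. The group sizes are not given in the excerpt, so the sketch below assumes, hypothetically, an even split of the roughly 55 participants implied by F(2, 52), i.e. 18 per group, and uses the common large-sample approximation to the standard error of d. The function name cohens_d_ci is made up.

```python
from statistics import NormalDist


def cohens_d_ci(d, n1, n2, level=0.95):
    """Approximate confidence interval for Cohen's d, using the large-sample
    variance (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))."""
    se = ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


# Assumption: about 18 participants per group (group sizes are not reported here).
low, high = cohens_d_ci(0.37, 18, 18)
print(f"d = 0.37, 95% CI [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval runs from about -0.29 to about 1.03. It includes zero, but it also includes effects larger than d = 1, so the data are far from showing that males and follicular females do not differ.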

However, later on they write:

VVIQ2 scores were found to significantly correlate with imagery strength from the binocular rivalry task, r = .37, p < .01. As is evident in Fig. 3A, imagery strength measured by the binocular rivalry task varied significantly between menstrual groups, F(2, 55) = 8.58, p < .001, η2 = .24, with mid luteal females showing stronger imagery than both males, p < .05, d = 1.03, and late follicular females, p < .001, d = 1.26. These latter two groups’ scores did not differ significantly, p = .51, d = 0.34. Together, these findings support the questionnaire data, and the proposal that imagery differences are influenced by menstrual phase and sex hormone concentration.

Now the authors are back to phrasing it in a way that cannot be read as the fallacy. Sometimes it gets sillier. One paper, Kleisner et al. (2014), which received quite a lot of media attention, is based on exactly this kind of subgroup analysis: the effect had p < .05 for one gender but not the other. The typical source of this silliness is the small sample size of most studies combined with exploratory subgroup analysis (which the authors present as if it were hypothesis-driven). Gender, age and race are the typical groups explored, alone and in combination.
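The relevant test in such subgroup analyses is whether the two correlations differ from each other, not whether each one separately passes p < .05. A minimal sketch using Fisher’s z transformation for independent samples; the correlations and sample sizes are made up for illustration and are NOT the numbers from Kleisner et al. (2014), and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_p(r, n):
    """Approximate two-sided p value for H0: rho = 0, via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * NormalDist().cdf(-abs(z))


def diff_corr_p(r1, n1, r2, n2):
    """Approximate two-sided p value for H0: rho1 = rho2 (independent groups)."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical subgroups of 40 people each.
p_men = fisher_z_p(0.35, 40)
p_women = fisher_z_p(0.15, 40)
p_diff = diff_corr_p(0.35, 40, 0.15, 40)  # the comparison that actually matters
print(round(p_men, 3), round(p_women, 3), round(p_diff, 3))
```

Here the correlation is significant in one group (p ≈ .03) and not in the other (p ≈ .36), yet the difference between the two correlations is nowhere near significant (p ≈ .36).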

It would probably be best if scientists stopped using “significant” to describe lowish p values; the public is very likely to misunderstand it. (There was a good study on this recently, but I can’t find it again. Help!)


Kleisner, K., Chvátalová, V., & Flegr, J. (2014). Perceived intelligence is associated with measured intelligence in men but not women. PLoS ONE, 9(3), e81237.

Wassell, J., Rogers, S. L., Felmingham, K. L., Bryant, R. A., & Pearson, J. (2015). Sex hormones predict the sensory strength and vividness of mental imagery. Biological Psychology.

Understanding restriction of range with Shiny!

I made this:


# global.R (objects shared by ui.R and server.R)
library(shiny)
library(lattice) #for xyplot()
data = read.csv("data.csv",row.names = 1) #load data with columns X and Y
title = "Understanding restriction of range"

# ui.R
shinyUI(fluidPage(
  titlePanel(title, windowTitle = title),
  sidebarLayout(
    sidebarPanel(
      helpText("Get an intuitive understanding of restriction of range using this interactive plot. The slider below limits the dataset to the cases whose X value lies within the limits."),
      sliderInput("limits",
        label = "Restriction of range",
        min = -5, max = 5, value = c(-5, 5), step=.1),
      helpText("Note that these are Z-values. A Z-value of +2 or -2 corresponds roughly to the 98th or 2nd centile, respectively.")
    ),
    mainPanel(
      plotOutput("plot"),
      textOutput("text")
    )
  )
))

# server.R
shinyServer(
  function(input, output) {
    #restricted copy of the data, recomputed whenever the slider moves
    restricted = reactive({
      lower.limit = input$limits[1] #lower limit
      upper.limit = input$limits[2] #upper limit
      d = data
      d$X.restricted = d$X #copy X
      d[d$X < lower.limit | d$X > upper.limit, "X.restricted"] = NA #remove values outside the limits
      d$group = factor(ifelse(is.na(d$X.restricted), "Excluded", "Included"),
                       levels = c("Included","Excluded")) #group var for the plot
      d
    })
    output$plot <- renderPlot({
      print(xyplot(Y ~ X, restricted(), type=c("p","r"), col.line = "darkorange", lwd = 1,
                   groups = group, auto.key = TRUE))
    })
    output$text <- renderText({
      d = restricted()
      r.full = round(cor(d$X, d$Y), 2)
      r.restricted = round(cor(d$X.restricted, d$Y, use="complete.obs"), 2)
      paste0("The correlation in the full dataset is ", r.full,
             ", the correlation in the restricted dataset is ", r.restricted)
    })
  }
)

Simpler way to correct for restriction of range?

Restriction of range occurs when the variance of a variable in a sample is smaller than its true population variance. This lowers the correlation between that variable and other variables. It is a common problem in research on students, who are selected for general intelligence (GI) and hence have lower variance in it. This means that correlations between GI and other variables found in student samples are too low.

There are some complicated ways to correct for restriction of range. The usual formula is this:

$$\hat{r}_{XY} = \frac{r_{xy}\,(S_X / s_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2\,(S_X^2 / s_x^2)}}$$

This is also known as Thorndike’s case 2, or Pearson’s 1903 formula. Capital X and Y are the unrestricted variables, lowercase x and y the restricted ones, and S and s are the corresponding standard deviations. The hat on r means estimated.

However, in a paper currently in review I used a much simpler formula, namely corrected r = uncorrected r / (SD_restricted / SD_unrestricted), which seemed to give about the right results. But I wasn’t sure this was legitimate, so I ran some simulations.
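The two corrections are easy to compare on a single case. The sketch below writes Thorndike’s case 2 in terms of u = SD_restricted / SD_unrestricted (the same form used in the R code at the end of this post) and applies both formulas to a restricted correlation of .33 with the top half of a standard normal X kept; .33 is the value the first table below reports for a true r of .50 at R 0.5. The function names are hypothetical.

```python
import math


def correct_simple(r_restricted, u):
    """The simple correction: r / (SD_restricted / SD_unrestricted)."""
    return r_restricted / u


def correct_thorndike(r_restricted, u):
    """Thorndike's case 2, written with u = SD_restricted / SD_unrestricted."""
    return r_restricted / math.sqrt(r_restricted**2 + u**2 - u**2 * r_restricted**2)


# Keeping the top half of a standard normal X (cut at z = 0) leaves a
# half-normal, whose SD is sqrt(1 - 2/pi), about 0.60.
u = math.sqrt(1 - 2 / math.pi)
r_obs = 0.33
print(correct_simple(r_obs, u), correct_thorndike(r_obs, u))
```

This gives roughly .55 for the simple formula and roughly .50 for Thorndike’s, so the simple formula overshoots the true .50.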

First, I chose a range of true population correlations (.1 to .8) and a range of selectivity (.1 to .9), and generated a very large dataset for each population correlation. Then, for each restriction level, I removed the data points where the first variable was below the cutoff point and calculated the correlation in the restricted dataset. Then I calculated the corrected correlation and saved both values.

This gives the following correlations in the restricted samples (N = 1,000,000 before restriction):

cor/restriction R 0.1 R 0.2 R 0.3 R 0.4 R 0.5 R 0.6 R 0.7 R 0.8 R 0.9
r 0.1 0.09 0.08 0.07 0.07 0.06 0.06 0.05 0.05 0.04
r 0.2 0.17 0.15 0.14 0.13 0.12 0.11 0.10 0.09 0.08
r 0.3 0.26 0.23 0.22 0.20 0.19 0.17 0.16 0.14 0.12
r 0.4 0.35 0.32 0.29 0.27 0.26 0.24 0.22 0.20 0.17
r 0.5 0.44 0.40 0.37 0.35 0.33 0.31 0.28 0.26 0.23
r 0.6 0.53 0.50 0.47 0.44 0.41 0.38 0.36 0.33 0.29
r 0.7 0.64 0.60 0.57 0.54 0.51 0.48 0.45 0.42 0.37
r 0.8 0.75 0.71 0.68 0.65 0.63 0.60 0.56 0.53 0.48


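The true population correlation is in the left margin, and the amount of restriction in the columns (e.g. R 0.9 means that only cases above the 90th centile of the first variable were kept). The more severe the restriction, the lower the observed correlation.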
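For readers without R, one cell of this table can be reproduced with a short standard-library Python sketch. Assumptions: 100,000 simulated pairs instead of 1,000,000, random.gauss instead of MASS::mvrnorm (so the population correlation is only approximately the target), and the SD ratio computed from the simulated data. The helper names are made up.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sd(values):
    """Population standard deviation (divide by n)."""
    m = fmean(values)
    return math.sqrt(fmean([(v - m) ** 2 for v in values]))


def simulate(pop_r, restriction, n=100_000, seed=1):
    """Simulate one table cell: draw n (X, Y) pairs with correlation pop_r,
    keep only cases with X above the `restriction` centile, and return
    (restricted r, simple correction, Thorndike case 2 correction)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        xs.append(x)
        ys.append(pop_r * x + math.sqrt(1 - pop_r**2) * rng.gauss(0, 1))
    cutoff = NormalDist().inv_cdf(restriction)  # like qnorm() in R
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    rx = [x for x, _ in kept]
    ry = [y for _, y in kept]
    r = pearson(rx, ry)
    u = sd(rx) / sd(xs)  # SD_restricted / SD_unrestricted
    simple = r / u
    thorndike = r / math.sqrt(r**2 + u**2 - u**2 * r**2)
    return r, simple, thorndike


print(simulate(0.5, 0.5))
```

simulate(0.5, 0.5) should give a restricted r near .33, a simple correction of about .54 and a Thorndike correction near .50, in line with the corresponding cells in this post up to simulation noise.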

Now, here are the corrected correlations using my method:

cor/restriction R 0.1 R 0.2 R 0.3 R 0.4 R 0.5 R 0.6 R 0.7 R 0.8 R 0.9
r 0.1 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.09
r 0.2 0.20 0.20 0.20 0.20 0.21 0.21 0.20 0.20 0.20
r 0.3 0.30 0.31 0.31 0.31 0.31 0.31 0.30 0.30 0.29
r 0.4 0.41 0.41 0.42 0.42 0.42 0.42 0.42 0.42 0.42
r 0.5 0.52 0.53 0.53 0.54 0.54 0.55 0.55 0.56 0.56
r 0.6 0.63 0.65 0.66 0.67 0.68 0.69 0.70 0.70 0.72
r 0.7 0.76 0.79 0.81 0.83 0.84 0.86 0.87 0.89 0.90
r 0.8 0.89 0.93 0.97 1.01 1.04 1.07 1.10 1.13 1.17


The first three rows are fairly close, deviating by at most .01, but the rest deviate progressively more. The discrepancies are:

cor/restriction R 0.1 R 0.2 R 0.3 R 0.4 R 0.5 R 0.6 R 0.7 R 0.8 R 0.9
r 0.1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.01
r 0.2 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.00
r 0.3 0.00 0.01 0.01 0.01 0.01 0.01 0.00 0.00 -0.01
r 0.4 0.01 0.01 0.02 0.02 0.02 0.02 0.02 0.02 0.02
r 0.5 0.02 0.03 0.03 0.04 0.04 0.05 0.05 0.06 0.06
r 0.6 0.03 0.05 0.06 0.07 0.08 0.09 0.10 0.10 0.12
r 0.7 0.06 0.09 0.11 0.13 0.14 0.16 0.17 0.19 0.20
r 0.8 0.09 0.13 0.17 0.21 0.24 0.27 0.30 0.33 0.37


So if we could figure out how to predict the values in these cells from the row and column values, we could make a simpler correction for restriction of range.
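For what it’s worth, the size of the discrepancy can be written down exactly. Writing u = s_x / S_X and dividing Thorndike’s case 2 through by S_X / s_x gives \hat{r}_{XY} = r_{xy} / \sqrt{u^2 + r_{xy}^2 (1 - u^2)}, while the simple formula is \tilde{r}_{XY} = r_{xy} / u (the tilde is notation introduced here). Their ratio is:

$$\frac{\tilde{r}_{XY}}{\hat{r}_{XY}} = \frac{\sqrt{u^2 + r_{xy}^2\,(1 - u^2)}}{u} = \sqrt{1 + r_{xy}^2\,\frac{1 - u^2}{u^2}}$$

Since u < 1 under restriction, the simple formula always overestimates, and the overestimate grows with the correlation and with the severity of restriction (smaller u), which matches the pattern in the discrepancy table above.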

Or we can just use the correct formula, which gives:

cor/restriction R 0.1 R 0.2 R 0.3 R 0.4 R 0.5 R 0.6 R 0.7 R 0.8 R 0.9
r 0.1 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.09
r 0.2 0.20 0.20 0.20 0.20 0.20 0.20 0.20 0.21 0.20
r 0.3 0.30 0.30 0.30 0.30 0.30 0.30 0.30 0.30 0.30
r 0.4 0.40 0.40 0.40 0.40 0.40 0.40 0.40 0.39 0.39
r 0.5 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.49
r 0.6 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60
r 0.7 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.71
r 0.8 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80


With discrepancies:

cor/restriction R 0.1 R 0.2 R 0.3 R 0.4 R 0.5 R 0.6 R 0.7 R 0.8 R 0.9
r 0.1 0 0 0 0 0 0 0 0 -0.01
r 0.2 0 0 0 0 0 0 0 0.01 0
r 0.3 0 0 0 0 0 0 0 0 0
r 0.4 0 0 0 0 0 0 0 -0.01 -0.01
r 0.5 0 0 0 0 0 0 0 0 -0.01
r 0.6 0 0 0 0 0 0 0 0 0
r 0.7 0 0 0 0 0 0 0 0 0.01
r 0.8 0 0 0 0 0 0 0 0 0


Pretty good!

Also, I need to re-do my paper.

R code:


library(MASS) #for mvrnorm()

pop.cors = seq(.1,.8,.1) #population correlations to test
restrictions = seq(.1,.9,.1) #restriction of range as centile cut-offs
sample = 1000000 #sample size

#empty dataframes for results
results = data.frame(matrix(nrow=length(pop.cors),ncol=length(restrictions)))
colnames(results) = paste("R",restrictions)
rownames(results) = paste("r",pop.cors)
results.c = results #simple correction
results.c2 = results #Thorndike case 2 correction

#and fetch!
for (pop.cor in pop.cors){ #loop over population cors
  data = mvrnorm(sample, mu = c(0,0), Sigma = matrix(c(1,pop.cor,pop.cor,1), ncol = 2),
                 empirical = TRUE) #generate data; unrestricted SDs are exactly 1
  rowname = paste("r",pop.cor) #get current row name
  for (restriction in restrictions){ #loop over restrictions
    colname = paste("R",restriction) #get current col name
    z.cutoff = qnorm(restriction) #find cut-off
    keep = data[,1] > z.cutoff #which rows to keep
    rdata = data[keep,] #cut away data
    r.obs = cor(rdata)[1,2] #restricted cor
    results[rowname,colname] = r.obs #add cor to results
    u = sd(rdata[,1]) #restricted sd
    results.c[rowname,colname] = r.obs/u #corrected cor, simple formula
    results.c2[rowname,colname] = r.obs/sqrt(r.obs^2+u^2-u^2*r.obs^2) #correct formula
  }
}

#how much are they off by?
discre = results.c
for (num in 1:length(pop.cors)){
  discre[num,] = discre[num,]-pop.cors[num]
}

discre2 = results.c2
for (num in 1:length(pop.cors)){
  discre2[num,] = discre2[num,]-pop.cors[num]
}
round(discre,2)
round(discre2,2)

Correlations and Likert scales: What is the bias?

A person on ResearchGate asked the following question:

How can I correlate ordinal variables (attitude Likert scale) with continuous ratio data (years of experience)?
Currently, I am working on my dissertation which explores learning organisation characteristics at HEIs. One of the predictor demographic variables is the indication of the years of experience. Respondents were asked to fill in the gap the number of years. Should I categorise the responses instead? as for example:
1. from 1 to 4 years
2. from 4 to 10
and so on?
or is there a better choice/analysis I could apply?

My answer may also be of interest to others, so I post it here as well.

Normal practice is to treat Likert scales as continuous variables even though they are not. As long as there are at least 5 options, the bias from discreteness is not large.

I simulated the situation for you. I generated two continuous variables from a bivariate normal distribution with a correlation of .50, N = 1000. Then I created Likert scales with different numbers of levels from the second variable by cutting it into bins of equal width. Then I correlated all these variables with each other.

Correlations of continuous variable 1 with:

continuous2 0.5
likert10 0.482
likert7 0.472
likert5 0.469
likert4 0.432
likert3 0.442
likert2 0.395

So introducing discreteness biases correlations towards zero, but not by much as long as the Likert scale has at least 5 levels. If desired, you can correct for the bias by multiplying by the correction factor, i.e. the continuous correlation (.50) divided by the Likert correlation:

Correction factor:

continuous2 1
likert10 1.037
likert7 1.059
likert5 1.066
likert4 1.157
likert3 1.131
likert2 1.266

Psychometrically, if your data do not make sense as an interval scale, i.e. if the difference between options 1 and 2 is not the same as between options 3 and 4, then you should use Spearman’s correlation instead of Pearson’s. However, it will rarely make much of a difference.
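Spearman’s correlation is simply Pearson’s correlation computed on ranks, with tied values (very common on Likert scales) given the average of the ranks they span. A minimal standard-library sketch; the helper names are hypothetical.

```python
import math
from statistics import fmean


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's r on the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear relation: Spearman is 1, Pearson is below 1.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(pearson(xs, ys), spearman(xs, ys))
```

Because only the order of the values matters, Spearman’s correlation does not assume equal distances between Likert options.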

Here’s the R code.

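#load library
library(MASS) #for mvrnorm()
#simulate dataset of 2 variables with correlation of .50, N=1000
simdata = mvrnorm(1000, mu = c(0,0), Sigma = matrix(c(1,0.50,0.50,1), ncol = 2), empirical = TRUE)
simdata = as.data.frame(simdata)
colnames(simdata) = c("continuous1","continuous2")
#divide into bins of equal length
simdata["likert10"] = as.numeric(cut(unlist(simdata[2]),breaks=10))
simdata["likert7"] = as.numeric(cut(unlist(simdata[2]),breaks=7))
simdata["likert5"] = as.numeric(cut(unlist(simdata[2]),breaks=5))
simdata["likert4"] = as.numeric(cut(unlist(simdata[2]),breaks=4))
simdata["likert3"] = as.numeric(cut(unlist(simdata[2]),breaks=3))
simdata["likert2"] = as.numeric(cut(unlist(simdata[2]),breaks=2))
#correlations of continuous1 with the other variables, and correction factors
cors = cor(simdata)[1,-1]
round(cbind(r = cors, factor = 0.50/cors),3)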
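A rough Python port of the same simulation, for readers without R. Assumptions: the bivariate normal data come from random.gauss rather than MASS::mvrnorm with empirical = TRUE (so the continuous correlation is only approximately .50), N is larger to reduce noise, and the equal-width binning only approximates R’s cut(), so the exact values will differ from the lists above. The helper names are made up.

```python
import math
import random
from statistics import fmean


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def to_likert(values, k):
    """Cut values into k equal-width bins over their observed range and
    return the bin numbers 1..k (roughly what R's cut(x, breaks = k) does)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) + 1 for v in values]


def likert_simulation(n=50_000, r=0.5, seed=1):
    """Correlate a continuous variable with Likert-style versions of a
    second variable whose true correlation with it is r."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        x1.append(a)
        x2.append(r * a + math.sqrt(1 - r * r) * rng.gauss(0, 1))
    results = {"continuous2": pearson(x1, x2)}
    for k in (10, 7, 5, 4, 3, 2):
        results[f"likert{k}"] = pearson(x1, to_likert(x2, k))
    return results


res = likert_simulation()
for name, cor in res.items():
    factor = res["continuous2"] / cor  # correction factor as defined in the text
    print(f"{name:12s} r = {cor:.3f}  factor = {factor:.3f}")
```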

Is the sum of cubes equal to the square of the sum of the counting integers?

R can tell us:

DF.numbers = data.frame(cubesum=numeric(),sumsquare=numeric()) #initial dataframe
for (n in 1:100){ #loop and fill in
  DF.numbers[n,"cubesum"] = sum((1:n)^3)
  DF.numbers[n,"sumsquare"] = sum(1:n)^2
}

library(car) #for the scatterplot() function
scatterplot(cubesum ~ sumsquare, DF.numbers,
            smoother=FALSE, #no moving average
            labels = rownames(DF.numbers), id.n = nrow(DF.numbers), #labels
            log = "xy", #logscales
            main = "Cubesum is identical to sumsquare, proven by induction")

#checks that the two columns are identical
all.equal(DF.numbers$cubesum, DF.numbers$sumsquare)



One can increase the number in the loop to test more numbers. I did test it with 1:10000, and it was still true.
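For completeness, the standard induction argument shows that the identity holds for every positive integer n, using the familiar formula 1 + 2 + … + n = n(n+1)/2:

$$\sum_{k=1}^{n} k^3 = \left(\frac{n(n+1)}{2}\right)^2 = \left(\sum_{k=1}^{n} k\right)^2$$

For n = 1 both sides equal 1. If the identity holds for n, then

$$\sum_{k=1}^{n+1} k^3 = \frac{n^2(n+1)^2}{4} + (n+1)^3 = \frac{(n+1)^2\,(n^2 + 4n + 4)}{4} = \left(\frac{(n+1)(n+2)}{2}\right)^2$$

which is the identity for n + 1.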

Paper: Revisiting a 90-year-old debate: the advantages of the mean deviation

Actually, I’m busy doing an exam paper for a linguistics class, but it turned out to be not so difficult, so I spent some time on Khan Academy doing probability and statistics courses. I want to master that stuff, especially the parts I don’t currently know the details of, like regression.

Anyway, I stumbled onto a comment asking about the way the standard deviation is calculated: why not just use the absolute value instead of squaring the deviations and taking the square root afterwards? I actually tried that once, and it gives different results! I tried it because the teacher’s notes said that it would give the same results. Pretty neat discovery IMO.
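A quick way to see that the two measures differ, using Python’s statistics module: pstdev is the population standard deviation (dividing by n), and mean_deviation is a hypothetical helper for the mean absolute deviation around the mean. The data are just an illustrative list.

```python
from statistics import fmean, pstdev


def mean_deviation(values):
    """Mean absolute deviation from the mean (the MD)."""
    m = fmean(values)
    return fmean(abs(v - m) for v in values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print("SD:", pstdev(data))
print("MD:", mean_deviation(data))
```

For this list the SD is 2.0 but the MD is 1.5. For normally distributed data the MD is about 0.8 times the SD (exactly sqrt(2/pi) times).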

Anyway, the other measure has a name as well: the mean deviation.

Here’s a paper that argues that we should really return to the MD (mean deviation). I didn’t understand all the math, but the MD is certainly easier to calculate and its meaning easier to grasp, although it’s probably too difficult to switch now that most of statistics is based on the SD. Still cool, though.

Revisiting a 90-year-old debate: the advantages of the mean deviation

ABSTRACT: This paper discusses the reliance of numerical analysis on the concept of the standard deviation, and its close relative the variance. It suggests that the original reasons why the standard deviation concept has permeated traditional statistics are no longer clearly valid, if they ever were. The absolute mean deviation, it is argued here, has many advantages over the standard deviation. It is more efficient as an estimate of a population parameter in the real-life situation where the data contain tiny errors, or do not form a completely perfect normal distribution. It is easier to use, and more tolerant of extreme values, in the majority of real-life situations where population parameters are not required. It is easier for new researchers to learn about and understand, and also closely linked to a number of arithmetic techniques already used in the sociology of education and elsewhere. We could continue to use the standard deviation instead, as we do presently, because so much of the rest of traditional statistics is based upon it (effect sizes, and the F-test, for example). However, we should weigh the convenience of this solution for some against the possibility of creating a much simpler and more widespread form of numeric analysis for many.

Keywords: variance, measuring variation, political arithmetic, mean deviation, standard deviation, social construction of statistics

It also has an odd new use of “social construction”, which annoyed me when reading it.

Paper: Musical beauty and information compression: Complex to the ear but simple to the mind? (Nicholas J Hudson)

I came across this paper while researching a different topic. I was rewatching the Everything is a Remix series, then looked up some more relevant links and came across these videos. One of them mentioned this article.

Complex to the ear but simple to the mind (Nicholas J Hudson)


Background: The biological origin of music, its universal appeal across human cultures and the cause of its beauty remain mysteries. For example, why is Ludwig Van Beethoven considered a musical genius but Kylie Minogue is not? Possible answers to these questions will be framed in the context of Information Theory.

Presentation of the Hypothesis: The entire life-long sensory data stream of a human is enormous. The adaptive solution to this problem of scale is information compression, thought to have evolved to better handle, interpret and store sensory data. In modern humans highly sophisticated information compression is clearly manifest in philosophical, mathematical and scientific insights. For example, the Laws of Physics explain apparently complex observations with simple rules. Deep cognitive insights are reported as intrinsically satisfying, implying that at some point in evolution, the practice of successful information compression became linked to the physiological reward system. I hypothesise that the establishment of this “compression and pleasure” connection paved the way for musical appreciation, which subsequently became free (perhaps even inevitable) to emerge once audio compression had become intrinsically pleasurable in its own right.

Testing the Hypothesis: For a range of compositions, empirically determine the relationship between the listener’s pleasure and “lossless” audio compression. I hypothesise that enduring musical masterpieces will possess an interesting objective property: despite apparent complexity, they will also exhibit high compressibility.

Implications of the Hypothesis: Artistic masterpieces and deep Scientific insights share the common process of data compression. Musical appreciation is a parasite on a much deeper information processing capacity. The coalescence of mathematical and musical talent in exceptional individuals has a parsimonious explanation. Musical geniuses are skilled in composing music that appears highly complex to the ear yet transpires to be highly simple to the mind. The listener’s pleasure is influenced by the extent to which the auditory data can be resolved in the simplest terms possible.

Interesting, but it is way too short on data. It’s not that difficult to acquire some data to test this hypothesis. Various open-source lossless compressors are freely available; I’m thinking particularly of FLAC compressors. Then one needs a huge library of music and some sort of ranking of the music by its quality. If the hypothesis is correct, the best music should come out on top in compressibility, at least relatively, within genres, within bands, etc. I think I will test this myself.
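A minimal sketch of the measurement step, under loud assumptions: it uses Python’s zlib (general-purpose DEFLATE) as a stand-in for a FLAC encoder, works on synthetic 16-bit samples rather than real recordings, and defines compressibility as compressed size divided by raw size. compression_ratio is a hypothetical helper; a real test would run an actual FLAC encoder over a music library and correlate the ratios with quality rankings.

```python
import math
import random
import struct
import zlib


def compression_ratio(samples):
    """Compressed size / raw size for signed 16-bit PCM samples.
    zlib is a general-purpose stand-in here, NOT FLAC."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(zlib.compress(raw, 9)) / len(raw)


rate = 44_100  # samples per second (assumed CD-quality rate)
period = 100   # a tone that repeats exactly every 100 samples (441 Hz)
cycle = [round(10_000 * math.sin(2 * math.pi * i / period)) for i in range(period)]
tone = cycle * (rate // period)  # about one second of a pure tone
rng = random.Random(0)
noise = [rng.randint(-10_000, 10_000) for _ in range(len(tone))]

print("tone :", compression_ratio(tone))
print("noise:", compression_ratio(noise))
```

A strictly repeating tone compresses to a tiny fraction of its raw size, while random noise barely compresses at all; real music falls somewhere in between, which is where the hypothesis would have to be tested.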

Something about certainty, proofs in math, induction/abduction

This conversation followed my previous post, after several people had brought up the same proof.

Aowpwtomsihermng = Afraid of what people will think of me, so i had Emil remove my name-guy

[09:57:00] Emil – Deleet:
[09:58:50] Aowpwtomsihermng: Your mates know their algebra.
[10:00:09] Emil – Deleet: this guy is a mathematician
[10:00:27] Emil – Deleet: fysicist ppl have not chimed in yet
[10:00:32] Emil – Deleet: they are having classes i think
[10:08:18] Aowpwtomsihermng: Have you worked out the inductive proof yet?
[10:09:33] Emil – Deleet: no
[10:09:40] Emil – Deleet: i dont know how they work in detail
[10:09:43] Emil – Deleet: and it takes time
[10:09:49] Emil – Deleet: and i already crowdsourced the problem
[10:10:00] Emil – Deleet: so… doesnt pay for me to look for it
[10:10:19] Aowpwtomsihermng: CBA, right?
[10:10:24] Emil – Deleet: i didnt even need any fancy math proof to begin with
[10:10:30] Emil – Deleet: since i already proved it to my satisfaction
[10:10:54] Aowpwtomsihermng: Induction in the logical rather than mathematical sense…
[10:11:00] Emil – Deleet: yes
[10:11:17] Aowpwtomsihermng: Not as rigorous, but useful anyway.
[10:11:23] Emil – Deleet: or abduction
[10:11:46] Emil – Deleet: mathematical certainty is overrated
[10:11:48] Emil – Deleet: ;)
[10:11:59] Emil – Deleet: just look at economics
[10:12:02] Emil – Deleet: :P
[10:12:27] Aowpwtomsihermng: You never know, it might have worked for the first twenty numbers then stopped working. Unlikely, but possible.
[10:12:48] Aowpwtomsihermng: At least now you know that’s not the case.
[10:12:49] Emil – Deleet: astronomically unlikely
[10:12:56] Emil – Deleet: and i also tried other random numbers
[10:13:02] Emil – Deleet: like 3242
[10:13:21] Emil – Deleet: IMO, not much certainty was gained
[10:13:50 | Edited 10:14:04] Emil – Deleet: its approximately as likely that we missed an error in the proof as it is that abduction/induction fails in this case
[10:14:26] Aowpwtomsihermng: But once you have two or three proofs, then that likelihood drops dramatically.
[10:14:46] Emil – Deleet: perhaps
[10:15:00] Aowpwtomsihermng: But I take your point, it’s not a *great* deal of extra certainty.
[10:15:15] Emil – Deleet: for practice, its an irrelevant increase
[10:15:34] Emil – Deleet: if it comes at a great time cost – not worth it
[10:15:41] Emil – Deleet: thats what mathematicians are for ;)
[10:15:50] Emil – Deleet: (with the implication that their time isnt worth much! :D)
[10:16:55 | Edited 10:17:14] Aowpwtomsihermng: Right, right. We programmers and mathematicians are mere cogs in the machinery of your grand device.
[10:17:19] Emil – Deleet: ^^
[10:17:36] Emil – Deleet: at least ure part of something great ^^
[10:17:37] Emil – Deleet: :P

An alternative way to calculate squares without using multiplication

I was once at a party, somewhat bored, and found this way of calculating the next square. It works without multiplication, so it’s suitable for mental calculation.

Since I have recently learned Python, here’s a Python version of it:

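n = 10  # how many squares to return

def sq(x):
    # ordinary squaring, for comparison
    return x * x

for y in range(1, n + 1):
    print(sq(y))

def sqx(x):
    # squaring without multiplication: the next square is the previous
    # square plus the previous difference plus 2
    if x == 1:
        return 1
    if x == 2:
        return 4
    return (sqx(x - 1) - sqx(x - 2)) + sqx(x - 1) + 2

for y in range(1, n + 1):
    print(sqx(y))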

In English: first, set the first two squares to 1 and 4, since this method uses the two previous squares to calculate the next one. Then calculate the absolute difference between these two. Suppose we are looking for 3²; the previous two squares are 1 and 4. The absolute difference is 3. Add 2 to this, which gives 5. Add 5 to the previous square: 4 + 5 = 9, and 9 is 3².

I have no idea why this works; I just saw a pattern and confirmed it for the first 20 integers or so.
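For what it’s worth, the pattern follows from the fact that consecutive squares differ by consecutive odd numbers. Writing s_n = n², the previous difference is s_n - s_{n-1} = n² - (n-1)² = 2n - 1, so the rule gives:

$$(s_n - s_{n-1}) + 2 + s_n = (2n - 1) + 2 + n^2 = n^2 + 2n + 1 = (n+1)^2 = s_{n+1}$$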

In the code above, I have defined the function recursively. It is much slower than the other function, since every call recomputes all the earlier squares, several times over. I suppose both are slower than the built-in function pow(n, m). But it certainly is cool. :P
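Because each new square only needs the previous square and the previous difference, the same rule can also be run as a simple loop that uses only addition and runs in linear time. A sketch with a hypothetical function name:

```python
def squares_by_addition(count):
    """Return the first `count` squares using only addition, following the
    rule: next square = previous square + (previous difference + 2)."""
    squares = []
    square, diff = 0, -1  # chosen so that the first step gives 0 + 1 = 1
    for _ in range(count):
        diff += 2         # the differences run 1, 3, 5, 7, ...
        square += diff
        squares.append(square)
    return squares


print(squares_by_addition(10))
```

For example, squares_by_addition(5) returns [1, 4, 9, 16, 25].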