RCurl in R on Mint 17

So I was installing something (forgot what), and had problems with RCurl:

* installing *source* package ‘RCurl’ …
** package ‘RCurl’ successfully unpacked and MD5 sums checked
checking for curl-config… no
Cannot find curl-config
ERROR: configuration failed for package ‘RCurl’
* removing ‘/home/lenovo/R/x86_64-pc-linux-gnu-library/3.1/RCurl’
Warning in install.packages :
installation of package ‘RCurl’ had non-zero exit status

Solution was to do sudo apt-get install libcurl4-openssl-dev (found in a comment here).
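To check from inside R whether the fix took before retrying the install, something like the following should do. It is a minimal sketch: Sys.which looks up an executable on the PATH and returns an empty string when it is missing. The variable name is just for illustration.

```r
# Look for the curl-config helper that RCurl's configure script asks for.
curl_config <- Sys.which("curl-config")

if (nzchar(curl_config)) {
  message("curl-config found at: ", curl_config)
} else {
  message("curl-config not found; install libcurl4-openssl-dev first")
}
```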

Sometimes doing elementary things in R is a pain

Getting a percentage table from a dataframe

A reviewer asked me to:

1) As I said earlier, there should be some data on the countries of origin of the immigrant population. Most readers have no idea who actually moves to Denmark. At the very least, there should be basic information like “x% of the immigrant population is of non-European origin and y% of European origin as of 2014.” Generally, non-European immigration would be expected to increase inequality more, given that IQ levels are relatively uniform across Europe.

I have population counts for each year from 1980 through 2014 in a data frame, and I’d like to express them as a percentage of each year’s total to get the relative sizes of the countries. There is a premade function for this, prop.table, but it behaves quite strangely. If one gives it a data frame and no margin, it divides by the grand total of the whole data frame instead of by column. This is sometimes useful, but not in this case. However, if one gives it a data frame and margin=2, it complains that:

Error in margin.table(x, margin) : 'x' is not an array

Which is odd when it just accepted it before. The relative lack of documentation made it hard to figure out how to make it work. It turns out that one just has to convert the data frame to a matrix before passing it in:

census.percent = prop.table(as.matrix(census), margin=2)

and then one can convert it back and also multiply by 100 to get percent instead of fractions:

census.percent = as.data.frame(prop.table(as.matrix(census), margin=2)*100)
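Here is the same thing on a made-up toy data frame, as a sketch rather than the real census data. The names census_toy, pct and pct_alt are hypothetical. The sweep line is an alternative way to get the same column-wise percentages in base R.

```r
# Toy stand-in for the census data: rows are countries, columns are years.
census_toy <- data.frame(
  X1980 = c(10, 30, 60),
  X2014 = c(50, 25, 25),
  row.names = c("A", "B", "C")
)

# Passing the data frame with a margin failed in the post; capture the
# error message instead of stopping, so the rest of the script still runs.
err <- tryCatch(prop.table(census_toy, margin = 2),
                error = function(e) conditionMessage(e))

# Converting to a matrix first gives column-wise percentages.
pct <- as.data.frame(prop.table(as.matrix(census_toy), margin = 2) * 100)

# Alternative: divide each column by its own sum with sweep().
pct_alt <- as.data.frame(
  sweep(as.matrix(census_toy), 2, colSums(census_toy), "/") * 100
)

print(pct)
```

Each column of pct sums to 100, and the rownames survive the round trip through as.matrix.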

Getting the top 10 countries with names for selected years

This one was harder. Here’s the code I ended up with:

selected.years = c("X1980","X1990","X2000","X2010","X2014") #years of interest
for (year in selected.years){ #loop over each year of interest
  vector = census.percent[,year,drop=FALSE] #get the vector, DONT DROP!
  View(round(vector[order(vector, decreasing = TRUE),,drop=FALSE][1:10,,drop=FALSE],1)) #sort vector, DONT drop! and get 1:10 and DONT DROP!
}

First we choose the years we want (note the X prefix: by default R prepends it when reading column names that begin with a number, because such names are not syntactically valid). Then we loop over each year of interest and pick out that year’s column once, to avoid having to select the same column over and over. However, when picking out a single column from a data frame, R will normally convert it to a plain numeric vector, which is very bad because this removes the rownames. That means that even tho we can find the top 10 values, we don’t know which countries they belong to. The solution is to set drop=FALSE. The next part consists of first ordering the vector (without drop!), and then selecting the top 10 countries, again without dropping. I open them in View (in RStudio) because this makes it easier to copy the values for further use (e.g. in a table for a paper).

So, drop=FALSE is another one of those pesky small things to remember. It is just like stringsAsFactors=FALSE when using read.table (or read.csv).
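To see the difference drop=FALSE makes, here is a small sketch on made-up numbers. The data frame pct_toy and the helper top_n_rows are hypothetical names, not part of the code above.

```r
# Toy percentages: rows are countries, columns are years (made-up values).
pct_toy <- data.frame(
  X1980 = c(5.5, 60.2, 34.3),
  X2014 = c(40.1, 20.4, 39.5),
  row.names = c("A", "B", "C")
)

# Hypothetical helper: top n rows of one column, keeping the rownames.
top_n_rows <- function(df, column, n = 10) {
  one_col <- df[, column, drop = FALSE]  # still a data frame
  sorted <- one_col[order(one_col[[column]], decreasing = TRUE), , drop = FALSE]
  head(sorted, n)                        # copes with fewer than n rows
}

# Without drop = FALSE we get a bare numeric vector with no names.
names(pct_toy[, "X2014"])  # NULL

# With it, the country names come along.
top_n_rows(pct_toy, "X2014", n = 2)
```

Using head instead of [1:10, ] avoids rows of NA when the data has fewer than n rows.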

W values from the Shapiro-Wilk test visualized with different datasets

For a mathematical explanation of the test, see e.g. here. However, such an explanation is not very useful for using the test in practice. Just what does a W value of .95 mean? What about .90 or .99? One way to get a feel for it is to simulate datasets, plot them, and calculate the W values. Additionally, one can check the sensitivity of the test, i.e. the p value.

All the code is in R.

#random numbers from normal distribution
set.seed(42) #for reproducible numbers
x = rnorm(5000) #generate random numbers from normal dist
hist(x,breaks=50, main="Normal distribution, N=5000") #plot
shapiro.test(x) #SW test
>W = 0.9997, p-value = 0.744

SW_norm

So, as expected, W was very close to 1, and p was large. In other words, SW did not reject a normal distribution just because N is large. But maybe it was a freak accident. What if we were to repeat this experiment 1000 times?

#repeat sampling + test 1000 times
Ws = numeric(1000); Ps = numeric(1000) #preallocated result vectors
for (n in 1:1000){ #number of simulations
  x = rnorm(5000) #generate random numbers from normal dist
  sw = shapiro.test(x)
  Ws[n] = sw$statistic #store W for this run
  Ps[n] = sw$p.value #store p value for this run
}
hist(Ws,breaks=50) #plot W distribution
hist(Ps,breaks=50) #plot P distribution
sum(Ps<.05) #how many Ps below .05?

The number of Ps below .05 was in fact 43, or 4.3%. I ran the code with 100,000 simulations too, which takes 10 minutes or something. The value was 4389, i.e. 4.4%. So it seems that the method used to estimate the P value is slightly off in that the false positive rate is lower than expected.
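Whether these rates really differ from the nominal 5% can be checked with an exact binomial test on the counts reported above. This is a sketch of one way to do it, not something from the original analysis. The names small_run and large_run are made up.

```r
# Counts reported above: p values below .05 out of the number of simulations.
small_run <- binom.test(43, 1000, p = 0.05)
large_run <- binom.test(4389, 100000, p = 0.05)

small_run$conf.int  # exact 95% interval for the 1,000-run rate
large_run$conf.int  # exact 95% interval for the 100,000-run rate
large_run$p.value   # test of the observed rate against the nominal .05
```

The interval from the 1,000-run count still includes .05, so that run alone is compatible with a correct false positive rate. The interval from the 100,000-run count lies entirely below .05, which is what supports the claim that the rate is lower than expected.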

What about the W statistic? Is it sensitive to fairly small deviations from normality?

#random numbers from normal distribution, slight deviation
x = c(rnorm(4900),rnorm(100,2))
hist(x,breaks=50, main="Normal distribution N=4900 + normal distribution N=100, mean=2")
shapiro.test(x)
>W = 0.9965, p-value = 1.484e-09


Here I started with a large normal distribution and added a small normal distribution with a different mean to it. The difference is hardly visible to the eye, but the P value is very small. The reason is that the large sample size makes it possible to detect even very small deviations from normality. W was again very close to 1, indicating that the distribution was close to normal.

What about a decidedly non-normal distribution?

#random numbers between -10 and 10
x = runif(5000, min=-10, max=10)
hist(x,breaks=50,main="evenly distributed numbers [-10;10], N=5000")
shapiro.test(x)
>W = 0.9541, p-value < 2.2e-16

SW_even

SW rightly rejects normality here with great certainty. However, W is still near 1 (.95). This tells us that the W value does not vary very much even when the distribution is decidedly non-normal. For interpretation, then, we should probably bark as soon as W drops just under .99 or so.

As a further test of the W values, here are two equal-sized distributions plotted together.

#normal distributions, 2 sd apart (unimodal fat normal distribution)
x = c(rnorm(2500, -1, 1),rnorm(2500, 1, 1))
hist(x,breaks=50,main="Normal distributions, 2 sd apart")
shapiro.test(x)
>W = 0.9957, p-value = 6.816e-11
sd(x)
>1.436026
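The sd printed above is about what theory predicts. For an equal mixture of two normal distributions with means -d and d and a common standard deviation s, the variance is s^2 + d^2. A small check, with d and s as illustrative names:

```r
# Equal mixture of N(-d, s) and N(d, s): variance is s^2 + d^2.
d <- 1  # distance of each component mean from zero
s <- 1  # standard deviation of each component
mixture_sd <- sqrt(s^2 + d^2)
mixture_sd  # about 1.414
```

The simulated value differs a little from this because of sampling error.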

SW_norm3

It still looks fairly normal, altho too fat. The standard deviation is in fact 1.44, or 44% larger than it is supposed to be. The W value is still fairly close to 1, however, and only a little less than from the distribution that was only slightly nonnormal (Ws = .9957 and .9965). What about clearly bimodal distributions?

#bimodal normal distributions, 4 sd apart
x = c(rnorm(2500, -2, 1),rnorm(2500, 2, 1))
hist(x,breaks=50,main="Normal distributions, 4 sd apart")
shapiro.test(x)
>W = 0.9464, p-value < 2.2e-16

SW_norm4

This clearly looks nonnormal. SW rejects it rightly and W is about .95 (W=0.9464). This is a bit lower than for the evenly distributed numbers (W=0.9541).

What about an extreme case of nonnormality?

#bimodal normal distributions, 20 sd apart
x = c(rnorm(2500, -10, 1),rnorm(2500, 10, 1))
hist(x,breaks=50,main="Normal distributions, 20 sd apart")
shapiro.test(x)
>W = 0.7248, p-value < 2.2e-16

SW_norm5

Finally we make a big reduction in the W value.

What about some more moderate deviations from normality?

#random numbers from normal distribution, moderate deviation
x = c(rnorm(4500),rnorm(500,2))
hist(x,breaks=50, main="Normal distribution N=4500 + normal distribution N=500, mean=2")
shapiro.test(x)
>W = 0.9934, p-value = 1.646e-14

SW_norm6

This one has a longer tail on the right side, but it still looks fairly normal. W=.9934.

#random numbers from normal distribution, large deviation
x = c(rnorm(4000),rnorm(1000,2))
hist(x,breaks=50, main="Normal distribution N=4000 + normal distribution N=1000, mean=2")
shapiro.test(x)
>W = 0.991, p-value < 2.2e-16

SW_norm7

This one has a very long right tail. W=.991.

In conclusion

Generally we see that given a large sample, SW is sensitive to departures from normality. If the departure is very small, however, it is not very important.

We also see that it is hard to reduce the W value even if one deliberately tries. One needs to test an extremely non-normal distribution in order for it to fall appreciably below .99.
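To get several of these W values side by side in one go, the scenarios can be put in a list and run through shapiro.test together. This is only a sketch: the seed and the names samples and w_values are my own choices here, so the exact numbers will differ a bit from those printed earlier.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 5000     # shapiro.test accepts between 3 and 5000 values

# One sample per scenario, generated the same way as in the post.
samples <- list(
  normal       = rnorm(n),
  slight_mix   = c(rnorm(4900), rnorm(100, 2)),
  uniform      = runif(n, min = -10, max = 10),
  bimodal_4sd  = c(rnorm(2500, -2), rnorm(2500, 2)),
  bimodal_20sd = c(rnorm(2500, -10), rnorm(2500, 10))
)

# Extract W for each sample and sort from most to least normal-looking.
w_values <- sapply(samples, function(x) unname(shapiro.test(x)$statistic))
round(sort(w_values, decreasing = TRUE), 4)
```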

DMCA #whoknows

Cengage Learning
27500 Drake Road
Farmington Hills, MI 48331
 
 
Tuesday, February 11, 2014
 
 
RE: Unauthorized Use of Cengage Learning Material
 
In reference to the following Cengage Learning product(s): An Introduction to Language 9th Edition by Victoria Fromkin
 
Dear Sir:
 
It has been brought to our attention that material belonging to a Cengage Learning company has been used without acquiring permission. Copyrighted material from the title listed above is posted to the following unprotected URL: emilkirkegaard.dk/en/wp-content/uploads/Victoria-Fromkin-Robert-Rodman-Nina-Hyams-An-Introduction-to-Language.pdf
 
We can find no records to indicate that Cengage Learning granted you permission to reproduce its material to a publically accessible website. As such, we ask that you remove this material immediately and confirm that you have done so.
 
This letter is strictly without waiver of, or prejudice to, our rights, claims or remedies, all of which are hereby expressly reserved.
 
Sincerely,
 
Heather Ungarten
Infringement and Anti-Piracy Paralegal
Cengage Learning
27500 Drake Rd., Farmington Hills, MI 48331

DMCA #whatever

To Whom it May Concern,

Pursuant to 17 USC 512(c)(3)(A), this communication serves as a statement that:

(1). I am the duly authorized representative of the exclusive rights holder for the books entitled Science and Pseudoscience in Clinical Psychology

(2). These exclusive rights are being violated by material available upon your site at the following URL(s):
emilkirkegaard.dk/en/wp-content/uploads/Science-and-pseudoscience-in-clinical-psychology-edited-by-Scott-O.-Lilienfeld-Steven-Jay-Lynn-Jeffrey-M.-Lohr..pdf

(3) I have a good faith belief that the use of this material in such a fashion is not authorized by the copyright holder, the copyright holder’s agent, or the law;

(4) Under penalty of perjury in a United States court of law, I state that the information contained in this notification is accurate, and that I am authorized to act on the behalf of the exclusive rights holder for the material in question;


(5) I may be contacted by any of methods per the signature line below. I hereby request that you remove or disable access to this material as it appears on your service in as expedient a fashion as possible. We would like written confirmation of the material’s removal by
07 November 2013

Guilford Publications: 72 Spring Street, 4th Floor, New York, NY 10012

Sony tries censorship…

arstechnica.com/gaming/2013/10/when-a-virtual-actress-nude-images-leak-who-should-take-the-legal-blame/

 

When nude images of Jodie Holmes, actress Ellen Page’s character from Beyond: Two Souls, began appearing on the Internet a few weeks ago (courtesy of a repositioned shower-scene camera running on debug hardware) we thought the story was a little too tabloidy to cover. This kind of embarrassing, tawdry celebrity gossip is pretty common in the entertainment industry, even if it’s relatively rare in video games particularly. Scandals revolving around supposedly inaccessible adult content in games aren’t completely unheard of, though; remember GTA: San Andreas’ Hot Coffee?

But when reports surfaced earlier this week that Sony was making vague legal threats in an effort to remove those images from the Internet, our news ears started perking up a little.

Nordic entertainment site Eskimo Press was the first to report that Sony Computer Entertainment Europe asked them to take down the leaked images, citing unspecified “legal reasons” for the request. This action came despite the fact that Eskimo Press merely linked to the images on another server rather than hosting them itself. Culture site Gaming Blend said it received a similar request from Sony Computer Entertainment America, which went so far as to request that the original story be taken down entirely.

 

I hate censorship. I went to /b/ to get the pics. They are nothing special.

DMCA

Technically, these shudnt apply in DK anyway, or what?

Subject: Notice of Copyright Infringement
Sender: Ian Noble
The Publishers Association
29b Montague Street
London
WC1B 5BW
Tel: +44 (0)20 7691 9191

Recipient: EMILKIRKEGAARD-DK

RE: Copyright Infringement.

This notice complies with the Digital Millennium Copyright Act (17 U.S.C. §512(c)(3))

I, Ian Noble, swear under penalty of perjury that I am authorised to act on behalf of the Rights Owner(s) listed below for the copyright works listed below.

It has come to my attention that the website (emilkirkegaard.dk) is engaged in the electronic distribution of copies of these works. It is my good faith belief that the use of these works in this manner is not authorised by the copyright owner, his agent or the law.  This is in clear violation of United States, European Union, and International copyright law, and I now request that you expeditiously remove this material from the website (emilkirkegaard.dk), or block or disable access to it, as required under both US and EU law.

The works are listed below.

The URLs to identify the infringing files and the means to locate them are listed below.

I believe in good faith that use of the aforementioned material is not authorized by the copyright owner, its agents, or the law.

The information in this notice is accurate and I request that you expeditiously remove or block or disable access to all the infringing material or the entire site.

/Ian Noble/
Ian Noble

Monday September 23, 2013

==============================

=========================================================

Ref:                       1202159/1632038
From:                      Debbie  Poole
Rights Owner:              John Wiley & Sons
Title:                     50 Great myths of Popular Psychology: Shattering Widespread Misconceptions about Human Behavior
Author:                    Scott O. Lilienfeld, Steven Jay Lynn, John Ruscio, and Barry L. Beyerstein.
Search:                    50 Great myths of Popular Psychology: Shattering Widespread Misconceptions about Human Behavior
URL:                       emilkirkegaard.dk/en/wp-content/uploads/50-Great-Myths-of-Popular-PsychologyTeam-Nanbantmrg.pdf
IP Address                 94.231.108.37

Copyright works:

Paper: Introducing the construct curiosity for predicting job performance

Mussel, Patrick. “Introducing the construct curiosity for predicting job performance.” Journal of Organizational Behavior (2012).
I read this a while ago. Pretty interesting.

Summary

The present paper provides a conceptual and empirical examination regarding the relevance of the construct curiosity for work-related outcomes. On the basis of a review and integration of the literature regarding the construct itself, the construct is conceptually linked with performance in the work context. In line with a confirmatory research strategy, the sample of the present study (N = 320) has requirements which reflect this conceptual link. Results from a concurrent validation study confirmed the hypothesis regarding the significance of curiosity for job performance (r = .34). Furthermore, incremental validity of curiosity above 12 cognitive and non-cognitive predictors for job performance suggests that curiosity captures variance in the criterion that is not explained by predictors traditionally used in organizational psychology. It is concluded that curiosity is an important variable for the prediction and explanation of work-related behavior. Furthermore, given the dramatic changes in the world of work, the importance is likely to rise, rather than to decline, which has important implications for organizational theories and applied purposes, such as personnel selection.