Lately I’ve been interested in cluster analysis and factor analysis. These two families of analyses have a great many practical uses for anyone working with data. So far I’ve begun cluster analyzing Wikipedia to get an overall idea about the structure of human knowledge (how cool is that?). I’ve also read Arthur Jensen’s The g Factor to get an idea about how factor analysis works with regard to intelligence testing, and other psychometrics and biometrics (like the proposed f factor).

Today I was reading a book about the future of schooling, Salman Khan’s The One World Schoolhouse (I will post my review soon). In the book he mentions some stuff about homework. I was curious and looked up his sources. That got me reading a meta-analysis (another kind of analysis! I love analysis) about the effects of homework. While reading that I got a new idea for an analysis.

The idea

Citation indexes already exist. With such an index, one can look up a particular paper and find other papers that cite it. Or one can look up an author and see which papers he has published, who cites those papers, and so on. However, these tools offer poor graphical representations of the data, or none at all. That is a shame, since graphical representations of data are so much more useful and cool. One need only watch a couple of TED talks about the subject to be convinced.

There are various things that one can show graphically in a very illustrative way. My idea is to have each paper as a node and to draw lines between the nodes that indicate which paper cites which. These lines would normally be one-directional, since it is difficult to cite a paper that will be published in the future (although papers do sometimes cite other papers that are “in press”, so it’s not unheard of). Time is shown on the y-axis (or the x-axis if one prefers that). In this way one can follow the citations of a paper over time. More interestingly, one can follow the citations between the other papers that cite the first paper over time. The result is a web that becomes more complex over time, or perhaps dies off if the academic community loses interest in that particular subject (academic interest is a bit like fashion).
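To make the node-and-line idea a bit more concrete, here is a minimal sketch in Python of how the data could be stored and laid out. The paper IDs, years, and function names are all made up for illustration; a real tool would pull this from a citation index. Each paper gets a y-coordinate equal to its publication year, and citations are stored as directed (citing, cited) pairs.

```python
from collections import defaultdict

# Hypothetical papers (made-up data): id -> publication year.
papers = {"A": 2001, "B": 2003, "C": 2003, "D": 2006}

# Directed citation edges as (citing, cited) pairs, also made up.
citations = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")]


def layout(papers):
    """Place each paper at (x, y): y is the year, x is its slot among same-year papers."""
    by_year = defaultdict(list)
    for paper_id in sorted(papers):
        by_year[papers[paper_id]].append(paper_id)
    positions = {}
    for year, ids in by_year.items():
        for slot, paper_id in enumerate(ids):
            positions[paper_id] = (slot, year)
    return positions


def forward_citations(papers, citations):
    """Return edges where the citing paper is older than the cited one ("in press" cases)."""
    return [(src, dst) for src, dst in citations if papers[src] < papers[dst]]


positions = layout(papers)
unusual = forward_citations(papers, citations)
```

The second function is only there to flag the rare “in press” case, where a line would point forward in time.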

Here’s a made-up example that I have drawn to show off the general idea:

(Figure: Proposal, a graphical tool to explore relationships between academic papers)

In the example above, there are 20 papers of interest. All the citations between them are then found and shown as lines. Ideally, the direction of each relationship should also be shown, perhaps by small arrows on the lines. Also ideally, the authors or titles of the papers (or both) should be shown in a very small font on top of the nodes or something like that. The text should be enlarged when the mouse hovers over a node, with a link to the actual paper and the abstract ready to be read.

It is also possible to color the nodes by author or research group. In the example above, there are two lines of authors, or research groups, or research programs. The left one publishes more papers than the right one. One can employ various coloring schemes to make such features salient in the graphical representation. One can also see how the two lines interrelate; they do cite each other’s papers, just not as frequently as they cite their own.
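As a rough sketch of the coloring idea, assuming one already knows which group each paper belongs to, the code below assigns a color per group and counts how many citations stay within a group versus cross between groups. The group labels, color names, and citation pairs are hypothetical.

```python
from collections import Counter

# Hypothetical assignment of papers to research groups (made-up data).
group_of = {"A": "left", "B": "left", "C": "right", "D": "left", "E": "right"}

# Directed citation edges as (citing, cited) pairs, also made up.
citations = [("B", "A"), ("D", "B"), ("D", "A"), ("E", "C"), ("E", "B")]

# Arbitrary color names; any plotting tool's own palette could be used instead.
palette = {"left": "blue", "right": "orange"}


def node_colors(group_of, palette):
    """Map each paper to the color of its research group."""
    return {paper: palette[group] for paper, group in group_of.items()}


def citation_mix(group_of, citations):
    """Count citations that stay within a group versus those that cross groups."""
    counts = Counter()
    for citing, cited in citations:
        kind = "within" if group_of[citing] == group_of[cited] else "between"
        counts[kind] += 1
    return counts


colors = node_colors(group_of, palette)
mix = citation_mix(group_of, citations)
```

With this toy data most citations fall within a group, which is the pattern described above: the groups cite each other, just less often than they cite themselves.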

One can also vary the nodes according to information other than the authors. One can scale their size by each paper’s individual citation count, for instance. This makes it easier for an outsider to locate the papers that gathered the most citations (either in general, or within the pool of papers of interest), and hence, most likely, the most interest from fellow researchers. If one wants, one can also do the opposite, and look for hidden gems of insight in the literature that have been missed by other authors.
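Here is one way the sizing could work, as a sketch only. It counts citations within the pool of papers of interest and turns each count into a node size. The square-root scaling and the `base` and `scale` values are my own assumptions, chosen so that a few heavily cited papers do not swamp the picture; the data is made up.

```python
import math


def in_pool_citation_counts(paper_ids, citations):
    """Count how often each paper is cited by other papers inside the pool."""
    counts = {paper_id: 0 for paper_id in paper_ids}
    for citing, cited in citations:
        # Ignore edges that involve a paper outside the pool of interest.
        if citing in counts and cited in counts:
            counts[cited] += 1
    return counts


def node_size(count, base=10.0, scale=5.0):
    """Grow the node with the square root of its citation count (an assumed choice)."""
    return base + scale * math.sqrt(count)


# Hypothetical pool and citations; "X" is a paper outside the pool.
pool = ["A", "B", "C", "D"]
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("X", "A")]

counts = in_pool_citation_counts(pool, citations)
sizes = {paper_id: node_size(n) for paper_id, n in counts.items()}

# Candidate "hidden gems": papers in the pool that nobody in the pool cites.
uncited = sorted(p for p, n in counts.items() if n == 0)
```

Papers that nobody in the pool cites are not automatically hidden gems, of course, but they are the natural place to start looking. In this toy data the newest papers are uncited simply because they are new.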

Even better, given the problems with replication in some fields of science (psychology in particular, and direct replications most of all), one can color nodes according to whether they are replications of previous papers or not. One could also have special arrows for replications. Similarly, literature reviews, meta-reviews, and systematic reviews could have their own node shape or color so that one can locate them more easily. Surely, something like this is the proper way to evaluate the influence of scientific papers.

What next?

Two things. Improve the ideas, and add to them. Then 1) find programmers, and convince them that the project is cool and that they should invest their time in it! 2) Find other people who have more prestige and, hopefully, access to funding that can be used to hire programmers to turn the ideas into reality.

