I don’t have time to do this project right now. However, since I gathered a bunch of relevant resources that I don’t want to have to re-find, I will write them down somewhere — here. My blog is my extended memory.

Music genres are another fuzzy concept, like races. Lots of genres (and classification systems) exist, but genres seem fairly arbitrary and their usage fairly inconsistent. However, it is possible to clean up the mess somewhat by using statistical methods to find patterns of similarity in music.

Recently, someone asked on Reddit about how to do something like this in R. My answer will serve as this post.

Essentially, what one needs to find or invent are numeric measures of music that vary between songs and are related to genre. Some immediate ideas:

  • Beats per minute (bpm): overall and variation.
  • Dynamic range.
  • Length.
  • Relative presence of vocals.
  • Repetitiveness.
  • Frequency: overall and variation.
  • Extract melodies. These can be analyzed extensively.
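
To make a few of these measures concrete, here is a minimal sketch in plain Python (not R, and not using any of the packages listed further down). The function name `basic_features` and the choice of proxies are my own assumptions: crest factor (peak-to-RMS ratio) stands in for dynamic range, and the zero-crossing rate stands in for overall frequency. A synthetic 440 Hz tone replaces a real decoded audio file.

```python
import math

def basic_features(samples, sample_rate):
    """Crude per-track features from mono samples scaled to [-1, 1]."""
    n = len(samples)
    duration_s = n / sample_rate
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    # Crest factor in dB: a rough stand-in for dynamic range.
    crest_factor_db = 20 * math.log10(peak / rms)
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    # A pure tone crosses zero twice per cycle, hence the division by 2.
    zero_cross_hz = crossings / (2 * duration_s)
    return {
        "duration_s": duration_s,
        "crest_factor_db": crest_factor_db,
        "zero_cross_hz": zero_cross_hz,
    }

# Synthetic stand-in for a real track: one second of a 440 Hz sine wave.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
features = basic_features(tone, sample_rate)
print(features)
```

For the pure tone, the zero-crossing estimate lands very close to 440 Hz and the crest factor close to 3 dB. On real music both numbers are much noisier, so they are better treated as relative measures for comparing songs than as physical quantities.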

With these data in hand, you can use cluster analysis or dimensionality reduction (factor analysis/PCA) methods to try to infer clusters of similar music. For this to work, one needs a large and varied collection of music, e.g. 1k or 10k songs.
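
As a rough illustration of the clustering step, the sketch below standardizes a small feature table and runs a bare-bones k-means on it in plain Python. The eight rows of (bpm, dynamic range in dB, length in seconds) are invented for the example, and the farthest-point initialization is a simplification I chose to keep the result deterministic. In R, the same steps map onto `scale()` and `kmeans()`.

```python
import math

def standardize(rows):
    """Z-score each column so large-scale features (e.g. length) do not dominate."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(rows, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(row, centers[j])) for row in rows]
        new_centers = []
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Invented feature table: (bpm, dynamic range in dB, length in seconds).
songs = [
    (172, 6, 210), (168, 5, 200), (175, 7, 230), (170, 6, 220),   # fast, compressed, short
    (70, 18, 420), (65, 20, 480), (72, 17, 450), (68, 19, 400),   # slow, dynamic, long
]
labels = kmeans(standardize(songs), k=2)
print(labels)
```

With only eight clearly separated songs, the two groups fall out trivially. The interesting question with a real collection is how many clusters the data support, and whether they resemble the genre labels people actually use.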

Some previous literature on the topic:

For validation, one can look up the same songs on Last.fm to get their tags. I would also look for artist and album effects, as songs by one artist generally sound alike, especially when they are on the same album.
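
One simple way to score such a validation is cluster purity: the share of songs whose tag matches the most common tag in their cluster. The sketch below is plain Python with invented cluster labels and tags. Fetching real tags from Last.fm is not shown, and `cluster_purity` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_labels, tags):
    """Fraction of songs carrying the majority tag of their own cluster."""
    tags_by_cluster = defaultdict(list)
    for cluster, tag in zip(cluster_labels, tags):
        tags_by_cluster[cluster].append(tag)
    # For each cluster, count how many songs carry its most common tag.
    majority_total = sum(Counter(ts).most_common(1)[0][1]
                         for ts in tags_by_cluster.values())
    return majority_total / len(tags)

# Invented example: eight songs, two inferred clusters, one tag per song.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
tags = ["drum and bass"] * 3 + ["ambient"] * 5
purity = cluster_purity(clusters, tags)
print(purity)
```

Purity rises mechanically as the number of clusters grows, so it is only useful for comparing solutions with the same number of clusters. The same function can be fed artist names instead of tags to get a first impression of how strong the artist effect is.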

R packages for sound analysis

The tuneR package has been used for clustering before:

http://www.vesnam.com/Rblog/sortmymusic/

The seewave package is pretty comprehensive.

http://rug.mnhn.fr/seewave/

The signal package probably has some useful tools.

https://cran.r-project.org/web/packages/signal/index.html