Categories vs. Tags

Posted by Catherine Motuz on July 6th, 2012

The ELVIS project is booming, with over 4000 entries. Composers with over 100 entries include Chopin, Handel, Haydn, Obrecht, Palestrina, Scarlatti, Schubert, and Bach, who alone accounts for more than 1000 pieces or movements.

At first we met regularly to discuss the best ways to organize our information, and we learned that in the field of information science, computers are changing the way we have to think about our data. Previously, we would have put it into categories such as those provided by the Library of Congress, but now that keyword searches are so prevalent and the field of corpus linguistics is blossoming, tagging data is gaining ground. The downside to tagging is that you have to know what you are looking for so that you can type it into the search box. Categories, meanwhile, offer easy browsing, but there has to be the right pre-defined category to fit every element of a piece worth recording (genre, country, era, etc.). In music there are so many genres and subgenres that there is a danger of undermining the browsing advantages of categories by simply having too many of them.

The solution for the ELVIS database has been to tag first, and then go through and organize the tags into categories, producing what appears to be the best of both worlds. The clever part is that when a piece is uploaded, it can be tagged at any level of specificity, and the levels above it will be activated automatically. For instance, a single tag such as "bourree" will activate the more general tags "dance" and "instrumental genres," so that if someone wants to look at dance music, they don’t need to list every possible dance. It also means that outside researchers don’t need to know our vocabulary. For example, "instrumental" on ELVIS refers to the presence of any instrument, not to the absence of vocal parts, so a person looking for music without voices would have to type "instrumental NOT vocal" into the search box. This is not obvious, but in a faceted (category tree) search, it becomes clear.
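The way a specific tag activates the levels above it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual ELVIS code: the `PARENT` mapping, the `expand_tags` and `search` functions, and the sample pieces are all hypothetical, and only the "bourree" to "dance" to "instrumental genres" chain is taken from the example above.

```python
# Hypothetical tag hierarchy: each tag maps to its more general parent.
# Only the "bourree" -> "dance" -> "instrumental genres" chain comes from
# the post; the "gavotte" entry is invented for illustration.
PARENT = {
    "bourree": "dance",
    "gavotte": "dance",
    "dance": "instrumental genres",
}


def expand_tags(tags):
    """Return the given tags plus every more general tag above them."""
    expanded = set()
    for tag in tags:
        # Walk up the hierarchy until we reach a tag with no parent
        # (or one we have already visited).
        while tag is not None and tag not in expanded:
            expanded.add(tag)
            tag = PARENT.get(tag)
    return expanded


def search(pieces, include, exclude=()):
    """Titles of pieces whose expanded tags contain every tag in
    `include` and none of the tags in `exclude`."""
    results = []
    for title, tags in pieces.items():
        active = expand_tags(tags)
        if set(include) <= active and not (set(exclude) & active):
            results.append(title)
    return sorted(results)


# Invented sample data: title -> tags assigned at upload time.
pieces = {
    "Piece A": {"bourree", "instrumental"},
    "Piece B": {"gavotte", "instrumental"},
    "Piece C": {"instrumental", "vocal"},
}

# Browsing for dance music finds both dances without naming either one.
dances = search(pieces, include=["dance"])

# The equivalent of typing "instrumental NOT vocal" into the search box.
no_voices = search(pieces, include=["instrumental"], exclude=["vocal"])
```

With this sample data, `dances` and `no_voices` both contain Piece A and Piece B, while a plain search for "instrumental" would also return Piece C, which has vocal parts.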

Another advantage to tags is that they can be interesting on their own. How many times do we use each tag? Which tags appear near other tags? The answers to these questions give us a better picture of our database than categories do, and we can see them at a glance by creating a "tag cloud" like the one shown here.
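Both questions can be answered with simple counting. The Python sketch below is an illustration only: the sample data is invented, and it assumes that tags "appearing near" each other means tags assigned to the same piece, which may not be the measure used for the ELVIS tag cloud.

```python
from collections import Counter
from itertools import combinations

# Invented sample data: each piece is represented by its set of tags.
tagged_pieces = [
    {"bourree", "dance", "instrumental"},
    {"gavotte", "dance", "instrumental"},
    {"motet", "vocal"},
]

# How many times do we use each tag?
tag_counts = Counter(tag for tags in tagged_pieces for tag in tags)

# Which tags appear together on the same piece?
# Sorting each tag set first gives every pair a consistent order.
pair_counts = Counter(
    pair
    for tags in tagged_pieces
    for pair in combinations(sorted(tags), 2)
)

print(tag_counts["dance"])                    # 2
print(pair_counts[("dance", "instrumental")])  # 2
```

In a tag cloud, counts like those in `tag_counts` would typically set the size of each word.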