CIRMMT Workshop, September 7th, 2013, Part II: Rodan and Diva
Posted by Catherine Motuz on September 13, 2013
Andrew Hankinson & Ryan Bannon: An introduction to Rodan
Andrew began the session by outlining the deficiencies of commercially available optical music recognition (OMR) software, which gave him the impetus to take the process back to the drawing board. In most cases, OMR is so unreliable that it is faster for an advanced user to enter notation by hand than to correct all the errors made by the system. It is also impossible to improve the system because all the (numerous) processes involved in OMR are carried out in a “black box”: you see what goes in and what comes out but none of what happens in the middle. It is difficult for two users to work on a single source simultaneously, and each user is limited to the processing power of their desktop computer or laptop. OMR systems don’t learn, which has two implications: first, one page will be as good or bad as the last, and scaling from a few pages to a few million is simply not possible; second, it is impossible to teach these systems to adapt to notations other than common Western music notation, severely limiting the repertoire that can be analyzed.
Enter Rodan: A Web-Based OMR Workflow.
Rodan addresses the issues above by embracing three basic design principles:
- Collaborative OMR: Many geographically scattered users working together through transparent, customizable workflows.
- Distributed OMR: Many machines distributing tasks.
- Adaptive OMR: Recognition improves with feedback from humans.
Andrew then gave a demonstration of Rodan with the assistance of Ryan Bannon, constructing a workflow of OMR tasks (edge detection, cropping, binarization etc.), running OMR and editing the output, all running on Rodan in a browser. Constructive feedback from the other attendees followed. Julie Cumming noted that correcting by overlaying the notation generated by OMR on the original image has a flaw: if the original image has skewed staves, the overlay will not line up even if all the notes are correct. Craig Sapp asked if it might be possible to operate the website through a command line rather than a mouse (this will be addressed below), and Laurent Pugin suggested that it might be worth investigating having the computer try out different routes through the possible workflows, leading to optimized combinations of workflow steps that would take a long time to reach through human trial and error.
Laurier Baribeau: Neume Classification in Rodan
Laurier gave an introduction to Gamera, the OMR system that recognizes neume shapes through connected-component analysis. As explained in a previous blog post, Gamera works by analyzing the features of a glyph, measuring things such as the total black area, black-to-white ratio on the x and y axes, the number of holes in a glyph and its compactness, weighing up these features to come up with a hypothesis of what a new glyph could be, based on previous classifications. The more diverse (and correct!) the sample set, the more likely Gamera is to guess a new shape correctly.
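The idea of guessing a new glyph from previously classified ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two features (black-pixel ratio and aspect ratio), the function names, and the simple nearest-neighbour vote are chosen here for clarity and do not reproduce Gamera's actual feature set or code.

```python
from collections import Counter
from math import dist


def glyph_features(bitmap):
    """Compute two toy features from a bitmap (rows of 0/1, 1 = black).

    Hypothetical feature set: the ratio of black pixels and the
    width-to-height ratio of the glyph's bounding box.
    """
    height = len(bitmap)
    width = len(bitmap[0])
    black = sum(sum(row) for row in bitmap)
    return (black / (width * height), width / height)


def classify(features, training, k=3):
    """Label a glyph by majority vote among its k nearest training samples.

    `training` is a list of (feature_tuple, label) pairs that a human
    has already classified.
    """
    nearest = sorted(training, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Each correction a user makes would add another pair to `training`, which is why a larger and more accurate sample set improves later guesses.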
There is now a version of Gamera online, and the next challenge is to build a robust version for use in a web browser, so that it can be fully integrated into Rodan. Laurier explained that in order to build a web application, one has to draw a line down the middle of a programme to separate the tasks the server will do from those the client browser will do, and make sure that the two systems can communicate by keeping a consistent data structure. He then gave a short demonstration of the encoding of a .png file, explaining how, behind a graphical user interface (GUI), entire images are communicated simply as strings of letters. He then began a demo of the web version of Gamera and, answering a concern posed by Craig Sapp, discussed future enhancements such as using bounding boxes to locate an extracted glyph on an image file in order to show users the glyph’s original context.
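One common way to send an image as a string of letters is Base64. The post does not say which encoding was shown in the demo, so the sketch below is an assumption, and the function names are made up for illustration.

```python
import base64

# The fixed eight-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_bytes_to_text(data: bytes) -> str:
    """Encode raw image bytes as ASCII text that is safe to put in JSON or HTML."""
    return base64.b64encode(data).decode("ascii")


def text_to_image_bytes(text: str) -> bytes:
    """Decode the text back into the original bytes on the receiving side."""
    return base64.b64decode(text)
```

The server and the browser only need to agree on this encoding for a glyph image to travel between them inside an ordinary text message.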
Deepanjan Roy: Rodan’s Application Programming Interface (API), and what you can do with it
Deepanjan began by explaining what an API is and what it allows users to do. He did this by showing a simple Google search in an Internet browser, and then explained that an API is a system that enables computer programs to control other programs. In the case of a search engine, using the Google API, a computer could be programmed to interact directly with various aspects of the existing search functionality in order to determine, for instance, what pages or themes are trending.
In the case of Rodan, the elegant GUI that everyday users find friendly is impossible for computers to interact with, so the Rodan team has designed an API for it, allowing a computer to operate every aspect of Rodan that is available in the browser GUI. This API follows a RESTful architecture, makes full use of HTTP verbs such as GET/POST/PATCH/DELETE, and stores JSON data, all standards which allow Rodan data to interact with other programs easily. As a demo, Deepanjan showed how a script could access Rodan, taking all of the neume data extracted from a source using OMR and dumping it into a Google spreadsheet, which could then provide simple statistics on the frequency of each type of neume. Using the data from the Salzinnes library, he showed that the punctum is by far the most common symbol in the manuscript.
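The counting step of such a script might look like the sketch below. The JSON shape (a list of records with a `neume` key) is a hypothetical stand-in, not Rodan's documented response format, and the step that fetches the data over HTTP is left out.

```python
import json
from collections import Counter


def neume_frequencies(payload: str) -> list[tuple[str, int]]:
    """Count how often each neume type occurs in a JSON payload.

    Assumes (hypothetically) that the payload is a JSON list of
    objects, each with a "neume" key naming the glyph type.
    Returns (name, count) pairs, most frequent first.
    """
    records = json.loads(payload)
    counts = Counter(record["neume"] for record in records)
    return counts.most_common()
```

The resulting pairs could then be written out row by row to a spreadsheet or a CSV file.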
Timothy Wilfong: OMR Challenges with the St. Gallen Codices
The St. Gallen monastery was founded in northeastern Switzerland around 613, and the Hartker Antiphonary (Stiftsbibliothek, codices 390 (winter) and 391 (summer)) is one of the earliest extant sources of chants for the liturgical hours, dating c.990-1000. It is notated in staffless neumes, which indicate relative but not definite pitches, one of many challenges to constructing a useful OMR system.
Indeed, as Timothy explained, bringing this manuscript into the Rodan project presents a diverse array of challenges: the manuscript quality is poor, the nomenclature of the neumes is much vaster than in either the Liber Usualis or in the Salzinnes manuscript, and a digital infrastructure for heightened neumes has to be created in order to integrate St. Gallen notation into Neon.js.
In order to elaborate on these challenges, Timothy showed how a workflow presently functions in Rodan, using the Salzinnes antiphonal and highlighting the staff-finding and pitch-finding components (in an answer to a question, lyric-finding is in development). Starting again from the beginning of the workflow, the low contrast on the manuscript, deterioration of the parchment, grease and ink smudges and thin pen strokes all make binarization difficult, so Timothy has been testing various algorithms in isolation and combination, and combining global (whole page or set) with local (part of a page) thresholds. Other experiments involve seeing if lyric extraction would help a computer distinguish background from image, and seeing if it is possible to programme binarization algorithms to run in combination with each other automatically.
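The difference between a global and a local threshold can be shown on a tiny grayscale grid. This is a generic sketch, not one of the algorithms Timothy tested; the function names, window radius, and mean-based rule are assumptions made for illustration.

```python
def global_threshold(image, threshold):
    """Mark a pixel black (1) if it is darker than one page-wide threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def local_threshold(image, radius=1, offset=0):
    """Mark a pixel black (1) if it is darker than the mean of its neighbourhood.

    The neighbourhood is the square window of the given radius around
    the pixel, clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            row.append(1 if image[y][x] < mean - offset else 0)
        result.append(row)
    return result
```

On a faint stroke such as a value of 120 surrounded by 200s, a global threshold of 100 misses the stroke entirely, while the local rule picks it out because it is darker than its surroundings.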
The breadth of the neume nomenclature (or “neumenclature”) has also posed issues: in this one manuscript, there are 116 kinds of neumes in 12 different hands, making it difficult to train the computer to recognize shapes which differ significantly depending on the scribe. Craig Sapp suggested that the use of handwriting recognition technology might help alleviate this problem, while Timothy mentioned that they are seeking ways of optimizing the classification system to cut down on the number of neumes, at least at this stage.
In order to integrate the St. Gallen manuscript into Neon.js, DDMAL alumnus Greg Burtlet created a font with symbols for every neume glyph in the classification system. Now the challenge is to display those neumes in a logical way, finding an alternative to discrete pitches as the determinant of neume location. The project is making progress every day and, when completed, will show the incomparable versatility of the Rodan system.
Wei Gao: Setting up Diva for St. Gallen
Diva.js is now one of the older components of Rodan, but also one that is continuously being improved. Diva can display high-quality images very quickly by storing them in pyramidal TIFF format, i.e. at several resolutions simultaneously. In a method similar to that used on the Salzinnes website, Wei has laid out the St. Gallen ms. to show all of the relevant CANTUS data for each page, with CANTUS abbreviations denoting office, source and other data expanded to ease browsing. In addition, all of the CANTUS data, including the texts of each chant, are searchable, making the St. Gallen ms. easier to browse than ever before.
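To give a sense of what storing several resolutions at once means, the sketch below lists the sizes a pyramid might contain. The halving scheme, the 256-pixel tile size, and the function name are assumptions for illustration; the post does not describe the exact layout Diva uses.

```python
def pyramid_levels(width, height, tile_size=256):
    """List the (width, height) of each level in an image pyramid.

    Starts at full resolution and halves both dimensions (rounding up)
    until the whole image fits inside a single tile.
    """
    levels = [(width, height)]
    while width > tile_size or height > tile_size:
        width = max(1, (width + 1) // 2)
        height = max(1, (height + 1) // 2)
        levels.append((width, height))
    return levels
```

A viewer can then fetch the smallest level when zoomed out and only request tiles from the larger levels for the region currently on screen.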
Wei explained a bit about the back-end, noting that the expanded metadata is stored in a Solr database, and then gave a demo of the website in action, showing off fluid zooming and scrolling, thorough word searches, and an elegant drop-down menu for those who wish to browse the ms. by liturgical feast.
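A search against a Solr back-end is usually an HTTP request to its `select` handler. The sketch below only builds such a URL; the host, core name, and the `feast` field are hypothetical, since the post does not describe the actual schema.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, field, value, rows=10):
    """Build a Solr select URL that searches one field for an exact phrase.

    `base_url` should point at a Solr core; the field name is whatever
    the (here hypothetical) schema defines.
    """
    query = f'{field}:"{value}"'
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/select?{params}"
```

For example, `solr_select_url("http://localhost:8983/solr/cantus", "feast", "Nativitas")` would produce a URL asking for the first ten matching records as JSON.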