Lessons from the Classroom: MEI for Data Scientists

Authors: Freedman, Richard / Russo-Batterham, Daniel

Date: Wednesday, 6 September 2023, 2:15pm to 3:45pm

Location: Main Campus, L 1.202


Many Data Science and Computer Science students today are familiar with JSON, and may even have worked with APIs to extract data from the web. Ask about XML, however, let alone TEI or MEI, and you are often met with quizzical looks. Yet XML files contain much information that can be productively analyzed with modern Data Science tools, so training students to leverage these materials is a worthwhile endeavor. Our work on Citations: The Renaissance Imitation Mass (CRIM; crimproject.org) has demonstrated the value of using MEI as the basis for representing a digital score (with the help of Verovio) and performing sophisticated queries over large corpora. Encodings can thus be created with these dual uses in mind.

In this presentation, we reflect on our experience teaching XML concepts in the Data Science classroom using, for largely practical reasons, Python tools already familiar to students, and we talk through an indicative analysis. One novel approach we used to explain the hierarchical structure of XML was colorful, interactive network visualization. This helped students quickly grasp not just the tree-like structure of XML but also the scale and complexity of an encoded text, as well as the diversity of the elements used and where they appear: some elements frequently occur as “leaf” nodes, while others structure much broader sections.
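As a rough illustration of the network view described above, the following sketch turns an XML tree into a list of nodes and parent-child edges and measures how often each element type occurs as a leaf. It is a minimal sketch under stated assumptions, not the classroom notebook: the MEI fragment is a hypothetical, heavily simplified sample, the helper names (`local_name`, `build_network`, `leaf_share`) are invented for this example, and the standard-library `xml.etree.ElementTree` parser is used only to keep the example free of dependencies.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified MEI-like fragment (not taken from CRIM).
SAMPLE = """
<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music><body><mdiv><score><section>
    <measure n="1">
      <staff n="1"><layer n="1">
        <note pname="g" oct="4" dur="2"/>
        <note pname="a" oct="4" dur="2"/>
      </layer></staff>
    </measure>
    <measure n="2">
      <staff n="1"><layer n="1">
        <note pname="b" oct="4" dur="1"/>
      </layer></staff>
    </measure>
  </section></score></mdiv></body></music>
</mei>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def build_network(root):
    """Return (nodes, edges): nodes maps an integer id to a tag name,
    edges is a list of (parent_id, child_id) pairs."""
    nodes, edges = {}, []
    stack = [(root, None)]
    while stack:
        element, parent_id = stack.pop()
        node_id = len(nodes)
        nodes[node_id] = local_name(element.tag)
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in element:
            stack.append((child, node_id))
    return nodes, edges


def leaf_share(root):
    """For each tag, the fraction of its occurrences with no child elements."""
    total, leaves = Counter(), Counter()
    for element in root.iter():
        name = local_name(element.tag)
        total[name] += 1
        if len(element) == 0:
            leaves[name] += 1
    return {name: leaves[name] / total[name] for name in total}


root = ET.fromstring(SAMPLE)
nodes, edges = build_network(root)
shares = leaf_share(root)
print(len(nodes), "nodes and", len(edges), "edges")
print(shares)
```

The resulting edge list could then be passed to a graph or plotting library of one's choice for an interactive, color-coded view; which library the course used is not stated here.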

Once armed with this understanding, they could begin to analyze the attributes contained within these elements in ways that reveal new insights, from distributions of features to patterns of similarity. There is one caveat: all of their wrangling, analysis, machine learning, and visualization tools expect data in a tabular format.

Students were then asked to extract selected information into tables with BeautifulSoup, while thinking about the implications of representing the same knowledge in different ways, in this case through varied data structures. What was lost in translation when going from one structure to another? How did the way the XML was originally created affect this process and the final results of analysis? How does hand-coded XML differ from machine-generated XML? What would they do differently when creating their own XML if they knew it would be used for analysis? What, if anything, can we learn from these questions about the varied means of encoding knowledge and culture more broadly, from text (ink), to music (sound), to art (paint)?
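The move from tree to table can be sketched as follows. This is an illustrative example rather than the course material: it uses the standard-library `xml.etree.ElementTree` in place of BeautifulSoup so that it runs without extra packages, the MEI fragment is a hypothetical simplified sample, and `notes_to_rows` is a name invented here. Each note becomes one row, and its position in the hierarchy survives only as the `measure` and `staff` columns, which is one concrete instance of what can be lost in translation.

```python
import xml.etree.ElementTree as ET

# The MEI namespace, bound to a prefix for use in path expressions.
NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# Hypothetical, heavily simplified MEI-like fragment (not taken from CRIM).
SAMPLE = """
<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music><body><mdiv><score><section>
    <measure n="1">
      <staff n="1"><layer n="1">
        <note pname="g" oct="4" dur="2"/>
        <note pname="a" oct="4" dur="2"/>
      </layer></staff>
      <staff n="2"><layer n="1">
        <note pname="d" oct="3" dur="1"/>
      </layer></staff>
    </measure>
  </section></score></mdiv></body></music>
</mei>
"""


def notes_to_rows(root):
    """Flatten every note into a dict, one dict per table row.

    The enclosing measure and staff numbers are copied into each row,
    since a flat table has no other way to record nesting.
    All values are kept as strings, exactly as they appear in the XML.
    """
    rows = []
    for measure in root.iterfind(".//mei:measure", NS):
        for staff in measure.iterfind("mei:staff", NS):
            for note in staff.iterfind(".//mei:note", NS):
                rows.append({
                    "measure": measure.get("n"),
                    "staff": staff.get("n"),
                    "pname": note.get("pname"),
                    "oct": note.get("oct"),
                    "dur": note.get("dur"),
                })
    return rows


rows = notes_to_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

A list of dictionaries like this can be handed directly to a tabular library (for example, `pandas.DataFrame(rows)`), at which point the usual wrangling and visualization tools apply.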

In answering these questions, we will retrace some of the key steps of our pedagogy, moving between macro- and micro-level views of MEI documents and the encoding practices they embody. We will begin by viewing the hierarchy of an XML document as a network of nodes and edges, a representation that gives students a vivid understanding of both the complexity and the logical structure of what otherwise appears to them as just another text document in an editor. From here we will zoom in on some key nodes in the network, showing how students began to traverse the XML files intelligently in order to answer various kinds of queries about these texts, the musical concepts they encode, and how the structured XML encoding compares with scores in staff notation.

One productive point of inquiry centers on accidentals: chromatic inflections that happen from time to time in European music as pieces move between different diatonic pitch spaces. In Early Music in particular, the original sources were often less than explicit. Composers, scribes, and printers often assumed that competent singers and instrumentalists would simply know to add a sharp, flat, or natural before some given note depending on its melodic and contrapuntal context. In modern transcriptions these unwritten accidentals (typically called musica ficta) are often made explicit. Graphical editions do so by putting these symbols above the staff, or by surrounding them with parentheses. But in the case of MEI encodings we can be still more precise, since <supplied> elements can convey all sorts of detail about the meaning, intention, and ultimate responsibility for such work. Comparing the notated and supplied accidentals in a given piece can tell us something about the explicitness of that particular edition; doing so across a corpus can reveal biases of editorial practice. These kinds of lessons, in turn, prompted interesting reflections on the part of students about the biases and values implicit in all editorial work, and the technologies that represent the perishable art of music.
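The comparison of notated and supplied accidentals might be sketched along these lines. The encoding conventions here are assumptions made for illustration: notated accidentals are taken to appear either as an `accid` attribute on a note or as a bare `<accid>` child, and editorial ones as `<accid>` elements wrapped in `<supplied>`. Real corpora may follow other MEI practices, so the queries would need adapting. The sample fragment and the function name `count_accidentals` are hypothetical, and the standard-library parser is used only to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# Hypothetical fragment; the encoding conventions are assumed, not CRIM's.
SAMPLE = """
<layer xmlns="http://www.music-encoding.org/ns/mei" n="1">
  <note pname="f" oct="4" dur="4"><accid accid="s"/></note>
  <note pname="b" oct="4" dur="4"><supplied><accid accid="f"/></supplied></note>
  <note pname="e" oct="4" dur="4" accid="f"/>
  <note pname="c" oct="5" dur="4"><supplied><accid accid="s"/></supplied></note>
  <note pname="g" oct="4" dur="4"/>
</layer>
"""


def count_accidentals(root):
    """Count accidentals, split into 'notated' and 'supplied'."""
    # ElementTree elements have no parent pointer, so first collect
    # every <accid> that sits somewhere inside a <supplied> element.
    inside_supplied = {
        accid
        for supplied in root.iterfind(".//mei:supplied", NS)
        for accid in supplied.iterfind(".//mei:accid", NS)
    }
    counts = Counter()
    for accid in root.iterfind(".//mei:accid", NS):
        kind = "supplied" if accid in inside_supplied else "notated"
        counts[kind] += 1
    # Accidentals written as an attribute on the note count as notated.
    for _ in root.iterfind(".//mei:note[@accid]", NS):
        counts["notated"] += 1
    return counts


counts = count_accidentals(ET.fromstring(SAMPLE))
total = counts["notated"] + counts["supplied"]
supplied_share = counts["supplied"] / total if total else 0.0
print(dict(counts), supplied_share)
```

Running the same function over every file in a corpus and collecting the shares per edition or per editor would give one simple, comparable measure of editorial explicitness.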

All materials will be available as public Jupyter notebooks for reuse in classrooms and laboratories.

Contribution Type