Encoding in human centered machine learning workflows: case study on mensural ligature recognition

Authors: Rizo, David / Calvo-Zaragoza, Jorge / Martínez-Sevilla, Juan Carlos / Madueño, Antonio / García-Iasci, Patricia / Delgado-Sánchez, Teresa

Date: Friday, 8 September 2023, 11:15am to 12:45pm

Location: Main Campus, L 2.202

Abstract

This work presents a case study on the human-centered design of machine learning workflows for Optical Music Recognition (OMR). The study specifically targets mensural ligature recognition, which presents unique challenges for OMR due to its complex notation and variations in shapes. The case study is conducted within the context of a larger project that aims to encode all books of mensural notation held by the National Library of Spain (BNE)¹ using the online version (Rizo et al. 2023) of the Music Recognition, Encoding, and Transcription Tool (MuRET) (Rizo et al. 2018).

The study employs an incremental approach that builds on a seed model created from an existing training collection (Parada-Cabaleiro et al. 2019). The output of the OMR is post-edited to obtain a high-quality encoding, and the corrected sources are used to create new models that improve OMR accuracy and reduce post-editing time.

Calvo-Zaragoza and Rizo (2018) introduced the distinction between agnostic and semantic content to reduce the variability of the symbols that recognition systems have to detect, and to allow automatic transcriptions to be corrected by non-expert editors, who only need to be able to identify graphical symbols. This approach has been used extensively in the work that made it possible to build a corpus large enough to train high-quality OMR deep learning models.
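The agnostic/semantic distinction can be illustrated with a minimal sketch. The token format used here (`clef.G-L2`, `note.semibreve-L1`, with `L`/`S` for staff lines and spaces counted from the bottom) and the function names are assumptions made for illustration only; they are not taken from MuRET or from the cited paper. The sketch shows that an agnostic token records only a shape and a vertical position, whereas the semantic pitch depends on the clef in force.

```python
STEPS = "CDEFGAB"

# Pitch marked by each clef shape (letter, octave). Illustrative assumption.
CLEF_REFERENCE = {"G": ("G", 4), "C": ("C", 4), "F": ("F", 3)}


def staff_position(label):
    """Map 'L1' -> 0, 'S1' -> 1, 'L2' -> 2, ... counting up from the bottom line."""
    kind, number = label[0], int(label[1:])
    return 2 * (number - 1) + (1 if kind == "S" else 0)


def to_semantic(clef_token, note_token):
    """Turn a hypothetical agnostic note token into (pitch, figure) under a clef."""
    clef_name, clef_pos = clef_token.split("-")
    clef_letter = clef_name.split(".")[1]
    symbol, note_pos = note_token.split("-")
    figure = symbol.split(".")[1]

    # Diatonic index of the pitch the clef marks, e.g. G4 -> 4 * 7 + 4.
    ref_step, ref_octave = CLEF_REFERENCE[clef_letter]
    reference = ref_octave * 7 + STEPS.index(ref_step)

    # Each staff position (line or space) is one diatonic step.
    diatonic = reference + staff_position(note_pos) - staff_position(clef_pos)
    return f"{STEPS[diatonic % 7]}{diatonic // 7}", figure


# The same graphical token yields different pitches under different clefs.
print(to_semantic("clef.C-L3", "note.minim-S2"))
print(to_semantic("clef.C-L1", "note.minim-S2"))
```

An editor correcting the agnostic layer only has to check the shape and the line or space, which is why no knowledge of the clef or of the resulting pitch is required at that stage.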

In OMR-assisted transcription, the most difficult challenge has been the transcription of ligatures, on which this paper focuses. Initially, the models were unable to learn to recognize ligatures because they occur so rarely, so each ligature had to be represented graphically as a generic “ligature” symbol and encoded semantically in **mens (Rizo et al. 2019). This manual process significantly slowed down post-editing, making it more time-consuming and labor-intensive.

A synthetic corpus was built to experiment with different agnostic encodings and select those with the best trade-off between training corpus size and performance. Among those encodings, experiments with real end users were carried out to find the most user-friendly agnostic representation, in terms of user experience (UX), for the process of fixing classification errors.

Once a considerable number of manually encoded ligatures had been collected across all the real corpora in MuRET, represented agnostically with the generic ligature symbol and semantically in **mens, they were converted to the new agnostic encoding using a state machine transducer. New models trained on this final representation correctly recognized the ligatures in the sources.
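As a rough illustration of what a state machine transducer does in this kind of conversion, the sketch below reads a stream of semantic events and emits one agnostic token per ligature component. Everything specific in it is hypothetical: the event kinds (`open`, `note`, `close`), the integer staff positions, and the output tokens (`lig.start`, `lig.up`, `lig.down`, `lig.same`, `lig.end`) are invented for the example and are not the encoding selected in this work, nor actual **mens syntax.

```python
from enum import Enum, auto


class State(Enum):
    OUTSIDE = auto()  # not inside a ligature
    FIRST = auto()    # ligature opened, first note not yet read
    INSIDE = auto()   # at least one ligature note read


def transduce(events):
    """Map (kind, position) events to hypothetical agnostic tokens."""
    state, previous, output = State.OUTSIDE, None, []
    for kind, position in events:
        if state is State.OUTSIDE and kind == "note":
            output.append(f"note@{position}")
        elif state is State.OUTSIDE and kind == "open":
            state = State.FIRST
        elif state is State.FIRST and kind == "note":
            output.append(f"lig.start@{position}")
            state, previous = State.INSIDE, position
        elif state is State.INSIDE and kind == "note":
            # The emitted shape depends on the melodic direction from the previous note.
            if position > previous:
                step = "up"
            elif position < previous:
                step = "down"
            else:
                step = "same"
            output.append(f"lig.{step}@{position}")
            previous = position
        elif state is State.INSIDE and kind == "close":
            output.append("lig.end")
            state, previous = State.OUTSIDE, None
        else:
            raise ValueError(f"unexpected event {kind!r} in state {state.name}")
    if state is not State.OUTSIDE:
        raise ValueError("unterminated ligature")
    return output


events = [("note", 3), ("open", None), ("note", 2), ("note", 4),
          ("note", 3), ("close", None), ("note", 5)]
print(transduce(events))
```

The point of the sketch is only the mechanism: the output for each note depends on the current state and on the previous note, so a single generic ligature symbol can be expanded deterministically into a sequence of component tokens.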

Thanks to this approach, the recognition and post-editing effort was reduced by a factor of 10, allowing the end user to obtain a complete and correct encoding of a standard book page in under one minute.

Bibliography

Calvo-Zaragoza, J., & Rizo, D. (2018). “End-to-End Neural Optical Music Recognition of Monophonic Scores”, Applied Sciences, 8(4), 606. https://doi.org/10.3390/app8040606

Parada-Cabaleiro, E., Batliner, A., & Schuller, B. W. (2019). “A Diplomatic Edition of Il Lauro Secco: Ground Truth for OMR of White Mensural Notation”. In A. Flexer, G. Peeters, J. Urbano, & A. Volk (Eds.), Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, November 4-8, 2019, pp. 557–564.

Rizo, D., Calvo-Zaragoza, J., & Iñesta, J. M. (2018). “MuRET: a music recognition, encoding, and transcription tool”. In K. R. Page (Ed.), Proceedings of the 5th International Conference on Digital Libraries for Musicology, DLfM 2018, Paris, France, September 28, 2018, pp. 52–56. ACM. https://doi.org/10.1145/3273024.3273029

Rizo Valero, D., Pascual León, N., & Sapp, C. S. (2019). “White Mensural Manual Encoding: from Humdrum to MEI”, Cuadernos De Investigación Musical, (6), 373–393. https://doi.org/10.18239/invesmusic.v0i6.1953

Rizo, D., Calvo-Zaragoza, J., Martínez-Sevilla, J. C., Roselló, A., & Fuentes-Martínez, E. (2023). “Design of a music recognition, encoding, and transcription online tool”, 16th International Symposium on Computer Music Multidisciplinary Research, Tokyo, 2023 (accepted).

Notes

¹ https://grfia.dlsi.ua.es/polifonia/ (accessed April 1st, 2023)
