P3-3: TOWARDS ROBUST MUSIC TRANSCRIPTION BY MEASURING CROSS-VERSION CONSISTENCY IN WESTERN CLASSICAL MUSIC
Yannik Venohr, Yiwei Ding, Christof Weiss
Subjects: Evaluation methodology; Machine learning/artificial intelligence for music; Music transcription and annotation; Evaluation, datasets, and reproducibility; Evaluation metrics; Knowledge-driven approaches to MIR; MIR tasks
Presented In-person
4-minute short-format presentation
Automatic Music Transcription (AMT) is a central task within MIR, enabling various subsequent applications. Despite advancements thanks to deep learning, improving AMT remains challenging due to the scarcity of large, high-quality annotated datasets. Recognizing pitches in multi-instrument settings beyond solo piano is particularly difficult, as models struggle to generalize across domains due to dataset biases and overfitting. AMT research appears to have hit a glass ceiling, where further progress is difficult to achieve and to measure. To address this, we propose cross-version consistency (CVC), an annotation-free evaluation framework that measures a model’s transcription consistency across different recordings of the same musical work. We formalize this concept and systematically analyze its relationship with standard evaluation metrics on the AMT subtask of multi-pitch estimation. Our results show that CVC is closely tied to standard evaluation metrics and enables model assessment using only unlabeled multi-version datasets, making it particularly valuable in domains where annotated data is scarce but multi-version recordings are easy to obtain, such as orchestral music. Beyond this, we argue that CVC is, by design, a desirable property for transcription models and our results indicate that it can provide insights into a model’s robustness, i.e., its ability to generalize to out-of-domain data.
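To make the idea of comparing transcriptions across versions concrete, the following minimal Python sketch scores how well a model's multi-pitch estimates agree between recordings of the same work. It is an illustration under stated assumptions, not the formalization used in the paper: the function names, the F-measure-style agreement score, the averaging over version pairs, and the premise that all predictions have already been warped onto a common musical time axis are all hypothetical choices made for this example.

```python
from itertools import combinations

import numpy as np


def pairwise_consistency(pred_a, pred_b):
    """F-measure-style agreement between two binary pitch-activity matrices.

    Both inputs have shape (num_frames, num_pitches) and are ASSUMED to be
    already aligned to a common musical time axis (hypothetical preprocessing).
    """
    a = np.asarray(pred_a, dtype=bool)
    b = np.asarray(pred_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("predictions must share a common aligned shape")
    agree = np.logical_and(a, b).sum()  # pitch activations present in both
    total = a.sum() + b.sum()           # activations in either prediction
    # Two empty transcriptions agree trivially.
    return 1.0 if total == 0 else float(2.0 * agree / total)


def cross_version_consistency(predictions):
    """Mean pairwise agreement over all version pairs of one work.

    No reference annotation is used: only the model outputs are compared.
    """
    scores = [pairwise_consistency(a, b) for a, b in combinations(predictions, 2)]
    if not scores:
        raise ValueError("need at least two versions of the work")
    return float(np.mean(scores))


# Toy example: three "versions", two frames, three pitches.
v1 = [[1, 0, 0], [0, 1, 0]]
v2 = [[1, 0, 0], [0, 1, 0]]  # identical to v1
v3 = [[1, 0, 0], [0, 0, 1]]  # differs from v1 in the second frame
cvc = cross_version_consistency([v1, v2, v3])
```

In this toy setup the score depends only on the model's predictions for each version, which mirrors the annotation-free nature of the evaluation described in the abstract; how the versions are aligned and how agreement is measured in practice are left open here.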