P6-10: Optical Music Recognition of Jazz Lead Sheets
Juan Carlos Martinez-Sevilla, Francesco Foscarin, Patricia Garcia-Iasci, David Rizo, Jorge Calvo-Zaragoza, Gerhard Widmer
Subjects: Machine learning/artificial intelligence for music ; Music retrieval systems ; Evaluation, datasets, and reproducibility ; Applications ; Harmony, chords and tonality ; Open Review ; Symbolic music processing ; Optical music recognition ; MIR tasks ; Knowledge-driven approaches to MIR ; Musical features and properties ; MIR fundamentals and methodology ; Novel datasets and use cases
Presented In-person
4-minute short-format presentation
In this paper, we address the challenge of Optical Music Recognition (OMR) for handwritten jazz lead sheets, a widely used musical score type that encodes melody and chords. The task is challenging due to the presence of chords, a score component not handled by existing OMR systems, and the high variability and quality issues associated with handwritten images. Our contribution is two-fold. We present a novel dataset consisting of 293 handwritten jazz lead sheets of 163 unique pieces, amounting to 2021 total staves aligned with Humdrum **kern and MusicXML ground truth scores. We also supply synthetic score images generated from the ground truth. The second contribution is the development of an OMR model for jazz lead sheets. We discuss specific tokenisation choices related to our kind of data, and the advantages of using synthetic scores and pretrained models. We publicly release all code, data, and models.
Q2 ( I am an expert on the topic of the paper.)
Strongly agree
Q3 ( The title and abstract reflect the content of the paper.)
Strongly agree
Q4 (The paper discusses, cites and compares with all relevant related work.)
Agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Agree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Strongly agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Agree (Novel topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Agree
Q15 (Please explain your assessment of reusable insights in the paper.)
The dataset and model could easily be reused in other music domains requiring chord recognition or handwritten score processing. The tokenization strategies and dataset alignment procedures are generalizable and valuable.
Q16 ( Write ONE line (in your own words) with the main take-home message from the paper.)
This paper introduces a novel dataset and model for OMR of handwritten jazz lead sheets, incorporating chord recognition and showing the benefits of synthetic data, pretraining, and symbolic-aware tokenization.
Q17 (This paper is of award-winning quality.)
No
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Agree
Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)
Strong accept
Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
This paper is an important step forward in the development of OMR systems that can handle handwritten lead sheets, particularly in the jazz domain. The introduction of chord recognition into the OMR pipeline is a novel and valuable contribution, especially given the diversity and inconsistency of chord symbol notations. The dataset of 293 handwritten lead sheets and the aligned symbolic formats provide a much-needed resource for the community. The exploration of different tokenization strategies and the demonstration of how synthetic data and pretraining influence performance are well-executed and informative.
Strengths:
- Novel problem (handwritten jazz lead sheet OMR)
- Carefully constructed dataset with region-level annotations
- Thoughtful tokenization and error metric discussions
- Solid experimental framework with reproducibility in mind
Suggestions for improvement:
- A summary figure/table comparing datasets (e.g., CoCoPops, ChoCo, this dataset) would help emphasize uniqueness.
- Some technical sections (e.g., tokenization and metric rationale) would benefit from additional diagrams or summaries.
- The discussion on chord equivalence is insightful; incorporating this more explicitly into training/evaluation might be a next step (a rough normalization sketch follows below).
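To make the chord-equivalence suggestion concrete, one plausible (purely illustrative) approach is to map surface chord spellings to canonical quality labels before training targets or error metrics are computed. The sketch below is a hedged example under that assumption; the alias table and function names are hypothetical and do not reflect the authors' actual pipeline.

```python
# Hypothetical sketch of chord-symbol normalization for equivalence-aware
# evaluation; the spelling table is illustrative, not the paper's actual rules.
import re

QUALITY_ALIASES = {
    "-7": "m7", "min7": "m7", "mi7": "m7",              # minor-seventh spellings
    "maj7": "M7", "ma7": "M7", "Δ7": "M7", "Δ": "M7",   # major-seventh spellings
    "-": "m", "min": "m", "mi": "m",                     # minor-triad spellings
    "+": "aug", "o": "dim", "°": "dim",
}

def normalize_chord(symbol: str) -> str:
    """Map a written chord symbol (e.g. 'C-7', 'Cmin7') to a canonical form ('Cm7')."""
    m = re.match(r"([A-G][b#]?)(.*)$", symbol.strip())
    if not m:
        return symbol
    root, quality = m.groups()
    return root + QUALITY_ALIASES.get(quality, quality)

# Two spellings of the same chord compare equal after normalization:
assert normalize_chord("C-7") == normalize_chord("Cmin7") == "Cm7"
```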
Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)
Strong accept
Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))
This paper presents a compelling and timely contribution to the field of Optical Music Recognition (OMR) by addressing the particularly challenging domain of handwritten jazz lead sheets. It introduces a novel dataset of 293 annotated lead sheets, a thoughtful evaluation pipeline, and an adapted transformer-based model trained with synthetic pretraining and symbolic-aware tokenization.
The reviews converge on the significance of the dataset and the careful attention to tokenization and alignment strategies. While some reviewers expressed reservations due to the model architecture being previously established and the dataset's modest size, these factors do not diminish the impact of the paper’s contributions.
All reviewers acknowledged the relevance of the topic, its novelty in scope and application, and the scientific soundness of the methodology. The primary value of this paper lies not in architectural innovation, but in advancing OMR research by addressing a real-world, underexplored, and musically important use case. The dataset and methods are reusable across music genres, and the experiments are thorough and informative, especially in their treatment of chord symbol challenges.
The reviewers provided useful suggestions for enhancement:
- Clarifying methodological details (e.g., YOLO fine-tuning, decoding strategies, vocabulary scale).
- Quantitatively distinguishing between melody and chord recognition errors.
- Providing additional analysis or augmentations for image data.
- Addressing minor inconsistencies between images and ground truth representations.
These are constructive comments that can be easily addressed in the camera-ready version and do not detract from the overall contribution.
This paper sets a strong precedent for research at the intersection of OMR and jazz studies. Its dataset, methods, and findings will undoubtedly stimulate discourse, inspire follow-up work, and contribute foundational tools and benchmarks to the community.
Q2 ( I am an expert on the topic of the paper.)
Agree
Q3 (The title and abstract reflect the content of the paper.)
Agree
Q4 (The paper discusses, cites and compares with all relevant related work)
Agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Agree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Agree (Novel topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Disagree
Q15 (Please explain your assessment of reusable insights in the paper.)
I think this paper does not offer many reusable insights.
Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)
We can now apply OMR for handwritten jazz lead sheets.
Q17 (Would you recommend this paper for an award?)
No
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Disagree
Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)
Weak accept
Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
This paper presents an Optical Music Recognition (OMR) system for handwritten jazz lead sheets — a particularly challenging domain due to the diverse handwriting styles and complex musical elements, especially chord symbols. The authors introduce a novel dataset of jazz lead sheets, providing ground truth annotations in both Humdrum **kern and MusicXML formats, and they develop a model tailored for this data.
It is clear that the authors have put significant effort into constructing the dataset. Handwritten jazz lead sheets exhibit a high degree of variation and unique characteristics, such as inconsistent chord notations, as discussed in Section 3.4. The resulting dataset includes 2,021 staff regions from 293 handwritten sheets and 2,208 regions from 326 synthetic scores. The authors employed a region identification system using Ultralytics YOLOv8, combined with manual verification to ensure data quality. For the OMR model, they adopted the previously proposed Sheet Music Transformer by Ríos-Vila et al., applying only minimal modifications.
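For concreteness, the region-identification step presumably amounts to standard Ultralytics YOLOv8 object detection over page images; the sketch below illustrates that generic usage only (the dataset config, class set, and checkpoint names are hypothetical, not the authors' exact setup).

```python
# Generic Ultralytics YOLOv8 usage for staff-region detection; the dataset
# config ("leadsheet_regions.yaml") and file names are hypothetical.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                      # start from pretrained weights
model.train(data="leadsheet_regions.yaml",      # classes: e.g. a single "staff" region
            epochs=100, imgsz=1024)

# Predict staff regions on a scanned lead-sheet page and keep the boxes
# for cropping, prior to manual verification.
results = model.predict(source="page_001.jpg", conf=0.25)
boxes = results[0].boxes.xyxy                   # (N, 4) tensor of region corners
```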
This work appears to be an important first step in addressing the specific challenges of handwritten jazz lead sheets, though there remains significant room for improvement. As the authors acknowledge, the transformer-based model may require more substantial adaptation to handle the peculiarities of this dataset effectively. Additionally, I believe the OMR performance — particularly the recognition of chord symbols — could benefit greatly from integrating techniques from the field of Optical Character Recognition (OCR), especially systems designed for complex or cursive handwriting.
Although the model used is not novel and is only lightly modified, this paper holds significance as the first to focus specifically on handwritten jazz lead sheets. The dataset it provides lays the foundation for future research in this area. For these reasons, I believe the work is worthy of presentation at the conference.
Q2 ( I am an expert on the topic of the paper.)
Agree
Q3 (The title and abstract reflect the content of the paper.)
Strongly agree
Q4 (The paper discusses, cites and compares with all relevant related work)
Agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Agree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Agree (Novel topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Agree
Q15 (Please explain your assessment of reusable insights in the paper.)
A novel dataset for OMR that is based on modern music.
Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)
OMR can be applied also to modern written music.
Q17 (Would you recommend this paper for an award?)
No
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Agree
Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)
Weak accept
Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
This paper presents the first dedicated OMR dataset of handwritten jazz lead sheets, addressing a relevant and previously underexplored area within the OMR community. Handwritten lead sheets are still common among jazz players and students honing their skills.
In general, the paper is adequately written and easy to follow. The work is methodologically thorough, and the preprocessing and evaluation design are novel in their own way relative to existing OMR tasks. The chord symbol alignment and tokenization approaches are also interesting. On the other hand, the model used has already been proposed previously by (possibly different) authors. However, the paper includes two significant contributions (dataset and methodology), which I deem satisfactory given the conference's page limit. Overall, the paper represents a foundational step toward practical OMR that covers handwritten scores in current use and extends to a new genre compared to traditional OMR tasks.
Q2 ( I am an expert on the topic of the paper.)
Strongly agree
Q3 (The title and abstract reflect the content of the paper.)
Strongly agree
Q4 (The paper discusses, cites and compares with all relevant related work)
Strongly agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Strongly agree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Strongly agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Strongly agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Disagree (Standard topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Agree
Q15 (Please explain your assessment of reusable insights in the paper.)
The ablation evaluations, where authors trained and compared various variants of the model, are insightful, even if not entirely surprising. The evaluations can serve as informative baselines for other researchers working in this domain.
Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)
Fine-tuned Sheet Music Transformer on a new dataset of handwritten jazz lead sheets, with a new Humdrum tokenization scheme and ablations on data and tokenization.
Q17 (Would you recommend this paper for an award?)
No
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Disagree
Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)
Strong accept
Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
This is a good paper. It is written clearly and explains everything from the background to the details of the method in a way that is both approachable and interesting. While it does not present a novel model architecture, it gives a good example of collecting a dataset for a new OMR variant (handwritten jazz lead sheets) and fine-tuning an existing model on it.
Strengths:
* New open dataset for a challenging OMR variant
* Important details (e.g. standardization of chord spellings) are taken care of in the dataset cleaning process and discussed thoroughly
* Explored alternative Humdrum tokenization schemes
* Comprehensive evaluations, both quantitative and qualitative
* Clear explanations with graphical examples
* Code (including new Humdrum tokenizers) and model weights are released
Weaknesses:
* Dataset is rather small
* The model is not novel
Additional comments:
* Line 185: Unclear, did the authors fine-tune YOLOv8 for region identification? Is this work published anywhere, or can the authors add some detail on this?
* Line 313: This describes greedy decoding; is this really the sampling method used (as opposed to beam search)?
* Line 339: A vocabulary size of 1,762 is not “very large” compared to popular transformer models (e.g. BERT has 30,522, GPT-2 has 50,257). In any case, since the music sequences here are very short, sequence length is less of a problem.
* Line 525: Data augmentation can (and should) also be applied to the scanned lead sheets (e.g. adding noise, rotating, cropping); see the sketch after this list.
* Figure 3: Why does the example’s ground truth include clef, time signature and key signature, while the image does not?
* Section 5.3: Considering the qualitative observation that most problems are in chord symbols, I’d be interested in a quantitative evaluation of melody errors vs chord errors.
* Line 493: How can the model know the key of the piece when it only receives the region in Figure 3, which doesn’t include the key signature? (Or is the image cropped?) This might point at a discrepancy in the dataset between the image and the ground truth.
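On the augmentation point, a minimal sketch of the kind of photometric/geometric perturbations meant here, using torchvision as a generic choice; the parameter values are illustrative assumptions and nothing below reflects the authors' training code.

```python
# Minimal sketch of image augmentations for scanned lead sheets using
# torchvision; parameter values are illustrative, not tuned.
import torch
from torchvision import transforms

scan_augment = transforms.Compose([
    transforms.RandomRotation(degrees=2),                      # slight page skew
    transforms.RandomResizedCrop(size=(256, 1024),
                                 scale=(0.95, 1.0)),           # small crops
    transforms.ColorJitter(brightness=0.2, contrast=0.2),      # scanner variation
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0, 1)),  # noise
])
# Applied on the fly to each scanned staff image before it enters the encoder.
```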