Enabling Empirical Analysis of Piano Performance Rehearsal with the Rach3 MIDI Dataset

Alia Morsi; Suhit Chiruthapudi; Silvan Peter; Ivan Pilkov; Laura Bishop; Akira Maezawa; Xavier Serra; Carlos Eduardo Cancino-Chacón

Abstract:

The study of piano rehearsals can offer interesting insights into the strategies adopted by a pianist in order to learn, interpret and eventually perform musical pieces. The analysis of rehearsal processes requires computational methods that differ from those used for piano performance, due to challenges like mistakes, repetitions of musical segments, or forward and backward skips to sections in the piece. The scarcity of publicly available rehearsal data limits the empirical understanding of these challenges. We release the Rach3 MIDI Dataset, an openly available collection of MIDI files containing more than 750 hours of recordings of piano rehearsals and corresponding MusicXML scores by four pianists (3 advanced, 1 beginner), collected over a period of more than 4 years. This dataset records the progression of pianists learning new repertoire, as well as practicing familiar pieces, all in the Western Classical tradition. We describe the rehearsal piece identification process used for automatically labeling a portion of the data in this release. Furthermore, we use the Rach3 data to highlight several challenges and future research directions pertaining to the computational analysis of piano rehearsals, specifically symbolic rehearsal-to-score alignment, rehearsal structure analysis, and automatic mistake identification.

Meta Review:

Q2 (I am an expert on the topic of the paper.)

Disagree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

Using fingerprinting methods to match rehearsal notes to the original score.
Ways to collect such a dataset (the need for a long time span, as well as the wide range of data formats involved, e.g. video, audio, and symbolic).

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

A large piano rehearsal dataset with a large time-span is made publicly available and it highlights challenges and further research directions in practice analysis.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Strong accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper introduces a new dataset of MIDI files recorded during piano rehearsals. It is claimed to be the largest open-source dataset of its kind, and an expansion plan is highlighted. The authors provide a solution for rehearsal piece identification, since part of the dataset lacked an exact match between the original score and the rehearsal excerpt. They also explore how to group repeated or related musical fragments and, finally, showcase how such a dataset can be used for mistake identification, providing some initial results using published algorithms.

The paper is well-structured in general and provides evidence of the importance of such a dataset in various research domains. Also, the supplementary material is very informative. Hence, I suggest acceptance. However, I found Section 5 weak, as I explain in detail below. I also provide some additional comments that should be addressed in the revised version.

In Section 5, the reader would expect a preliminary computational analysis of rehearsal structure, as stated in the Introduction. Instead, the section mainly focuses on the limitations the authors found in the technique they designed to identify fragments. It would be more informative and impactful if Section 5 presented some preliminary results, given the aforementioned limitations, for example by comparing a file rehearsed by an expert with a file rehearsed by a beginner. If this is not possible, then you would need to adjust the text in the Introduction describing what exactly Section 5 presents.
For clarity, in line 222, I would suggest you present the percentage of the dataset that needed the fingerprinting step.
Line 336: need more details about how a “chord” bin is defined.
Figure 3 font size is very small. I would suggest you reduce the size of the square by including only the non-matching rows, since this part is more interesting.

Further suggestions:
Line 66: I would add the GigaMIDI Dataset as well (https://arxiv.org/pdf/2502.17726).
In the background section, I would add the following study for completeness: A Novel Interface for the Graphical Analysis of Music Practice Behaviors, J Sokolovskis, D Herremans, E Chew - Frontiers in Psychology, 2018.
It's not very common to separate the abstract into multiple paragraphs. I would merge them into one.

Minor comments:
Lines 76, 340: missing “.”

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

The paper presents a large piano rehearsal dataset spanning a long period of time, and it highlights challenges and further research directions in practice analysis. All reviewers have decided to accept the submission, and we can see potential for collaborations across disciplines with researchers in music perception and cognition, exploring this data further.

Below, I’m highlighting some key points; please see the individual reviews for details:

In Section 5, the reader would expect a preliminary computational analysis of rehearsal structure, as stated in the Introduction; however, this is not the case. Please either adjust the text in the Introduction to describe what Section 5 actually presents, or edit Section 5 to include some preliminary results, given the mentioned limitations.
For more clarity, provide the insights on the peculiarities of rehearsal (versus the more commonly studied performance) situations in a separate dedicated section.
Elaborate on definitions and decisions in lines 328-340.
Elaborate on annotation procedures (for fragments or mistakes).

Review 1:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Strongly agree

Q5 (Please justify the previous choice (Required if “Strongly Disagree” or “Disagree” is chosen, otherwise write "n/a"))

Daily practice or rehearsal is a crucial part of any pianist's success, but existing work on performance analysis tends to focus on the "final outcome" based on recordings after the pianists have rehearsed enough to play a piece smoothly.

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q10 (Please justify the previous choice (Required if "Strongly Disagree" or "Disagree" is chosen, otherwise write "n/a"))

Several unique and unprecedented challenges have been encountered in the authors' attempt to analyze rehearsals (Sec. 5 and 6). The authors constructed novel methods to conquer difficulties and the methods are highly worth noticing in the MIR field.

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Strongly agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The data preprocessing, structure analysis, and rehearsal-to-score alignment methods are currently based on MIDI, but could inspire generalization for analyzing audio recordings

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Rehearsal piano performance is computationally under-studied and it presents unique challenges in organizing the data before downstream applications can happen

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Most of the manuscript is well-prepared. I am inspired by the down-to-earth manner this manuscript shows in terms of data collection, repertoire selection, and ways to conquer unique challenges. As an amateur classical pianist who has been practicing for 40+ years, I am also convinced that the future work directions in Sec. 8 are worth exploring.

The quality can be further improved in a few small places, listed below.

Line 314: what does TEC stand for?
In general, lines 328-340 could benefit from further elaboration. I have difficulty understanding what exactly has been done to generate the self-similarity matrix in Fig. 4. First, is Nbin the length of the fragment divided by 100 ms? Second, what exactly does it mean to convolve each "chord" vector with a small kernel? Is the convolution kernel 1D or 2D? I suppose it's 1D, and adding an equation may help. Third, in Eq. (1), please add an index to indicate the range of multiplication in the product $\Pi$. In the same equation, what does the exponent pitch_i mean? I suppose it's not a pitch, and the naming of the variable confuses me.
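
To make the questions above concrete, the following is a minimal sketch of one possible reading of the procedure; it is not the authors' implementation. It assumes a binary piano roll with 100 ms bins (so the number of bins is the fragment length divided by 100 ms, rounded up), a 1D kernel applied along the pitch axis of each "chord" vector, and cosine similarity between bins. The function names, the kernel values, and the choice of cosine similarity are all assumptions and are not taken from the paper.

```python
import numpy as np

def chord_vectors(notes, duration_ms, bin_ms=100, n_pitches=128):
    """Binary piano roll: one row ("chord" vector) per time bin.

    notes: iterable of (onset_ms, offset_ms, midi_pitch), all integers.
    Hypothetical helper; the 100 ms bin size follows the reviewer's guess.
    """
    n_bins = -(-duration_ms // bin_ms)  # ceiling division
    roll = np.zeros((n_bins, n_pitches))
    for onset, offset, pitch in notes:
        start = onset // bin_ms
        end = max(start + 1, -(-offset // bin_ms))
        roll[start:end, pitch] = 1.0
    return roll

def smooth_pitch(roll, kernel=(0.25, 0.5, 0.25)):
    """Convolve every chord vector with a small 1D kernel along pitch.

    The kernel values are an assumption chosen for illustration only.
    """
    k = np.asarray(kernel)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="same"), 1, roll
    )

def self_similarity(features, eps=1e-12):
    """Cosine similarity between all pairs of time bins."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    return unit @ unit.T

# Toy fragment: C4, then D4, then C4 again, 200 ms each.
notes = [(0, 200, 60), (200, 400, 62), (400, 600, 60)]
roll = chord_vectors(notes, duration_ms=600)
S = self_similarity(smooth_pitch(roll))
```

Under this reading, bins containing the same pitch are maximally similar, while the smoothing gives bins with pitches a whole tone apart a small non-zero similarity instead of zero. Whether this matches the intended behaviour is exactly what the requested elaboration should settle.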

Also, in Fig. 1b, I am curious why the music score looks so different from the instances shown below it. The plot of the instances contains many more notes than the score, for example. It might be good to explain this in the caption so readers are not puzzled early in their reading.

Review 2:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Strongly agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

This paper addresses the creation and preliminary investigation of real piano performance rehearsal data, which is not common in published papers.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This paper introduces a novel dataset of real piano rehearsals captured as MIDI.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Review Summary:

This paper presents the Rach3 MIDI Dataset, an extensive collection (750+ hours) of piano rehearsals, which fills a major gap in symbolic MIR resources. It includes efforts in piece identification via symbolic fingerprinting, rehearsal structure analysis, and initial steps toward mistake detection. The paper also discusses the substantial challenges of applying performance-score alignment methods to rehearsal data.

Strengths:

Valuable and original dataset focused on real rehearsal data, not just polished performances.
Timely discussion of limitations in applying alignment techniques to noisy, fragmented rehearsal data.
Opens new research directions in performance analysis, pedagogy, and mistake modeling.

Weaknesses:

The rehearsal structure analysis lacks formal evaluation and is sensitive to parameters.
Performance-score alignment is discussed clearly, but no solution is proposed—only limitations.
Annotation procedures (for fragments or mistakes) are not fully explained.

Review 3:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The effort invested in making the work reproducible is commendable - the supplementary companion website is much appreciated. The paper provides several insights on the peculiarities of rehearsal (versus the more commonly studied performance) situations, which are likely to be useful beyond the immediate context - though a more abstracted summary of these would have been useful (this is what I was expecting under Section 6). In particular once the full dataset is published, I could see interesting collaborations across disciplines with researchers in music perception and cognition, exploring this data further.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

The paper presents (the symbolic aspect of) a new multimodal dataset that provides deep, rich empirical performance data tracking the rehearsal behaviours of a small but varied set of musicians over a long time period.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

While several projects have attempted to implement applications that track pianists' performances, the rehearsal context with its non-linearity, non-score-prescribed repetition, and proneness to error is particularly challenging to manage. This new dataset provides opportunities for improving algorithms and software targeting this use-case.

The paper is well-structured, well-written, and easy to follow. The effort taken in making the set-up and analytical pipeline reproducible is much appreciated, and promises potential future expansion in the availability of this sort of data.

That said, while the present dataset provides a very rich, deep, and valuable reflection of the rehearsal behaviour of the four pianists over a long period, the fact that it covers "only" four participants should be more clearly acknowledged as a limitation in the paper. Note that this is not intended as a criticism of the dataset (there are good reasons why vastly expanding the set of participants would be unfeasible), but this should still be spelled out in the paper. In particular, the generalisability of insights on specific rehearsal behaviours is likely limited by this small sample, and this should be explicitly stated.

The first paragraph of Section 4 mentions a switch in practice from recording rehearsal sessions as single takes in the first two years of the project, to recording on a more granular level later. It is stated that this decision was taken "for practical reasons" - could you expand on this slightly? It seems like convenience-to-pianist (pressing record once per rehearsal session) may have been sacrificed for convenience-to-researcher (having salient segmentation of the recorded data already be performed in-situ, reducing post-processing needs). In particular, it would be important to know how much intervention was required on the part of the pianists in order to reset the recording set-up, as this could have implications regarding ecological validity; ideally one would interfere with the normal rehearsal environment and workflow as little as possible.

These concerns are all minor and could readily be addressed before camera-ready. I congratulate the authors on this interesting paper and am happy to recommend strong acceptance.

Two more minor nitpicks to finish:
There are some minor errors and formatting (capitalization) inconsistencies in your references; please do another edit pass over all of these. In particular, I have spotted typos in [6], [21], [31]; a problem in the URL in [20]; and some issues with format in [17].
In following up your references I noticed that the Vienna piano dataset's webpage lists a DOI: 10.21939/4X22. Indeed, resolving this DOI (https://doi.org/10.21939/4X22) forwards to a different version of the website than the one listed in your submission: https://datasets.mdw.ac.at/datasets/dataset/98ea25fa-2468-43ff-929b-3c926e163583. Rather than URLs, please always cite the DOI when one is available; it should be assumed to provide the canonical reference.

P4-14: Enabling Empirical Analysis of Piano Performance Rehearsal with the Rach3 MIDI Dataset

Alia Morsi, Suhit Chiruthapudi, Silvan Peter, Ivan Pilkov, Laura Bishop, Akira Maezawa, Xavier Serra, Carlos Eduardo Cancino-Chacón

Presented In-person

4-minute short-format presentation
