P1-4: Quantifying regularity in music structure analysis

Brian McFee

Subjects: Evaluation methodology ; MIR fundamentals and methodology ; Structure, segmentation, and form ; Evaluation, datasets, and reproducibility ; Evaluation metrics ; Open Review ; Musical features and properties

Presented Virtually

4-minute short-format presentation

Abstract:

This article describes objective measures of segment regularity for use in evaluating musical structure annotations. The core idea derives from identifying simple ratio relationships between segment durations (e.g., 2:1 or 3:4), and can be implemented in either musical time (beats) or absolute time (seconds). Extensions are proposed to further quantify regularity within labeled segment groups and across hierarchical levels, and to evaluate the balance or uniformity of segment durations. The efficacy of the proposed methods is demonstrated through an empirical study of several standard datasets for music structure analysis.

The results indicate: 1) under reasonable assumptions of tempo stability, regularity can be reliably measured in absolute time, 2) most existing datasets exhibit regularity, 3) regularity interacts meaningfully with segment labeling, 4) regularity and balance are distinct concepts, and 5) multi-level segmentations exhibit cross-level regularity.
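To make the core idea in the abstract concrete, the following is a minimal illustrative sketch of a simple-ratio test between adjacent segment durations. It is not the paper's metric: the function names, the `max_term` bound on ratio terms, the relative tolerance `tol`, and the choice to compare only adjacent segments are all assumptions made for this example.

```python
from itertools import product
from math import gcd


def nearest_simple_ratio(d1, d2, max_term=4):
    """Find the ratio p:q in lowest terms (1 <= p, q <= max_term) closest to d1:d2.

    Returns (p, q, rel_err), where rel_err is the relative deviation of
    d1/d2 from p/q. The bound max_term is a hypothetical parameter.
    """
    if d1 <= 0 or d2 <= 0:
        raise ValueError("durations must be positive")
    target = d1 / d2
    best = None
    for p, q in product(range(1, max_term + 1), repeat=2):
        if gcd(p, q) != 1:
            continue  # skip ratios that are not in lowest terms, e.g. 4:2
        rel_err = abs(target - p / q) / (p / q)
        if best is None or rel_err < best[2]:
            best = (p, q, rel_err)
    return best


def is_simple_ratio(d1, d2, max_term=4, tol=0.05):
    """True if d1:d2 lies within relative tolerance tol of some simple ratio."""
    return nearest_simple_ratio(d1, d2, max_term)[2] <= tol


def fraction_regular(durations, max_term=4, tol=0.05):
    """Fraction of adjacent segment pairs whose durations form a simple ratio.

    Works for durations in beats or in seconds; the tolerance absorbs
    small timing deviations in the absolute-time case.
    """
    if len(durations) < 2:
        raise ValueError("need at least two segments")
    pairs = list(zip(durations[:-1], durations[1:]))
    hits = sum(is_simple_ratio(a, b, max_term, tol) for a, b in pairs)
    return hits / len(pairs)
```

For example, under these assumed settings, `fraction_regular([8, 8, 16, 12, 7])` treats the pairs 8:8, 8:16, and 16:12 as regular (1:1, 1:2, 4:3) and the pair 12:7 as irregular.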

Meta Review:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Disagree

Q15 (Please explain your assessment of reusable insights in the paper.)

The topic of the paper is quite specific.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This paper proposes a metric for the regularity of structural segment durations, in both musical and absolute time.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak reject

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper proposes a metric for the regularity of structural segment durations, in both musical and absolute time.

The paper is clear and well written. The proposed metrics, which are very simple, are evaluated on several well-known datasets (Beatles, HarmonixSet, Jazz Structure Dataset, Jazz Audio-Aligned Harmony, Real-World Computing, SALAMI). This paper presents interesting results on an under-explored topic. The metrics can help quantify deviations from what is expected when computing a structural segmentation. Computing regularity in absolute time is more accurate than relying on beat and downbeat estimation. However, the prospects for applying the results seem rather limited.

Figure 1: the caption should be improved to clarify the figure. When Fig. 1 is cited, the reader does not know what $d_1$, $d_2$, or $\rho$ mean. What are the "patterned regions" if they are not marked by dashed lines? It is unclear what "multiples of this unit" refers to.

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

This paper is interesting and fills a gap in the evaluation of music structure. However, the reviewers raised some weaknesses that should be addressed in the camera-ready version. In particular, the authors should clearly define regularity at the beginning of the methodology (metric) section to clarify where the paper contributes.

Review 1:

Q2 (I am an expert on the topic of the paper.)

Strongly disagree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Strongly agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The first paragraph of Section 6.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Regularity in music structure can be quantified and is sometimes, if not often, different from balance.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper reads well and makes good use of the allotted space -- the scope is just right.

Having a Limitations section tilts the balance towards acceptance even if the detail is in an area of research I am not familiar with.

Review 2:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Disagree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q10 (Please justify the previous choice (Required if "Strongly Disagree" or "Disagree" is chosen, otherwise write "n/a"))

The notion of "regularity" in the context of music structure analysis should be clearly defined in or before Section 3. Please refer to the detailed comments in the main review section.

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Disagree

Q15 (Please explain your assessment of reusable insights in the paper.)

The regularity measures proposed in this paper offer a useful approach for quantifying duration-based consistency and patterns in existing music structure annotations. However, their applicability is limited when it comes to more thoughtful and content-aware evaluation of music structure analysis models or datasets. Please refer to the detailed comments in the main review section.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This paper introduces several methods for quantifying regularity in existing music structure annotations and presents empirical findings based on these measurements.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak reject

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper introduces several methods, along with their basic properties and extensions, for quantifying regularity in existing music structure annotations. The evaluation on standard music structure datasets reveals findings such as the relationship between beat-based and time-based regularity, as well as regularity patterns across multi-level segmentations.

Before going into detailed comments, I would like to note that my overall assessment leans toward a borderline recommendation, meaning I would support either a weak accept or a weak reject, depending on how the final decision balances strengths and limitations.

The paper contains several strengths.

First, the paper is well-motivated and reflects a fair degree of novelty. It addresses an under-explored area within music structure analysis evaluation: the role of regularity, or duration-wise consistency. By proposing formal metrics to quantify regularity, the work brings attention to an important but often overlooked aspect of the music structure analysis task. Assessing regularity in existing music structure annotations can help researchers better understand the duration-wise consistency and coherence of datasets, and may also serve as a metric for guiding segmentation models toward avoiding overly irregular segment durations.

Second, the experimental design and analysis are solid and well-executed. The authors present detailed and insightful statistical evaluations across a range of widely used music structure datasets. The comparisons span different framing choices (beat-based vs. time-based) and factors (tempo stability). These empirical results highlight the potential value of the proposed metrics for evaluating and interpreting the quality and consistency of structural annotations in music datasets.

Nonetheless, there are two major drawbacks to consider.

First, the paper's writing lacks proper structure in certain areas. For instance, although the authors mention that "most prior work stops short of providing a formal definition of regularity," they also fail to propose a formal definition themselves, instead transitioning directly into the section on "temporal divisibility" without clarification. There is no explicit definition of "regularity" in the context of music structure analysis, with the only potential example-based definition appearing in Figure 1. Additionally, it would be beneficial to discuss the applications and benefits of the proposed regularity metrics in both dataset-level quality assessment and in improving the music segmentation algorithms. While a limitations section is included, an "applications" section might be more appropriate and should precede it.

Second, a critical aspect of the paper lies in the broader applicability of the proposed metrics, such as their potential to improve or evaluate music segmentation models. The paper demonstrates that the proposed metrics can reflect the quality of existing music structure annotations, but such evaluation of "regularity" in music structure analysis is somewhat limited in procedure, and is not correlated to the musical content. It is important to note that quality control for music structure analysis datasets is often conducted through detailed manual procedures, where annotators refine each sample to provide accurate structural labels. As such, it may not be necessary to develop new metrics to "re-evaluate" whether an existing dataset provides quality structure labels if the data collection process is already reliable. Moreover, these metrics have limitations as they only measure the consistency of duration across annotated segments, such as ensuring there is no irregular number of bars. They do not assess whether the annotations accurately reflect the correct musical phrases or boundaries. As a result, a high "regularity" score, as proposed in this paper, does not necessarily indicate the true quality of the data from a music structure perspective. Furthermore, in the context of evaluating or guiding music segmentation models, the limitation of these metrics becomes more apparent, as they serve primarily as a duration regularizer during evaluation, but can hardly be utilized in the training process (due to the lack of differentiability). Given these constraints, the value of the proposed metrics appears limited, as they offer a partial, non-content-based assessment of music structure data and contribute less to the improvement of music segmentation models.

In conclusion, this paper introduces a novel perspective on evaluating regularity metrics within the music structure analysis task. While the idea and the proposed metrics are interesting and reasonable, their practical value appears limited due to the scope and applicability of the proposed metrics.

That said, my expectations may extend beyond the intended goals of the paper ("Our goal in this work is not to propose new algorithms for structure analysis, but rather to gain insights about how regularity manifests in existing structural annotations"). Therefore, I remain open to recommending either a weak reject or a weak accept, depending on how the contribution is clarified during the discussion phase.

Review 3:

Q2 (I am an expert on the topic of the paper.)

Strongly agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Strongly agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q10 (Please justify the previous choice (Required if "Strongly Disagree" or "Disagree" is chosen, otherwise write "n/a"))

The content of the paper is scientifically sound, and I find in particular that the authors took a great amount of time to try to cover all aspects and many caveats of their metrics and of the regularity evaluation as a whole. The paper not only fills a gap in the literature but also provides a robust framework for future research and practical applications.

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The metrics are detailed and should be easy to implement from scratch. The paper's reusability could still be enhanced by releasing the code and potentially creating a toolbox or integrating the metrics into standard toolboxes (e.g., mir_eval). These metrics might be useful in many new algorithmic designs, since they evaluate a previously overlooked principle.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This paper introduces new metrics for music structure analysis, dedicated to the analysis of the regularity principle.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper introduces a novel framework for evaluating the regularity of segmentations in music structure analysis. The authors propose a set of quantitative metrics that define regularity and balance by evaluating the ratios between segment durations. These metrics are designed to operate in both musical time (beats) and absolute time (seconds), making them versatile across musical contexts, notably when beat estimation is not reliable (which was not possible before).

I found the paper to be very well-written and enjoyable to read. This paper fills a hole in the current evaluation process of music structure analysis and will help future researchers integrate the regularity principle with quantitative arguments, hence improving the rigor of this principle.

I have minor comments on the paper, detailed below:

-- The authors do not mention the MIREX10 set of annotations for RWC Pop (which, unfortunately, does not contain segment labels), even though it was motivated by the aim of enhancing the regularity of the original (AIST) set of annotations [1]. I would have liked to see whether this original motivation turned out to be quantitatively observable in practice.

-- Line 243: it is said that "in general," the balance is lower than or equal to the regularity. It seems to me that this "generality" is, in fact, always true. Maybe the authors meant a strict inequality? Otherwise, I would suggest rephrasing "in general" to be more direct. It would also clarify the discussion of the properties of the balance metric that follows this line.

[1] Bimbot, F., Sargent, G., Deruty, E., Guichaoua, C., & Vincent, E. (2014, January). Semiotic description of music structure: An introduction to the Quaero/Metiss structural annotations. In AES 53rd International Conference on Semantic Audio (pp. P1-1).