Phylo-Analysis of Folk Traditions: A Methodology for the Hierarchical Musical Similarity Analysis

Hilda Romero-Velo; Gilberto Bernardes; Susana Ladra; José R. Paramá; Fernando Silva

Abstract:

This study introduces and evaluates a new methodology for cross-cultural ethnomusicological analysis of symbolic music. We investigate music similarity in popular traditions rooted in oral transmission by identifying shared patterns at scale across multiple hierarchies. The novelty of our approach lies in expanding musical similarity phylo-analysis (typically adopting alignment metrics that compare entire scores) to structurally aware phrases and macro-structure (i.e., form) alignment. Additionally, we explore patterns derived from multiple representations (chromatic interval, diatonic interval, rhythmic ratios, and a combination of them) to facilitate the exploration of stylistic affinities across musical genres and traditions. Our method is tested on a new dataset of 600 Galician and Irish popular music scores, which includes expert annotations for 21 genres (four shared between the two traditions) and detailed phrase information, all made available as open-access data. We use the genre separation ratio to examine how alignment strategies capture stylistic structure, providing insights that support musicological exploration across genres and traditions. The resulting phylogenetic trees and distance matrices reveal relationships among traditions, genres, and scores, facilitating the exploration of cross-cultural influences and enabling the identification of shared patterns at multiple hierarchies.

Meta Review:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The core idea of this paper is to investigate music similarity by identifying shared patterns at multiple representations (chromatic interval, diatonic interval, rhythmic ratios) across multiple hierarchies (phrase, form, global).

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

The similarity metrics applied to phrase and macro structures derived from chromatic pitch and duration ratios are more effective in recognizing genres.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper develops a methodology for the cross-cultural ethnomusicological analysis of symbolic music. The core idea is to investigate music similarity by identifying shared patterns at multiple representations (chromatic interval, diatonic interval, rhythmic ratios) across multiple hierarchies (phrase, form, global). The methodology was tested on a new dataset comprising 600 Galician and Irish popular music scores, which features expert annotations for 21 genres (including four shared between the two traditions) and detailed phrase information. The findings indicate that similarity metrics applied to phrase and macro structures derived from chromatic pitch and duration ratios are more effective in recognizing genres. The authors position this methodology as an analytical tool for ethnomusicologists to explore large datasets, facilitating the study of cross-cultural influences and the identification of similar scores across genres and traditions.

The main contributions of the paper include:

1. Novel methodology: The paper presents a novel approach by applying phylogenetic analysis to incorporate the hierarchical structure of music, including phrases and musical form. It also tests the classification/similarity analysis method on several musical features (diatonic interval, chromatic interval, rhythmic ratio, and their combinations), which provides an overview of how different musical elements contribute to similarity and genre differentiation.
2. New open dataset: The open-access release of a new dataset of 600 Galician and Irish folk music scores with expert phrase annotations is a valuable contribution to the research community and supports further research.
3. Systematic tool for quantitative analysis: This paper adapts methods from bioinformatics, particularly phylogenetic techniques and alignment algorithms, to ethnomusicological analysis. The use of the Genre Separation Ratio (GSR) metric provides a quantitative way to evaluate and compare the effectiveness of different features and similarity methods in separating genres.

However, there are several aspects of this paper that can be further improved:

1. It is not entirely clear from the description how the results of the QT Clustering of phrases are converted into the final alignment value used for the shared phrases similarity method. Further detail on the rationale and computation of this "alignment value" derived from the clustering results would enhance clarity.
2. The procedure or standard by which an ethnomusicologist selects target genres for deeper analysis and how the tree of scores is specifically computed for these preferred genres requires more explicit explanation. Detailing the interface between the expert's analytical decisions and the automated tool's functions would be beneficial.
3. The choice of the threshold for QT Clustering seems empirically derived from analyzing distance distributions. A discussion or justification for this specific threshold, or an exploration of the sensitivity of the results to different thresholds, might strengthen the methodology.
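
To make points 1 and 3 concrete, the following is a minimal sketch of Quality Threshold clustering on a toy distance matrix, showing how the partition changes with the threshold. It is not the authors' implementation: the function name `qt_cluster`, the toy "phrase positions", and the two threshold values are assumptions chosen for illustration, and the paper's phrase distance and threshold are not reproduced here.

```python
def qt_cluster(dist, threshold):
    """Greedy Quality Threshold clustering over a symmetric distance matrix.

    Repeatedly builds one candidate cluster per remaining item, growing it
    while the cluster diameter stays within `threshold`, keeps the largest
    candidate, removes its members, and starts over.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            cand = [seed]
            pool = remaining - {seed}
            while pool:
                # Next item: the one whose farthest cluster member is closest.
                nxt = min(sorted(pool), key=lambda p: max(dist[p][q] for q in cand))
                if max(dist[nxt][q] for q in cand) > threshold:
                    break  # adding it would exceed the allowed diameter
                cand.append(nxt)
                pool.remove(nxt)
            if len(cand) > len(best):
                best = cand
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters


# Hypothetical 1-D "phrase positions"; distance is the absolute difference.
positions = [0, 1, 2, 10, 11, 30]
dist = [[abs(a - b) for b in positions] for a in positions]

loose = qt_cluster(dist, threshold=2)
tight = qt_cluster(dist, threshold=1)
```

On this toy input the looser threshold yields three clusters and the tighter one four, which is the kind of sensitivity a threshold discussion in the paper could report.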

This paper presents an innovative methodology for applying phylogenetic analysis to folk music, incorporating musical structure hierarchy, and also contributes a new, open-access dataset for Galician and Irish popular music. The proposed method is evaluated using the GSR metric, particularly for the comparison of different rhythmic and chromatic-rhythmic features. Despite some areas requiring further clarification regarding specific computational steps (phrase clustering to similarity) and the user workflow (genre selection for score trees), the fundamental approach is sound and the results are compelling. The paper has demonstrated that this methodology can serve as a valuable tool for ethnomusicologists studying cross-cultural relationships in folk music.

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

Paper 212 has a mixed set of reviews: 1 strong accept, 2 weak accepts, and 1 weak reject. Below is a synthesis of the main points raised by the reviewers:

Strengths:
• Novel methodological contribution (all reviewers): This paper presents a novel integration of cultural evolutionary analysis with MIR techniques. The methodological innovation is viewed as a valuable contribution to the field.
• New open dataset (all reviewers): The introduction of a new dataset was positively received by all reviewers, who noted its potential utility for future research.
• Timely and underexplored topic (Reviewers #2 and #3): This paper addresses an important and relatively underexplored area in computational ethnomusicology through computational methods.

Weaknesses:
• Insufficient methodological detail (Meta-review, Reviewer #1): The paper lacks justification for key methodological choices, such as the selection of the corpus and the choice of data representations (e.g., melodic and rhythmic patterns). Clarifying these decisions would strengthen the reproducibility and interpretability of the study.
• Limited depth in analysis and interpretation (Reviewers #1 and #3): Reviewers request a more thorough discussion of the analytical results, including:
  - How features from different musical genres manifest in both global and phrase-level similarity.
  - How the outcomes of the phylogenetic analysis relate to known characteristics of musical genres.
  - A clearer explanation of the correlation measures and the genre separation ratio.
  - Additional analyses to support statistical significance of the findings.

Overall Assessment: The reviewers generally view the paper as a valuable contribution, particularly in terms of its dataset and the promise of the proposed analytical framework. While there are areas for improvement, especially in terms of methodological transparency and interpretive depth, the novelty and potential impact of the work support an accept recommendation.

Review 1:

Q2 (I am an expert on the topic of the paper.)

Strongly agree

Q3 (The title and abstract reflect the content of the paper.)

Disagree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The Quality Threshold clustering algorithm is an interesting method for clustering similar melodies.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

A corpus of Irish and Galician folk song melodies, represented as strings of pitch and duration intervals, is subjected to phylogenetic analysis with different similarity metrics, among which two are based on annotated phrases in the melodies.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak reject

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper presents some interesting methods to analyze relationships between folk songs. The authors plan to provide the corpus, along with the analysis code and a Docker image, which is an excellent initiative towards reproducibility. The paper is overall well-written, but is not always easy to follow.

First of all, the argument for the selection of the corpus is somewhat weak. Even if Irish and Galician folk traditions have influenced each other, the proposed dataset does not provide much beyond some shared genres to corroborate claims of interrelated musical traditions. If the dataset was enriched with annotations of melody equivalents, there would be a testable hypothesis. Being able to identify genres based on melodic similarity is perhaps not too surprising, as dance music entails different rhythmic structures.

The authors compare a number of different music representations - diatonic and chromatic intervals, as well as rhythm intervals, and combinations thereof. While these are some good starting points, why these representations? Why report two different types of pitch intervals, and not explore representations such as pitch classes?
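
As a point of reference for this question, the sketch below contrasts interval-based encodings with a pitch-class encoding on a toy melody. It is only an illustration under assumed conventions (notes as MIDI pitch plus duration in beats, hypothetical function names); it does not reproduce the paper's encoding, which also covers diatonic intervals and rests.

```python
# Toy melody as (MIDI pitch, duration in beats); the values are illustrative only.
melody = [(60, 1.0), (62, 0.5), (64, 0.5), (60, 2.0)]


def chromatic_intervals(notes):
    """Semitone difference between each pair of consecutive notes."""
    pitches = [p for p, _ in notes]
    return [b - a for a, b in zip(pitches, pitches[1:])]


def rhythmic_ratios(notes):
    """Duration of each note divided by the duration of the previous one."""
    durs = [d for _, d in notes]
    return [b / a for a, b in zip(durs, durs[1:])]


def pitch_classes(notes):
    """Pitch modulo 12, which discards the octave but keeps the key."""
    return [p % 12 for p, _ in notes]


# The same tune transposed up a fourth (5 semitones).
transposed = [(p + 5, d) for p, d in melody]
```

Interval and ratio sequences are unchanged under transposition, whereas pitch classes shift with the key; this is one plausible reason to prefer intervals for a corpus in which tunes are notated in different keys.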

The authors also compare a number of different similarity metrics - one based on global alignment, one based on phrases shared according to the Quality Threshold clustering algorithm, one based on sequences of such phrases, and weighted combinations of the latter two. Figure 3 presents the genre classification success of the different pitch representations and similarity measures. This is an interesting overview. What it tells me is that in some genres, such as alálás, genre classification works well based on global alignment of melodies, while in others, such as Waltz, it works poorly. In my opinion, it would be much more interesting to zoom in here and ask why some genres are easier to identify, and why the rhythmic feature is successful. Especially given the success of the rhythmic feature, it seems that the easy-to-distinguish genres have distinctive rhythmic patterns.

The authors do not go into these kinds of analyses, however, but seem to take these results as a justification for dropping pitch representations and global alignment from their further analyses. As for the former: it seems strange to expect that pitch interval sequences would be so uniform within a genre that they can be used as an identifying feature. As for the latter: why is global alignment not analyzed further, given that it achieves better classification success than the shared phrases feature?

The method to derive clustering of phrases - Quality Threshold clustering - is interesting, even though it is doubtful whether the clustering would correspond to human judgements, as the shown phrases in Figure 8 reveal. It also seems that phrase correspondences between songs matter less than the sequences of phrases within a song. This is interesting, but perhaps also not surprising: specific genres may have typical form structures. It would be welcome to see form examples here. While QT clustering is one way of generating labels for repeating phrases within a song, it stands to question whether other metrics, which compare phrases of songs in isolation, might not be just as successful, or even more successful, at generating a form representation.

Furthermore, phylogenetic analysis is performed. Generally, this is an interesting way to study folk songs. Given the corpus, of which -- at least as far as the authors inform us -- there is no information about the time period in which the folk songs were recorded, it is not logical to represent relationships between genres, which may concur in time, in a phylogenetic tree. If one took Figure 4 at face value, this would mean that pasacorredoiras and mazurcas are the "oldest" genres, from which others are derived. This seems counter-intuitive, to say the least.

In summary, the paper introduces many interesting ways of analyzing music, but it offers little in the way of musicological insights. While the methods are reproducible, they also rely on the presence of annotated patterns, which means they cannot be transferred onto an arbitrary dataset of symbolic music. That being said: there are some well-studied folk song datasets which do have such annotations, and in which there are also annotations of finer-grained musical relationships (e.g., tune families in Dutch folk songs). I would recommend that the authors verify their methods on such a dataset before applying them to a dataset of which we know less about melody relationships. Arguably, genre labels are too coarse to study musical relationships.

Remarks:
107 and elsewhere: there are references to "score" here, while only the melodies of the folk songs are considered, and not the accompaniment.
Figure 1: this figure contains many unnecessary or even unclear graphical elements: why are the circles with "CR" next to the arrows? Why are some arrows dotted, and others strong? What does the symbol right of the different similarity measures mean? "D", "R" etc. also are not explained in the caption.
132 ff.: the representation as pitch intervals and duration intervals is not per se novel. Perhaps references to other studies could also be given here, as music representation for similarity metrics has been widely studied. Representing rests is an interesting choice, as these are often ignored when using inter-onset intervals. Whether they add any information is an open question.
287: her/him -> them
252 ff.: Combined similarity - why were only form and shared phrases investigated here, seeing as the global similarity gave better results than the shared phrases?
Figures 4 and 5 mention "the rhythmic feature". This seems to indicate that after the analysis of Figure 3, the rhythmic representation was chosen and the others were discarded. This is not stated clearly in the text.
326 and elsewhere: form information is referred to as "hierarchical relationship" of music, while it is about sequences of melodic phrases. I find the term "hierarchical" misleading, here and elsewhere.
340 ff.: This section is presented as a case study, while it is actually a qualitative evaluation of the model whose development was described in the previous sections. I would place question marks after any conclusions drawn from the resulting similarity metrics: perhaps they simply fail to capture essential information (meter!) to group genres such as waltz and valses together.
382: He/She -> They
399 ff.: the text states here that valses adopt binary meter, while earlier, valses were related to waltz (triple meter).
Figure 8 shows a cluster of phrases, but I would doubt whether humans would perceive them as similar.

Typos:
90: case ~~of~~ study
340: study case -> case study

EDIT: most reviewers argue in favour of accepting this paper, and I certainly share their enthusiasm for the methodology and open science. To address some of my criticisms, for the camera-ready version I would like to request that the authors add more details on the corpus, perhaps with exemplary melodies from one or two genres, which might help clarify the reasoning for choosing this task and the given methodology. To make space for this, perhaps Figure 1 could be left out. As mentioned above, I do not think it adds much information.

Review 2:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Strongly agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The methodology is applicable to many other traditions.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Plausible structure of folk music genres can be found using phylogenetic trees, with careful feature extraction and selection over rhythmic and melodic phrases.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper studies the relationship between genres of folksong from two traditions, Irish and Galician, based on a combination of rhythmic and melodic patterns using an unrooted phylogenetic tree. The genre tree is built in two stages: first, a tree is built over melodies, without including genre information, and in the second stage the resulting distances between melodies of the same genre are aggregated and used to build the genre tree itself. Both trees are done with neighbor-joining. The result is convincing, generally grouping genres of the individual traditions into separate high-level clades, but with individual genres — waltzes and alalas — ending up in the „other“ clade. The data is made publicly available, including the phrase annotations, as well as all code.
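
The second stage described above can be sketched as follows. This is a minimal illustration, assuming that melody-level distances are aggregated by a plain mean over all melody pairs drawn from two genres; the paper's actual aggregation rule is not quoted in this review, and the function name `aggregate_by_genre`, the genre labels, and the distances are hypothetical. The neighbor-joining step itself is omitted.

```python
def aggregate_by_genre(dist, genres):
    """Average melody-level distances into a genre-by-genre table.

    dist   : symmetric matrix (list of lists) of pairwise melody distances
    genres : genre label of each melody, in the same order as the matrix
    Returns the sorted genre labels and a dict keyed by (genre_a, genre_b).
    """
    labels = sorted(set(genres))
    n = len(genres)
    agg = {}
    for a in labels:
        for b in labels:
            # All ordered pairs of distinct melodies, one from a and one from b.
            vals = [dist[i][j] for i in range(n) for j in range(n)
                    if i != j and genres[i] == a and genres[j] == b]
            agg[(a, b)] = sum(vals) / len(vals) if vals else 0.0
    return labels, agg


# Four hypothetical melodies, two per genre, with made-up distances.
genres = ["mazurca", "mazurca", "waltz", "waltz"]
dist = [
    [0.0, 2.0, 6.0, 8.0],
    [2.0, 0.0, 4.0, 6.0],
    [6.0, 4.0, 0.0, 1.0],
    [8.0, 6.0, 1.0, 0.0],
]
labels, agg = aggregate_by_genre(dist, genres)
```

Under this assumed rule the diagonal entries are the mean within-genre distances rather than zero, so a genre can be "far from itself" when its melodies are heterogeneous.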

The paper is well-crafted, aware of related work, understands — as far as I can tell, as I am not familiar with the Galician tradition — the musical materials to the level required to make its point. I specifically appreciate the use of genre as an independent proxy for assessing the quality of different input representations properly, using the genre separation metric. (Though I would rather suggest phylogenetic signal as a possible alternative metric down the road.) It is an interesting, well-balanced mix between (ethno)musicological motivation, material, and appropriate computational methods. It puts a number of well-selected pieces together to work with folk music analysis, which is a field that has not been studied computationally as much — there is a lot of opportunity with these materials. Good to also see at least some overlap with cultural evolution, which is a field that could provide a principled link between MIR methods and models and conclusions about music and musical life.

And, importantly, the paper is 1) not overclaiming, 2) not underestimating the complexity of dealing with ethnomusicological data, 3) took the effort to open its data already in the anonymous regime, and 4) reviews related work well, and justifies its choices on top of that well. It is not really a breakthrough in and of itself, but I think it clearly deserves a place at ISMIR.

Some aspects of the method, however, could still be improved.

(Most significant) Fig. 3 — There is some analysis of significance missing. Bootstrap sampling is generally an accepted method for this in the life sciences. Actually, perhaps better than bootstrapping and confidence intervals would be establishing a random baseline: what are the GSR values for a tree built randomly, rather than by neighbor joining? (Better, what is the distribution of GSR values across 10,000 or so random trees?) What are GSR values for a perfect genre tree („cheat“ and use genre directly to compute NJ distance, basically binary), and how much does it change when the „wrong“ tune is chosen and the perfect clustering is disturbed with some measure of probability? Then, the significance of having a higher vs. lower GSR value in the relatively small observed ranges could be made clear.
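
One simple way to obtain such a baseline is sketched below. Two things here are assumptions rather than the paper's method: the GSR is taken to be the mean between-genre distance divided by the mean within-genre distance (the paper's exact definition is not reproduced in this review), and the null distribution comes from shuffling genre labels over a fixed distance matrix, which is a cheaper stand-in for the random trees suggested above. All names and data are hypothetical.

```python
import random


def gsr(dist, genres):
    """Assumed GSR: mean between-genre distance / mean within-genre distance."""
    within, between = [], []
    n = len(genres)
    for i in range(n):
        for j in range(i + 1, n):
            (within if genres[i] == genres[j] else between).append(dist[i][j])
    return (sum(between) / len(between)) / (sum(within) / len(within))


def permutation_baseline(dist, genres, n_perm=1000, seed=0):
    """GSR values obtained after randomly reassigning the genre labels."""
    rng = random.Random(seed)
    shuffled = list(genres)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(gsr(dist, shuffled))
    return null


# Toy data: two perfectly separated genres of three melodies each.
genres = ["a"] * 3 + ["b"] * 3
dist = [[0.0 if i == j else (1.0 if genres[i] == genres[j] else 5.0)
         for j in range(6)] for i in range(6)]

observed = gsr(dist, genres)
null = permutation_baseline(dist, genres, n_perm=200)
# One-sided permutation p-value with the usual +1 correction.
p_value = (1 + sum(v >= observed for v in null)) / (1 + len(null))
```

Reporting where each observed GSR falls within such a null distribution would show whether the small differences between representations in Fig. 3 are meaningful.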

L.240: Why Euclidean distance over counts? Wouldn’t a metric based on some distribution over pattern counts be more appropriate? (Binomial? Poisson?)
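
A small sketch of why this matters: Euclidean distance on raw counts is sensitive to tune length, while a distance on normalised count distributions is not. The Hellinger distance is used here purely as one convenient distribution-based choice; it is not the binomial or Poisson model mentioned above, nor anything taken from the paper, and the count vectors are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two raw count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hellinger(a, b):
    """Hellinger distance between the normalised count distributions (0 to 1)."""
    pa = [x / sum(a) for x in a]
    pb = [y / sum(b) for y in b]
    return math.sqrt(0.5 * sum((math.sqrt(p) - math.sqrt(q)) ** 2
                               for p, q in zip(pa, pb)))


# Hypothetical pattern counts: the long tune repeats the short one twice.
short_tune = [2, 1, 1]
long_tune = [4, 2, 2]

d_euclid = euclidean(short_tune, long_tune)
d_hellinger = hellinger(short_tune, long_tune)
```

Here the two tunes use the patterns in identical proportions, so the Hellinger distance is zero while the Euclidean distance is not.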

Figs. 4 and 6 visually imply a root and passage of time, but the tree inference method does not really output any of that. The unrooted tree should have been presented differently. (See e.g. https://open.lib.umn.edu/humanbiology2e/chapter/1-5-introduction-to-phylogenies/ Fig. 1b for a more typical, better visualisation that I would recommend.)

It may also have been worth it, perhaps instead of the large Figure 3, to provide a figure for all melodies, or at least a substantial sub-sample across all genres, which was in the supplementary materials repository.

Fig. 5 — It is somewhat suspicious that the greatest distance aggregated over the first-stage melody tree is between waltzes and waltzes. It would be worth a comment, perhaps in the conclusions. Also, color-coding the genre names into Irish vs. Galician might help readability.

It is not clear what the Pearson correlation between the two genre trees actually measures.

Note also that given the presence of horizontal transmission in the 20th century, I’m not sure this is actually the best dataset to work with. Building a tree and evaluating it is a good first step before getting into network models with horizontal transmission, but if this is a methods development paper it is possibly more suitable to apply it on less complex material.

Finally, I want to point to what I think is an unused opportunity: discussing the kinds of relationships between genres more. One idea is building a genre network instead of a tree (or at least producing a NeighborNet visualisation, though these are also problematic because of distance scaling issues). Fig. 3 could also have been discussed more, beyond just the "requires more musicology" statement on L.374. The low numbers in the table are actually just as interesting as the high numbers, because they show which genres are not necessarily distinguished from others via their rhythmic and melodic patterns at all, pointing to extra-musical sources of their identity. This goes back to the missed opportunity to use this study for insight into what makes genres coherent categories, and to what extent it is the music. Such an analysis may show that certain genres should not be included in analyses based on musical content, because these categories are not salient to it.

Anyway, I will look forward to the presentation of this paper.

Review 3:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Disagree

Q10 (Please justify the previous choice (Required if "Strongly Disagree" or "Disagree" is chosen, otherwise write "n/a"))

While the paper introduces an interesting methodology for comparing symbolic folk music scores across cultures, there are several methodological concerns. The extremely high correlation (0.99) between chromatic-rhythmic and purely rhythmic analysis suggests that the analysis pipeline may have induced biases that prevent meaningful differentiation between feature types. The narrow range of the separation ratio values averaged across genres (1.04-1.21; last row in Fig. 3) also raises questions about the discriminative power of the proposed methodology. Additionally, the paper lacks statistical significance analysis to validate whether the observed differences between genres are meaningful or merely artifacts of the processing pipeline.

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The paper provides a valuable framework for hierarchical analysis of folk music traditions using symbolic scores. The approach of analyzing musical similarity at multiple levels (melodic content, phrase structure, and musical form) offers a useful paradigm for computational ethnomusicology. The introduction of Genre Separation Ratio as an evaluation metric could be applied to other datasets and similarity methods. Additionally, the adaptation of bioinformatics techniques to musical analysis demonstrates how methods from other fields can be productively applied to MIR problems. The new dataset of 600 annotated Galician and Irish folk music scores with expert phrase annotations is a significant contribution to the field that enables further research on cross-cultural musical analysis.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Hierarchical analysis of folk music using multiple representations (chromatic, diatonic, rhythmic) and structural levels (notes, phrases, form) can reveal relationships between musical traditions and genres.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper introduces a methodology for hierarchical musical similarity analysis of folk traditions, specifically applied to a new dataset of 600 Galician and Irish folk music scores. The approach extends standard score alignment techniques to consider structural elements (phrases and form) and multiple representations (chromatic intervals, diatonic intervals, and rhythmic ratios).

Strengths:
1. The paper addresses an important topic in computational ethnomusicology, offering a systematic approach to comparing folk music traditions at scale.
2. The dataset is substantial (600 scores) and constructed with expert annotations for phrases and genre classification, providing a valuable resource for the MIR community.
3. The hierarchical approach considering multiple levels of musical structure (notes, phrases, form) is innovative and well-motivated by the nature of oral music traditions.
4. The adaptation of bioinformatics methods (sequence alignment, phylogenetic trees) to musical analysis is creative and potentially powerful.
5. The visualizations effectively illustrate relationships between genres within and across traditions.

Weaknesses:
1. Limited scope of representation: The paper acknowledges but does not address the limitation of working with symbolic scores, which omit crucial aspects of folk music traditions such as timbre, instrumentation, ornamentation, and performance practice, elements that often carry significant cultural information.
2. Methodological concerns: The extremely high correlation (0.99) between chromatic-rhythmic and rhythmic trees suggests that the analysis pipeline may be introducing biases that override the actual discriminative power of different feature types.
3. Statistical significance: The paper lacks statistical analysis to determine whether the observed differences between genres (GSR values) are statistically significant or merely artifacts of the processing pipeline.
4. Narrow range of results: The separation ratio values averaged across genres (last row in Fig. 3) fall within a tight range (1.04-1.21), raising questions about the discriminative power of the proposed methodology.
5. Unclear details: Several technical aspects are insufficiently explained, such as the exact implementation of the Quality Threshold Clustering algorithm and the specifics of the Neighbor Joining method used for phylogenetic tree construction.
6. Expert dependency: The methodology relies heavily on expert annotations for phrases, which limits its scalability and application to other datasets without significant manual effort.

Recommendations:
1. Provide a deeper analysis of the correlation between different feature representations (chromatic, diatonic, rhythmic). The near-perfect correlation between some trees suggests potential issues with the methodology.
2. Include statistical significance testing to validate that the observed genre separations are meaningful.
3. For each distance matrix between genres (e.g., Figure 5), compare the diagonal elements (within-genre similarity) to the average of off-diagonal elements (which represent across-genre similarity), to directly test the assumption made in Section 3.2, that "pieces within each genre tend to share patterns at the pitch and rhythmic levels".
4. Provide more details on the QT Clustering algorithm parameters and the Neighbor Joining method implementation.
5. Investigate the potential for automating phrase detection to reduce dependency on expert annotations.
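
The diagonal versus off-diagonal comparison recommended above could be sketched as follows. The function name is hypothetical and the 3x3 matrix is made-up illustrative data, not values from Figure 5; the sketch only assumes a square genre-by-genre distance matrix whose diagonal holds within-genre distances.

```python
import numpy as np


def diagonal_vs_offdiagonal(genre_dist):
    """Per genre: within-genre distance (diagonal) vs. mean distance to other genres."""
    d = np.asarray(genre_dist, dtype=float)
    n = d.shape[0]
    within = np.diag(d)
    off_mask = ~np.eye(n, dtype=bool)
    # Mean of each row, excluding the diagonal entry.
    across = np.array([d[i, off_mask[i]].mean() for i in range(n)])
    return within, across, within < across


# Made-up genre-level distance matrix (rows/columns: three hypothetical genres).
toy = [
    [0.2, 0.5, 0.6],
    [0.5, 0.7, 0.4],
    [0.6, 0.4, 0.3],
]
within, across, is_coherent = diagonal_vs_offdiagonal(toy)
print(within, across, is_coherent)
```

A genre whose diagonal entry is not smaller than its off-diagonal average (the second row in the toy matrix) would contradict the within-genre similarity assumption for that genre.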

Minor suggestions:
1. Figure 1: A description of 'D', 'R', 'CR', etc. would help the reader.
2. Line 113: "**kern" is an undefined term.
3. Lines 267-268: "Neighbour Joining" is missing a reference.
4. Lines 323, 350, etc.: ensure the LaTeX \ref command is used for the figures.

This paper presents a valuable contribution to computational ethnomusicology with its hierarchical approach to musical similarity analysis. The methodology shows promise for understanding relationships between folk traditions, though there are concerns about the robustness of the results given the high correlations between different feature representations. With refinements to address the methodological issues and more comprehensive statistical analysis, this work could provide significant insights into cross-cultural musical influences and similarities.

P2-9: Phylo-Analysis of Folk Traditions: A Methodology for the Hierarchical Musical Similarity Analysis

Hilda Romero-Velo, Gilberto Bernardes, Susana Ladra, José R. Paramá, Fernando Silva

Presented In-person

4-minute short-format presentation