Abstract:

Raga classification in Indian Art Music is an open set problem where unseen classes may appear during testing. However, traditional approaches often treat it as a closed set problem, disregarding the possibility of encountering unseen classes. In this work, we first employ uncertainty-based Out-Of-Distribution (OOD) detection, given a set containing known and unknown classes. Next, for the audio samples identified as OOD, we employ a Novel Class Discovery (NCD) approach to cluster them into distinct unseen Raga classes. We achieve this by harnessing information from labelled data and further applying contrastive learning on unlabelled data.
With thorough analysis, we demonstrate how different components of the loss function influence clustering performance and how varying the openness affects the NCD problem at hand.

Meta Review:

Q2 ( I am an expert on the topic of the paper.)

Strongly agree

Q3 ( The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Disagree

Q5 ( Please justify the previous choice (Required if “Strongly Disagree” or “Disagree” is chosen, otherwise write "n/a"))

I feel it misses the papers on Carnatic music: Sankalp Gulati et al. and Shrey Dutta et al. on "Raga ID" for Carnatic music. Hindustani music is not very different from Carnatic music.

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q10 (Please justify the previous choice (Required if “Strongly Disagree” or “Disagree” is chosen, otherwise write "n/a"))

Dividing HM excerpts into 30 s chunks is totally incorrect. One should segment at the level of the gat. The purvanga and uttaranga phrases can be distinctly different for the same raga; forcing a match is terribly wrong. In Misra ragas there could be just a phrase that comes from a different raga. You need to segment at the phrase level, not arbitrarily at 30 s intervals.

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

I think the insights are pretty wrong. Let me give examples: ragas in CM such as Sriranjani-Abhogi and Kiravani-Simhendramadhyamam, and similarly Bhimpalasi-Bageshri from Kafi thaat for HM. What about the thaats, and the putra ragas of the corresponding thaats? Some details MUST be given: while they have stated that it does not work well for ragas belonging to the same thaat, some explanation of why is required.

Q16 ( Write ONE line (in your own words) with the main take-home message from the paper.)

Identification of unseen ragas

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

I feel the authors do not have an idea of IAM at all. IAM, like Indian languages, is phrase-structured. You cannot pick 30 s sections and believe that they represent the raga. Also, the purvanga and uttaranga, even if they belong to the same raga, can have completely different movements.

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

While the idea is new, the work does not use any culture-specific methodology. My recommendation: Reject. Such papers should not be given importance even if there is novelty in terms of technology. The work has to relate the music to the science of the art form; this is completely missing.

Review 1:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Disagree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The training recipe for improved NCD, the ablation studies, and the metrics.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Generic music embedding models are not good enough for Indian music raga identification; custom models trained on smaller datasets can outperform them.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

  • This paper discusses OOD and NCD methods for Indian Raga classification in the wild. The experiments are scientifically sound, and the results are promising on the public Saraga and PIM datasets.
  • The paper is a little hard to follow given the multiple experiments and results. Consider restructuring and proof-reading for better readability.
  • Please consider making the models and the training and inference code open source. This will help the community, as you have leveraged open-source datasets.
  • Typo in the first line of sub-subsection '4.3.2 Open-ness'.

Review 2:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

A methodology for handling out-of-distribution samples in Hindustani raga recognition has the potential to be very useful for future raga analysis work, since OOD samples are a common and expected issue. Whilst the method is novel, it contains many parts that are insufficiently justified in the paper. The code to reproduce the analysis is not provided, nor is any comparison to an external baseline, of which there are many (refs [4,5,6], for example).

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

The recurrent problem of how to handle out-of-distribution samples in the case of Hindustani raga recognition is addressed using a combination of labelled pretraining and self-supervised deep clustering. The authors utilize two large datasets of Hindustani music and demonstrate respectable results on those same datasets.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Thanks for the great work. I agree that the type of methods presented in this paper are necessary for those wishing to study Hindustani raga computationally at scale, a task for which OOD samples are common and expected. I have selected a weak accept because I think the methodology is novel in this context, necessary, and yields decent results. However, I would request the authors to make some quite significant additions/changes to help with the reproducibility and interpretability of the study...

  • Most importantly, you present a very multi-faceted approach that requires many design decisions. However, you provide little explanation of how you made them. A non-exhaustive list would be...
  • selecting K for k-means and for UMAP
  • the cosine similarity clustering threshold
  • model hyperparameters
  • why chromagram features?
  • etc...

  • You include no comparison to existing raga classification studies, even though you use a dataset from a raga classification paper. This, combined with a poor description of design decisions, makes it hard to evaluate the value of the trained model.

  • Looking quickly at the results in [6], I don't think you achieve a higher F1. You need to be explicit about why, even with that being the case, this paper makes a valuable contribution (namely the ability to handle out-of-distribution samples).

  • The structure of the paper could be improved.

  • Figure 1 could be bigger and more explicative. You present a process with many parts; think about making this diagram a lot clearer and bigger, and use labelling that corresponds with the text in the methodology section.
  • Algorithm 1 is included but not introduced or referenced in the text.
  • Does Algorithm 1 cluster? It seems to be the training process, not the prediction process.
  • Some terms in Algorithm 1 are not defined, e.g. x_i_u.
  • If you are going to include it, make it clear and a valuable addition.
  • Be clearer about exactly what data is being used for training and testing, how big it is, how many ragas it includes, etc.
  • Consider separating the experiments out into named experiments: introduce them briefly in a summary introduction to the section, explain them in detail in their respective subsections, and reference them by name in the results section.

  • Please consider including the code to reproduce this analysis, or at the very least allowing others to use your model. You may also want to consider including some more detailed results in that repository.

Some smaller comments....

  • Line 209: finally
  • line 384: openness
  • line 380-383: sentence is hard to parse. perhaps an incorrect use of wherever.
  • line 385: colon at start of sentence
  • The Hindustani dataset in Saraga is quite noisy; how did you clean the data or exclude non-melodic instruments?
  • Presumably the entire work is based only on Hindustani ragas. You could be explicit about this, since Saraga contains many Carnatic performances.
  • What about audio style? Would this model work on recordings with just a vocalist? What about with/without a harmonium, etc.? Maybe you can comment on the nature of the two datasets and its implications for using this model in the real world.
  • You make no comment on the scalability of using pairwise distance metrics: how does this influence training time/prediction time? It doesn't take many samples for relying on pairwise distances to become infeasible.
  • If you have expertise in Hindustani music, perhaps you could use the extra space you have to talk a bit more about some of the results from a musicological perspective. Why does the model get confused in the cases where it predicts incorrectly? You mention thaats very briefly, but at this point have given no introduction to what a thaat is (nor pancham or aaroh). If you have a bit of space left, think about refining this section; it may be useful for other researchers/musicologists.
  • By the way, you could probably free up some space by reducing the number of subsections you have (see: sec. clustering and sec. evaluation).

Review 3:

Q2 ( I am an expert on the topic of the paper.)

Strongly disagree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

Reusable insights stem from the setup of the cluster algorithms, loss functions and their individual performance.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

For the classification of unseen Ragas, the paper uses an “ensemble” of Out-of-Distribution detection and Novel Class Discovery in combination with a CNN-LSTM to identify and place Ragas into their respective clusters.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper presents a thorough and interesting method to classify/cluster future Ragas. The experiments appear thorough, and a lot of important information is scattered throughout the manuscript. It would be tremendously beneficial to have a flowchart or table with all experiments, listing which parts of f, g and g’ are used where, and which datasets were used for (pre-)training, testing and obtaining embeddings. It would also be nice if all symbols (i.e. x, y, z) appeared in the listing for Algorithm 1. For someone less versed in Raga, it would be good to know what kind of Raga classes a classifier would classify them into. Is there a hierarchy (i.e., general Ragas and subclasses of them)? The introduction states that a Raga is a distinct set of notes, so is the idea to classify an unseen set of notes simply as a Raga or as certain sub-types of Ragas? This becomes clearer much later, around line 273, but could be mentioned earlier. There are quite a lot of errors in sentences and inconsistencies in labeling (e.g. f(.) vs f(·)); please do another round of proof-reading.

  • 48: Even though “target classes <= training classes” is a common assumption in NCD, this directly contradicts “the number of Ragas is not fixed” (33). But I am assuming that the model will have a certain “target space” for unseen targets that theoretically enables it to recognize any unseen Raga class when it is run individually?
  • 143: The feature extractor has an error in the “denotion”.
  • 152: Unclear; in 142 it was said that the softmax layer is removed.
  • 185: What is an audio chunk?
  • 214: The criticality of the scaling hyperparameters demands an explanation of how it is done.
  • Section 3.4: This can all be written in a single paragraph without the need for sub-sections 3.4.x.
  • 256: Write out ACC once.
  • 269: This could mention once more that S^l is sourced from the PIM dataset. An overview of all used datasets would be useful (number of files, length, train/val/test size, number of classes...).
  • 381-383: This is confusing. Did the dataset description get mixed up?
  • 419-421: It would be nice to see a confusion matrix for this statement.

Table 2: According to 254-255, higher values indicate greater similarity between two clusterings. Is it then not a goal to reduce similarity between clusterings?

Figure 1 could be more self-explanatory, particularly the two OOD blocks. The caption should explain crucial parts.

Figure 2: Could be a normalized confusion matrix.