P5-5: LoopGen: Training-Free Loopable Music Generation

Davide Marincione, Giorgio Strano, Donato Crisostomi, Roberto Ribuoli, Emanuele Rodolà

Subjects: Evaluation metrics; Open Review; Music and audio synthesis; Generative Tasks

Presented In-person

4-minute short-format presentation

Abstract:

Loops--short audio segments designed for seamless repetition--are central to many music genres, particularly those rooted in dance and electronic styles. However, current generative music models struggle to produce truly loopable audio, as generating a short waveform alone does not guarantee a smooth transition from its endpoint back to its start, often resulting in audible discontinuities. We address this gap by modifying a non-autoregressive model (MAGNeT) to generate tokens in a circular pattern, letting the model attend to the beginning of the audio when creating its ending. This inference-only approach results in generations that are aware of future context and loop naturally, without the need for any additional training or data. We evaluate the consistency of loop transitions by computing token perplexity around the seam of the loop, observing a 55% improvement. Blind listening tests further confirm significant perceptual gains over baseline methods, improving mean ratings by 70%. Taken together, these results highlight the effectiveness of inference-only approaches in improving generative models and underscore the advantages of non-autoregressive methods for context-aware music generation.

Meta Review:

Q2 ( I am an expert on the topic of the paper.)

Strongly agree

Q3 ( The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Disagree

Q5 ( Please justify the previous choice (Required if “Strongly Disagree” or “Disagree” is chosen, otherwise write "n/a"))

While a minor issue, there is prior work on non-autoregressive music loop generation. See “DITTO: Diffusion Inference-Time T-Optimization”, https://arxiv.org/abs/2401.12179. The overall idea is generally similar, but DITTO applies inference-time techniques to standard diffusion models (not to MAGNeT), which would also be considered non-autoregressive (NAR) models.

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The reusable insights are 1) a focused view on how to modify MAGNeT-style non-AR generation methods at inference time to better create musical loops, 2) a proposal to use time-signature-aware length control, and 3) a proposed evaluation metric for measuring loop seams.

Q16 ( Write ONE line (in your own words) with the main take-home message from the paper.)

Modify MAGNeT-style non-autoregressive music generation at inference time to generate loops with more seamless loop boundaries, together with a beat-aware algorithm improvement and a new evaluation metric.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Summary: In this work, the authors present a new algorithm to extend the music generation method of MAGNeT to better generate seamless musical loops. MAGNeT is a non-autoregressive generation technique distinct from LLM-based generation and diffusion models, but similar in spirit to the latter. In addition to the main modification to the MAGNeT inference-time generation algorithm, the authors propose a time-signature-aware length control modification to better focus on loops with proper meter/tempo, and a new evaluation metric for measuring the seamlessness of musical loops.

Major/minor comments: Overall, very nice work! The focus on and application of music generation to real musician workflows of using and creating loops is very nice. The work is well organized at a high level and has well-written sentence structure at a low level. In terms of the proposed work, 1) the proposed algorithm is relatively clear at a high level, 2) the beat- or signature-aware length control is very nice, and 3) the motivation of the proposed evaluation metric is very nice as well.

In terms of areas for improvement:

1) Although the high-level details of the proposed algorithm are explained well in prose and in the accompanying supplementary PDF with code, the low-level details of the proposed algorithm are a little vague. Many of the low-level details are simply deferred to the MAGNeT paper/code. Ideally there would be an official “algorithm”, more details on the model, inference speed, etc.

2) The evaluation metric of “seam perplexity” is very nice, but ideally would be a little clearer to improve the reusable insight. In section 5.1.1, the phrase “a well-trained model M assigns a probability M(x_i) to each token x” implies that the metric only works for generation techniques with discrete token generation and not for diffusion-based models that use continuous latents. Please confirm. Furthermore, it is unclear why the time window around the seam is only forward-looking and not zero-centered around the seam. Please explain.

3) Please see the past work “DITTO: Diffusion Inference-Time T-Optimization”, https://arxiv.org/abs/2401.12179. That work proposed an inference-time algorithm for loop generation using pre-trained non-AR models (diffusion models). This is a minor issue, limited to the related work and to how the method relates to other generation techniques, including LLM-based loop generation, diffusion-based loop generation, and MAGNeT-style generation.

4) The evaluation is very small in scale and only done with 100 text prompts. This is so small that most evaluation metrics, particularly FAD_vggish, have very high variance and likely fluctuate considerably solely because of the small evaluation sample size.
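To make the windowing question in point 2 concrete, the following is a minimal sketch of the two options, assuming the loop is a sequence of `length` tokens whose seam lies between the last index and index 0. The function name `seam_window` and its parameters are hypothetical and are not taken from the paper or its code.

```python
def seam_window(length: int, k: int, centered: bool) -> list[int]:
    """Return k token indices around the loop seam (between length-1 and 0).

    Hypothetical helper for illustration only.
    centered=False: forward-looking window that starts right after the seam.
    centered=True:  window that straddles the seam, wrapping around the loop.
    """
    if not 0 < k <= length:
        raise ValueError("k must be in 1..length")
    start = -(k // 2) if centered else 0
    return [(start + i) % length for i in range(k)]


forward = seam_window(length=8, k=4, centered=False)
straddling = seam_window(length=8, k=4, centered=True)
print(forward)     # tokens immediately after the seam
print(straddling)  # last tokens of the loop followed by the first ones
```

A centered window also scores the tokens just before the seam, which is where an ending that ignores the loop start might show up.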

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

Summary: In this work, the authors present a new algorithm to extend the music generation method of MAGNeT to better generate seamless musical loops. MAGNeT is a non-autoregressive generation technique distinct from LLM-based generation and diffusion models, but similar in spirit to the latter. In addition to the main modification to the MAGNeT inference-time generation algorithm, the authors propose a time-signature-aware length control modification to better focus on loops with proper meter/tempo, and a new evaluation metric for measuring the seamlessness of musical loops.

Initial Scores: 1 strong accept, 2 weak accepts, 1 strong reject

Metareview: Overall, the reviews are generally positive. In summary, there are three main areas for improvement:

Issue #1 (R2): The intuition behind "Section aware length control" (section 4.4) is unclear.

Issue #2 (R1, R2): Validity of the proposed evaluation metrics.

Issue #3 (R1, R3, and MR): Commentary on related work could be improved.

Discussion: The discussion was brief and focused on clarification of #1 and on the consensus that #2 would likely be the best area for improvement. From the perspective of the reviews and the overall contribution, however, I recommend that these issues not hold back the paper, as they can mostly be addressed via light editing.

Recommendation: Accept

Review 1:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Strongly agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

There is a trend of performing modification to a neural synthesizer without retraining it and the application scenario is interesting both methodologically and practically speaking.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Controlling the tokens of a transformer-based generation system makes it possible to enforce pleasant looping of the synthesized sound at inference time.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper describes an inference-based technique for generating loops by controlling the generation of tokens of a non-autoregressive music generator. The approach is sound, the different design options make sense and are evaluated with FAD, and the authors performed a subjective evaluation to demonstrate that their design of choice is better than a strong baseline.

My main concern is the proposal of the perplexity metric. It seems to be overcomplicated, plus the authors do not evaluate its effectiveness on a controlled dataset, plus they do not give it any credit to decide which algorithm design is the best.

The authors argue that FAD is not suitable, but I fail to understand why a FAD computed on 1) loops circularly shifted by half their length or 2) loops repeated x times would not be effective. If the looping leads to artefacts that are not seen in the reference dataset, this will induce a systematic distribution shift that will be captured by the Fréchet metric.
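The two preprocessing options suggested here can be sketched as follows. This is only an illustration of the idea on a toy array standing in for audio samples; the helper names are hypothetical, and the FAD computation itself (embedding extraction and the Fréchet distance) is deliberately left out.

```python
import numpy as np


def shift_by_half(loop: np.ndarray) -> np.ndarray:
    """Rotate a loop by half its length so the seam lands mid-clip."""
    return np.roll(loop, -(len(loop) // 2))


def repeat_loop(loop: np.ndarray, times: int) -> np.ndarray:
    """Concatenate a loop with itself `times` times."""
    return np.tile(loop, times)


# Toy stand-in for a mono waveform; real input would be audio samples.
loop = np.arange(8, dtype=np.float32)
shifted = shift_by_half(loop)    # seam (7 -> 0) now sits in the middle
repeated = repeat_loop(loop, 3)  # seam occurs twice inside the clip
print(shifted)
print(repeated.shape)
```

Either variant places the seam inside the clip passed to the embedding model, so any seam artefact can contribute to the measured distribution shift.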

Minor comments:

"A low seam perplexity indicates that the seam is “easy” for a strong reference model to predict, suggesting a smooth transition. Conversely, a high value suggests abrupt discontinuities or other artifacts at the loop boundary": demonstrate validity on two examples with spectrograms and a small two classes dataset

"via a LLM": which one ?

"and what, going forward, we call LoopGen." Consistency in naming is mandatory in a technical paper.

"However, this adjustment has minimal impact on the overall mean ratings for both models: indicating that ratings are stable between users." If it is not needed, processing of ratings shall be avoided.

"perceptibility of the seam, as can also be seen in Figure 5." From reading it is unclear if the authors are referring to a spectrogram of the seam or the rating's analysis.

Review 2:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Disagree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Disagree

Q10 (Please justify the previous choice (Required if "Strongly Disagree" or "Disagree" is chosen, otherwise write "n/a"))

I believe the main loss metric proposed in the paper, i.e. seam perplexity is incorrectly defined. Additionally I think some of the baselines (both the negative and positive anchor) could be stronger to establish their method more convincingly.

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Disagree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Disagree

Q15 (Please explain your assessment of reusable insights in the paper.)

I think the metrics and baselines have to be more clearly and carefully developed to provide reusable insights to the community (see review).

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This work presents an inference-only method to generate loops from MAGNeT, a pre-trained text-to-music masked non-autoregressive transformer. This is achieved by creating copies of the end and beginning of the loop and prepending and appending them respectively to the sequence allowing the model to attend to them.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong reject

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper identifies a gap in the ability of generative models to create loops that can repeat seamlessly and also proposes a strategy to make the duration of these loops amenable to production settings via a ‘beat-aware technique’. This method uses a circular padding technique that allows the end of the loop to attend to the beginning and the beginning to attend to the end during inference. The motivation is compelling and the solution being inference-only is also powerful as it becomes scalable to multiple pre-trained large models. The code for this project is also provided which is much appreciated.
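As a rough illustration of the circular padding described above, the sketch below wraps a token sequence with copies of its own end and start, and crops the copies away afterwards. It only shows the index manipulation on a toy array; how the copies are kept in sync with the loop during MAGNeT's iterative decoding is not covered, and the function names and the context width `w` are assumptions made for this example.

```python
import numpy as np


def circular_pad(tokens: np.ndarray, w: int) -> np.ndarray:
    """Prepend the last w tokens and append the first w tokens (last axis)."""
    if not 0 <= w <= tokens.shape[-1]:
        raise ValueError("w must be between 0 and the sequence length")
    if w == 0:
        return tokens.copy()
    return np.concatenate([tokens[..., -w:], tokens, tokens[..., :w]], axis=-1)


def crop_center(padded: np.ndarray, w: int) -> np.ndarray:
    """Drop the w context tokens on each side, keeping only the loop."""
    return padded[..., w:padded.shape[-1] - w]


tokens = np.array([1, 2, 3, 4, 5])
padded = circular_pad(tokens, w=2)
print(padded)                  # loop end, loop, loop start
print(crop_center(padded, 2))  # the original loop
```

With this layout, positions near the end of the loop have the loop's beginning as right-hand context, and positions near the start have its ending as left-hand context.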

I think the paper presents an interesting idea, however I believe some important details need to be sorted out to ensure scientific and technical soundness. I have discussed these aspects below.

Section aware length control (section 4.4): My understanding of the motivation for this method is that the generative model, MAGNeT, generates tokens (which can be thought to operate in the time domain) whereas musicians or humans think of loops in terms of bars. In order to match these two units, the authors propose a method to calculate the number of seconds of audio to generate given an initial audio prompt. They do this by calculating the BPM in the initial audio prompt and making the assumption that the generated sample will follow the same tempo and calculate the duration of generation given a user-defined ‘preferred number of bars’, n. I don’t understand the intuition behind doubling or halving the duration in Algorithm 1. I understand that there are constraints on the duration of audio that can be generated, perhaps based on external factors such as compute. However from a user point of view, I expect that halving or doubling the number of bars that is recommended would be quite confusing. Perhaps there is some intuition that I am missing, which I recommend is more explicitly stated in the paper.
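One possible reading of the halving/doubling step is sketched below. Algorithm 1 is not reproduced in this review, so this is not the authors' procedure: the function name, the 4-beats-per-bar default, and the duration bounds `min_s` and `max_s` are all assumptions chosen for illustration.

```python
def loop_duration_seconds(bpm: float, bars: int, beats_per_bar: int = 4,
                          min_s: float = 2.0, max_s: float = 10.0):
    """Return (duration in seconds, effective bar count) for a loop.

    Hypothetical sketch: the requested bar count is halved or doubled
    until the duration fits inside [min_s, max_s].
    """
    if bpm <= 0 or bars <= 0:
        raise ValueError("bpm and bars must be positive")
    if max_s < 2 * min_s:
        raise ValueError("max_s must be at least 2 * min_s")
    bar_s = beats_per_bar * 60.0 / bpm  # seconds per bar at this tempo
    n = float(bars)
    while n * bar_s > max_s:  # too long: halve the number of bars
        n /= 2.0
    while n * bar_s < min_s:  # too short: double the number of bars
        n *= 2.0
    return n * bar_s, n


print(loop_duration_seconds(120.0, 4))  # fits as requested
print(loop_duration_seconds(60.0, 4))   # too long, so the bar count is halved
```

Under this reading, halving or doubling keeps the loop length at a power-of-two multiple of the requested bar count, which may be the musical intuition behind the step; the paper would need to confirm this.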

Another minor point: Is this technique addressing ‘coherency’ of the repetition (line 62)? To my understanding, based on section 4.4, this method simply allows for the generated loop to fit in an integer number of bars. This feels more like a method to make generated samples more amenable to real-world production use cases. The coherency of the loop generated is addressed by the tiling method proposed.

Related Work: There is another line of work that tries to make existing loops vary or react to a musician as seen with Vampnet [1] (as you have already mentioned), and Reflexive Looper [2]. Drawing the distinction that you are trying to in fact generate these loops as opposed to varying an existing loop would be nice.

Seam perplexity: Cross-entropy and perplexity are both measures used to evaluate how similar two distributions are. The definition of cross entropy in equation 1 is incorrect as it isn’t taking into account the true distribution p. The equation holds good if one has ground truth tokens and x_i is from the ground truth sequence. However in your case, there is no ground truth and thus this equation doesn’t refer to cross entropy and by extension perplexity. An alternative measure that could be used with only one distribution is perhaps the entropy. I also wonder if one could define some hand-crafted metrics such as beat continuity, harmonic consistency based on chroma at the seam to check the continuity of the loop.
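The distinction drawn here can be illustrated numerically. The average negative log-probability of observed tokens estimates the cross-entropy H(p, q) only when those tokens are samples from the true distribution p; applied to model-generated tokens, it is the reference model's negative log-likelihood of that sample. Entropy, by contrast, needs only the predictive distributions. The sketch below uses made-up probabilities and hypothetical function names, and does not reproduce the paper's Equation 1.

```python
import numpy as np


def perplexity_of_tokens(token_probs) -> float:
    """exp of the mean negative log-probability assigned to observed tokens."""
    p = np.asarray(token_probs, dtype=np.float64)
    return float(np.exp(-np.mean(np.log(p))))


def mean_entropy(dists) -> float:
    """Mean Shannon entropy (in nats) of predictive distributions (rows sum to 1)."""
    q = np.asarray(dists, dtype=np.float64)
    safe = np.where(q > 0, q, 1.0)  # treat 0 * log(0) as 0
    return float(-np.mean(np.sum(q * np.log(safe), axis=-1)))


# Made-up numbers: probabilities a reference model gave to three seam tokens,
# and one predictive distribution over a two-symbol vocabulary.
ppl = perplexity_of_tokens([0.25, 0.25, 0.25])
ent = mean_entropy([[0.5, 0.5]])
print(ppl, ent)
```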

Additionally, the duration of the window used to calculate ‘seam perplexity’ and how that value was decided upon is not discussed anywhere. Since this is a crucial hyperparameter, it should be discussed in the text.

Experiments:

4 a. Text prompts: How are the textual prompts generated? What was the goal of creating the prompts? Were there considerations to ensure diversity among prompts, are there certain use cases (live performance, studio settings), particular instrumentation, tempos that you considered while making these prompts. I think this information should be discussed in the text since all the generations are conditioned on these prompts.

4 b. FAD: What was the duration of generated samples considered when computing FAD. Were the samples looped, if so how many times (what was the duration of the samples)? And how did that duration compare to the ground truth data from FMA-Pop? Additionally, I wonder if there may be a better suited loop-based dataset to better highlight the ground truth distribution in the FAD comparison. Maybe the datasets in LoopNet [3], that you have cited in your related work or Freesound [4] might be helpful.

4 c. Seam Perplexity: Seam Perplexity is used as a measure to compare between tiled and hybrid tiled models even though lines 385-389 suggest that the hybrid tiled variant has a higher value because tokens are drawn from two different distributions. In this case, this feels like an unfair measure to use and I wonder if alternatives can be used such as beat-continuity or chroma-based measures to check the continuity of the generation at the seam (as suggested above).

4 d. Table 1: I think the table would be much easier to digest as a line plot with w on the x axis and FAD / Seam Perplexity on the y axis. Additionally the caption can be more informative.

4 e. Table 2: What is the difference between Vanilla and Naive versions of the model? If Naive is just a looped version of vanilla, why are the FAD CLAP scores slightly different? This should be made clear.

I also wonder if stronger baselines can be established in the comparisons stated in Table 2 to make a more compelling argument. Perhaps a generative model fine-tuned on a loop dataset could replace the naive MAGNeT and MusicGen models. Additionally, a ground truth dataset of loops could be the “gold standard” or a possible ceiling on metric values that we could look for.

4 f. Fig 4: It is quite difficult to read figure 4 especially without a y-axis and overlapping charts. I also believe that the content of this figure is already conveyed in Table 2 with the Seam perplexity column.

4 g. Fig 5: Figure 5 is quite confusing as the bar considers the x-axis to be discrete values (understood by the different coloured bars corresponding to the same number being present on either side of that value) but the lines are drawn considering the axis to be continuous. Additionally I think error bars on the bars should be present to convey a clearer picture.

4 h. Samples: In the samples provided, it would be nice to be able to predict baseline and loopgen generations with the same ‘conditional audio prompt, c’. I believe only ‘rock’ was common in the examples provided.

[1] Garcia, Hugo Flores, et al. "Vampnet: Music generation via masked acoustic token modeling." ISMIR (2023).

[2] Pachet, François, et al. "Reflexive loopers for solo musical improvisation." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2013.

[3] Chandna, Pritish, et al. "Loopnet: Musical loop synthesis conditioned on intuitive musical parameters." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

[4] freesound.org

Review 3:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

This paper proposes a novel method for generating seamlessly loopable music snippets using non-autoregressive generative models. The key idea is to simulate a looping structure during inference by framing the generation as an inpainting task, where the beginning and end of the target segment are duplicated as context before and after the main generation region. This design insight—implementing circular context at inference time without retraining—is a reusable strategy that could generalize to other generative tasks requiring boundary consistency.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Implementing circular context at inference time, combined with beat alignment, enables effective loopable music generation without retraining the model.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Strengths: 1. The paper is well written and easy to follow, with clear and informative figures that effectively illustrate the proposed method. 2. The experimental setup is sound, and the proposed evaluation metric (seam perplexity) is well motivated and aligns with the goal of loopable audio generation. 3. The paper introduces a new task—generating loopable music content from scratch using a pre-trained model—with an elegant and reusable inference-time solution.

Weaknesses and Comments: 1. While the specific formulation of "generating loopable content from scratch, leveraging pre-trained text-to-music model" is a valuable refinement, the task itself is only moderately novel. Prior works like VampNet have already explored loopable audio generation, albeit from audio prompts rather than text. 2. The notation λ is reused in different contexts (classifier-free guidance and beat duration), which may cause confusion. It would help to disambiguate or re-label one of the usages. 3. In the demo samples, different models are conditioned on different prompts, making it difficult to compare their outputs fairly. Using the same prompts across models would provide more direct and interpretable comparisons.