P4-1: Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching
Ben Hayes, Charalampos Saitis, György Fazekas
Subjects: MIR fundamentals and methodology ; Machine learning/artificial intelligence for music ; Awards Nominee ; Knowledge-driven approaches to MIR ; Creativity ; Generative Tasks ; Tools for artists ; Music composition, performance, and production ; Music signal processing ; MIR tasks ; Applications ; Open Review ; Music synthesis and transformation ; Music and audio synthesis
Presented In-person
10-minute long-format presentation
Many audio synthesizers can produce the same signal given different parameter configurations, meaning the inversion from sound to parameters is an inherently ill-posed problem. We show that this is largely due to intrinsic symmetries of the synthesizer, and focus in particular on permutation invariance. First, we demonstrate on a synthetic task that regressing point estimates under permutation symmetry degrades performance, even when using a permutation-invariant loss function or symmetry-breaking heuristics. Then, viewing equivalent solutions as modes of a probability distribution, we show that a conditional generative model substantially improves performance. Further, acknowledging the invariance of the implicit parameter distribution, we find that performance is further improved by using a permutation equivariant continuous normalizing flow. To accommodate intricate symmetries in real synthesizers, we also propose a relaxed equivariance strategy that adaptively discovers relevant symmetries from data. Applying our method to Surge XT, a full-featured open-source synthesizer used in real-world audio production, we find our method outperforms regression and generative baselines across audio reconstruction metrics.
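The permutation symmetry described in the abstract can be illustrated with a minimal sketch. The two-oscillator `toy_synth` below is a hypothetical stand-in introduced only for illustration; it is not the paper's synthetic task and not Surge XT. Under that assumption, swapping the oscillators leaves the signal unchanged, while averaging the two equivalent parameter settings (roughly what a point estimate trained with a squared-error loss may drift toward) yields a different sound.

```python
import numpy as np

def toy_synth(freqs, amps, sr=16000, n=1024):
    """Sum of sinusoids: a hypothetical synthesizer with identical oscillators."""
    t = np.arange(n) / sr
    # One row per oscillator, summed into a single output signal.
    return (amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)

# Illustrative parameter values (assumed, not taken from the paper).
freqs = np.array([220.0, 660.0])
amps = np.array([1.0, 0.3])

# Swapping the two oscillators is a permutation of the parameter vector.
perm = np.array([1, 0])
x_a = toy_synth(freqs, amps)
x_b = toy_synth(freqs[perm], amps[perm])
same_signal = np.allclose(x_a, x_b)  # both settings give the same audio

# Averaging the two equivalent settings gives two oscillators at 440 Hz,
# which is not an equivalent solution.
mean_freqs = (freqs + freqs[perm]) / 2
mean_amps = (amps + amps[perm]) / 2
x_mean = toy_synth(mean_freqs, mean_amps)
mean_error = float(np.mean((x_a - x_mean) ** 2))

print("permuted parameters give same signal:", same_signal)
print("MSE between target and averaged-parameter signal:", mean_error)
```

In this sketch the two parameter settings are distinct modes with identical audio, which is the situation the abstract proposes to handle with a conditional generative model rather than a single point estimate.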
Q2 ( I am an expert on the topic of the paper.)
Agree
Q3 ( The title and abstract reflect the content of the paper.)
Strongly agree
Q4 (The paper discusses, cites and compares with all relevant related work.)
Strongly agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Agree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Strongly agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Strongly agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Agree (Novel topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Strongly agree
Q15 (Please explain your assessment of reusable insights in the paper.)
Estimating the parameters of a synthesizer from sound is challenging because symmetries in the parameter space lead different parameter settings to produce similar sound output.
Q16 ( Write ONE line (in your own words) with the main take-home message from the paper.)
The equivariant generative approach to parameter estimation is effective in solving such problems.
Q17 (This paper is of award-winning quality.)
Yes
Q18 ( If yes, please explain why it should be awarded.)
Identifying the issue of symmetries in synthesizer parameter space and providing alternative solutions to the problem could contribute widely to other MIR tasks, such as instrument recognition and genre recognition. Considering the potential influence of the discussion in this paper, I believe the paper is of award-winning quality.
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Agree
Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)
Strong accept
Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
The highlight of the paper is the logical explanation of why the proposed method works. Unlike other papers that present a method only by comparing it to other previous methods, this paper explains the core problem of estimating synthesizer parameters from acoustic signals and presents a solution for it.
The weakness could be the lack of a brief overview at the beginning of the method section, which would prepare readers for the mathematical formulation.
Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)
Strong accept
Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))
This paper identifies an issue of symmetries in synthesizer parameter space and provides alternative solutions to the problem, which could contribute widely to other MIR tasks. Unlike other papers that present a method only by comparing it to other previous methods, this paper explains the core problem of estimating synthesizer parameters from acoustic signals and presents a solution for it. We reviewers would like to suggest a strong accept for this paper.
As reviewers indicate weaknesses in readability (reviewer 1), typos and notation problems (reviewer 3), and a lack of clarity regarding the initialization process and Figure 2 (reviewer 4), I strongly suggest checking through the draft and addressing these points in the camera-ready version.
Q2 ( I am an expert on the topic of the paper.)
Agree
Q3 (The title and abstract reflect the content of the paper.)
Agree
Q4 (The paper discusses, cites and compares with all relevant related work)
Agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Disagree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Strongly agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Disagree (Standard topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Agree
Q15 (Please explain your assessment of reusable insights in the paper.)
The experiments show potential improvements in sound matching when addressing the symmetries in synthesizer parameters, and generally better performance of generative models compared with regression models.
Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)
Equivariant continuous normalizing flows show promise for synthesizer sound matching
Q17 (Would you recommend this paper for an award?)
No
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Disagree
Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)
Strong accept
Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
This paper explores the problem of symmetries in synthesizer parameter sets within the task of in-domain sound matching.
The paper addresses a relevant problem and proposes several solutions. The results obtained with a real-world synthesizer show an improvement over other methods, although most of the improvement seems to come from the use of continuous normalizing flows.
The text is generally hard to read, and the authors should make an effort to make it more accessible. The problem of synthesizer inversion is quite niche, and it should be defined much earlier. The abstract should allow non-experts to understand what the paper is about more generally. The approach in 3.1 could be explained more clearly, and generally more diagrams (e.g. from the supplementary materials) would help, whereas even more of the mathematical detail could also be moved to the supplementary material.
The need for automatic handling of permutations could also be justified better. First, the problem assumes uniform sampling of parameters, whereas many works have been based on real-world parameter databases. It is unclear how the symmetries affect the models in this case, and what the best solution is for existing synthesizers. In this sense, the difference in results observed at the end of 4.2.3 should be further clarified or corrected. Second, the authors disregard manual approaches, and claim that automation would scale better, but this is not as obvious as it may seem. As an example, in the first experiment the asymmetric variants (which are a sort of manual mapping of the parameters) perform better, which is not discussed. It is also not clear enough in this experiment that the proposed methods are much better than existing approaches, so it seems at least some of those approaches should also be used in the second experiment. Generally there seem to be many differences between the two experiments (e.g. missing CNF (Equivariant), AST conditioning), which should be justified. Some discussion of computational cost would help in understanding the benefits and costs of this approach compared with a manual mapping or non-random training data. Finally, given the performance of CNF (MLP), the authors should discuss this model in more depth in relation to previous work using normalizing flows, and acknowledge the limitations of the proposed approach.
Q2 ( I am an expert on the topic of the paper.)
Disagree
Q3 (The title and abstract reflect the content of the paper.)
Strongly agree
Q4 (The paper discusses, cites and compares with all relevant related work)
Agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Disagree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Strongly agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Disagree (Standard topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Agree
Q15 (Please explain your assessment of reusable insights in the paper.)
The idea of taking symmetries into account is very interesting and I believe it can be reused in similar audio-related problems that present symmetries, e.g. source separation, localization, etc.
Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)
Taking symmetries into account makes it easier to retrieve the parameters of a complicated synthesizer from an audio file generated with that instrument.
Q17 (Would you recommend this paper for an award?)
No
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Agree
Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)
Weak accept
Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
The paper presents a technique to estimate the parameters of a synthesizer from audio generated with that instrument by taking into account the symmetries of the problem and using equivariant flow matching. While I highly enjoyed the topic and the innovative solution, my main concerns relate to the accessibility of the paper. The topic is not simple and a lot of concepts are taken into account. I wonder whether the paper would be better suited to a longer and more in-depth journal paper than to a conference. Nonetheless it is an interesting work.
3.1: what are S_k?
Fig. 2 typo: Synthesiser -> Synthesizer
\mathbf{\omega}, \mathbf{\alpha} and \mathbf{\gamma}: to what domain do they belong?
Shouldn’t vector “x” be in bold (col right row 250)? Also, are the elements all concatenated together in a single column vector? Do they all belong to the range [-1, 1]?
4.1 Results: “CNF (PARAM2TOK) performs on par with the best models across conditions,” -> this is not true; in some situations it actually performs worse, e.g. k=32 for LAC Symmetric, and MSE Symmetric in all cases.
4.2.2: “For more detail on this point, see the supplementary material.” -> should be a journal paper
Q2 ( I am an expert on the topic of the paper.)
Agree
Q3 (The title and abstract reflect the content of the paper.)
Strongly agree
Q4 (The paper discusses, cites and compares with all relevant related work)
Strongly agree
Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)
Strongly agree
Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)
Yes
Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)
Strongly agree
Q9 (Scholarly/scientific quality: The content is scientifically correct.)
Agree
Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)
Strongly agree
Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)
Strongly agree
Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)
Strongly Agree (Very novel topic, task, or application)
Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)
Strongly agree
Q15 (Please explain your assessment of reusable insights in the paper.)
Most previous work on sound matching tends to frame it as a regression task and overlooks symmetries in the solution space. This work dealt with this largely neglected challenge, convincingly revealed the limitations of the regression task, and proposed alternatives that address the indeterminacy of solutions. These insights are valuable not only for sound matching problems but also for addressing ill-posed inverse problems at large.
Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)
Facing the ill-posed inverse problem of synthesizer matching with parameter symmetries, the authors highlighted the problematic, deterministic nature of framing it as a regression task and advocated for a probabilistic, generative approach. They proposed a param2tok framework to account for invariance in orbits containing parameter symmetries and demonstrated its robustness in parameter prediction compared to other approaches without token mapping.
Q17 (Would you recommend this paper for an award?)
Yes
Q18 ( If yes, please explain why it should be awarded.)
The existence of non-unique solutions is common yet much overlooked in inverse problems. The authors have mathematically formulated the symmetries, which justifies their architecture choices. The experiments are performed on both simplified and real-world synthesizers, and the parameter spaces are curated such that they reflect symmetric and asymmetric conditions. The experimental results reflect the mathematical intuition well and show the robustness of the proposed approach.
Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)
Agree
Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)
Strong accept
Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)
Strengths: The problem is relevant to many inverse problems and essential to consider when performing the sound matching task on industry-scale synthesizers. The approach is novel. The insights are significant. The metrics adopted are state-of-the-art approaches that refine previous metrics.
Weaknesses: In section 3.2, it is unclear how z, z’ and A are initialized such that param2tok is invariant to permutation of the parameter vector. It is unclear to me how Figure 2 illustrates the fact that “transformer’s equivariance can be used if a permutation symmetry is present in the data but will not be enforced if not”.