When Voices Interleave: Timing Deviations in Six Performances of Telemann's Fantasias for Solo Flute

Patrice Thibaud; Mathieu Giraud; Yann Teytaut

Abstract:

Performers convey musical meaning not only through pitch and dynamics but also through micro-timing deviations. This study examines performance analysis and timing in Georg Philipp Telemann’s 12 Fantasias for Solo Flute, focusing on how musical elements, such as implied polyphony, onset positions, and meter, influence musical performance. We release a corpus with annotations on interleaved voices gathering 11 musicological sources. We first evaluated how simple rules may detect such interleaved voices from the scores. We then analyzed six complete recordings of the fantasias, comparing their timing deviations against a metronomic interpretation. Results reveal significant timing deviations influenced not only by note position within rhythmic groupings, but also by the presence of interleaved melodic voices, in particular when these interleaved voices are notated with opposing stems.

Meta Review:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 ( The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Strongly agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

Timing deviations in flutist playing are influenced by the interleaving of voices. It is unclear how reusable this insight is as the analysis is based on a single piece of a particular time period and performance practice.

Q16 ( Write ONE line (in your own words) with the main take-home message from the paper.)

Timing deviations in flutist playing are influenced by the interleaving of voices in the Telemann fantasias for solo flute.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The authors analyze micro-timing deviations among different flute players in the Telemann fantasias for solo flute. The work is interesting but leans more towards musicology, with some limited use of computational techniques. It is definitely within the scope of ISMIR, but there are other venues that would be more appropriate.

The writing is good, and the work is well placed in context. Some aspects could be improved. For example, the alignment procedure needs to be described in more detail, as there is not enough information for reproducibility.

The musicological sources and how they were used need to be clarified.

The approach is interesting and there is some novelty in the voice rule-based heuristic.

The main weakness is that this is limited to one particular piece and the conclusions could be specific to baroque performance practice. One of the advantages of computational approaches is that they can scale beyond the analysis of limited repertoire typically done in musicology but in this case this has not happened.

I think the paper has value for the ISMIR community especially musicologists looking into performance analysis but it is not a strong contribution of more general interest.

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Weak accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

There is considerable agreement among the reviewers that this is an interesting paper connecting audio analysis, performance practice, and symbolic information. The idea is interesting and well executed and the paper is well written. The main criticisms arise from the limited analysis making it unclear if the proposed methodology could be used more generally as well as various other smaller criticisms provided by the reviewers.

Review 1:

Q2 ( I am an expert on the topic of the paper.)

Disagree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Disagree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The dataset is nice to have, but the analysis regarding the IOI is somewhat specific to the Fantasias and the performers.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Flutists execute the IOI differently depending on the implied voicing

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper presents an analysis of unaccompanied flute recordings, focusing on the relationship between timing and the voicing inside the implied polyphony. The paper analyzes Telemann's 12 Fantasias played by six flautists. The paper first presents a rule-based method for detecting the change of voice, incorporating a new feature: alternation in intervals. The paper uses a ground-truth voice annotation to analyze the performances. It confirms the practice of notes inégales for groups of notes and differences in playing when playing different voices.

The work is interesting and valuable for computationally understanding performance practices. Having the dataset available for other researchers to analyze is also nice. The analysis of the paper, however, raises more questions than it answers, due to the dataset focusing exclusively on Telemann's Fantasias and the analysis being done at the performer level. First, this makes it difficult to say whether the results arise from Baroque playing practices in general or are specific to the Fantasias; since implied voicing is a compositional technique used outside of unaccompanied pieces, it might make sense to incorporate other Baroque pieces (with accompaniments). Second, the results presented are also specific to the flautist. How much of the results reflect general performance practice of the Baroque flute, as opposed to artist-specific choices? Which of these results are shared across artists, and which differ?
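As an illustration of what an "alternation in intervals" feature could look like, the following is a minimal sketch under stated assumptions. The function name `alternating_leaps`, the MIDI pitch input, and the `min_leap` threshold of 5 semitones are hypothetical choices made for this example; they are not taken from the paper, whose actual rules may differ.

```python
def alternating_leaps(pitches, min_leap=5):
    """Flag notes where two consecutive leaps go in opposite directions.

    pitches  : list of MIDI pitch numbers for a monophonic line
    min_leap : minimum interval size in semitones (assumed threshold)
    Returns a list of booleans, one per note.
    """
    # Signed melodic intervals between successive notes.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    flags = [False] * len(pitches)
    for i in range(len(intervals) - 1):
        first, second = intervals[i], intervals[i + 1]
        large = abs(first) >= min_leap and abs(second) >= min_leap
        opposite = first * second < 0
        if large and opposite:
            # The note between the two leaps is a candidate voice switch.
            flags[i + 1] = True
    return flags


# A zig-zag line alternating between a low and a high register.
zigzag = alternating_leaps([60, 72, 62, 74, 64])
# A stepwise scale fragment, which should not be flagged.
scale = alternating_leaps([60, 62, 64, 65])
```

A heuristic of this kind only marks candidate positions; comparing its output with the annotated voices would still be needed to judge its usefulness.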

The dataset seems important for researchers interested in Baroque flute playing, but the analysis is a bit lacking; therefore, I would suggest a minor accept of the paper.

Some other comments. I am not an expert in Baroque playing practices, but I would have liked to see a more musical treatment of Section 5.2, uncovering what kinds of IOI strategies can be used to express changes of voicing. That would have made the paper more valuable to the music performance community. I say this because Baroque practices, especially after the HIP movement, have evolved by interpreting past treatises (e.g., that of Quantz for flute), so there are many unwritten performance practices that we can expect performers to abide by. For example, the results in 5.1 are somewhat "expected", given that many treatises deal with notes inégales to a great extent. On the other hand, Section 5.2 is in my opinion the gem of using computational analysis, as there seem to be only a few treatises on executing voicings. This means it is up to the performer to devise a way to deliver it effectively, so there is greater expressive freedom to make it work musically. I would have liked to see more IOI-based analysis of similar strategies taken by the flautists, for example, by clustering or categorizing some of the IOI patterns.

Review 2:

Q2 ( I am an expert on the topic of the paper.)

Disagree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The authors investigate how interleaved voices in Telemann's 12 Fantasias for Solo Flute influence micro-timing deviations in performance. They create a dataset by annotating interleaved voices from multiple musicological sources and align six audio recordings to symbolic scores. They then measure ∆IOI (inter-onset interval deviations) and ∆o (onset shifts) relative to a metronomic baseline, revealing significant and non-random expressive timing patterns shaped by structural and voice-related factors. Notably, performers tend to emphasize notes when transitioning between implied voices. The study also extends existing rule-based symbolic features to detect voice interleaving.
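To make the two quantities concrete, here is a minimal sketch of how ∆o and ∆IOI could be computed against a constant-tempo baseline. The least-squares tempo fit and the names `metronomic_onsets` and `timing_deviations` are assumptions made for illustration; the paper's exact definition of the metronomic reference may differ.

```python
def metronomic_onsets(score_beats, performed_sec):
    """Fit t = a + b * beat by least squares (assumed baseline)."""
    n = len(score_beats)
    if n < 2 or len(performed_sec) != n:
        raise ValueError("need two or more paired onsets")
    mean_x = sum(score_beats) / n
    mean_y = sum(performed_sec) / n
    sxx = sum((x - mean_x) ** 2 for x in score_beats)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(score_beats, performed_sec))
    slope = sxy / sxx                  # seconds per beat
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in score_beats]


def timing_deviations(score_beats, performed_sec):
    """Return (delta_o, delta_ioi) in seconds.

    delta_o[i]   : performed onset minus metronomic onset of note i
    delta_ioi[i] : performed IOI minus metronomic IOI between notes i, i+1
    """
    metro = metronomic_onsets(score_beats, performed_sec)
    delta_o = [p - m for p, m in zip(performed_sec, metro)]
    perf_ioi = [b - a for a, b in zip(performed_sec, performed_sec[1:])]
    metro_ioi = [b - a for a, b in zip(metro, metro[1:])]
    delta_ioi = [p - m for p, m in zip(perf_ioi, metro_ioi)]
    return delta_o, delta_ioi


# Four notes on consecutive beats; the second note is played late.
beats = [0, 1, 2, 3]
delta_o, delta_ioi = timing_deviations(beats, [0.0, 0.6, 1.0, 1.5])
# A perfectly even performance for comparison.
flat_o, flat_ioi = timing_deviations(beats, [0.0, 0.5, 1.0, 1.5])
```

With this definition, each ∆IOI equals the difference between two successive ∆o values, so a late onset lengthens the preceding interval and shortens the following one.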

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

The study investigates micro-timing deviations in performances of Telemann's 12 Fantasias for Solo Flute

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Strengths

The paper presents an original study focused on Telemann’s 12 Fantasias for Solo Flute, a repertoire rarely explored computationally within the MIR community. The investigation of micro-timing deviations in relation to implied polyphony (interleaved voices) is musically meaningful and technically underexplored. The authors compile a symbolic corpus annotated with implied polyphony from 11 scholarly sources.

The paper builds upon prior theoretical work and contributes additional symbolic features for detecting voice interleaving.

The authors release their dataset, alignment outputs, and annotations under open licenses, and provide an interactive Streamlit web application for browsing scores, performances, and timing metrics. This supports reproducibility and encourages further research.

Weaknesses

Alignment procedure lacks clarity and evaluation: The description of the alignment method (Section 4.3) is incomplete. For instance, in line 231, the frame size for audio analysis is unspecified. Given that the entire ∆IOI/∆o-based analysis depends on these alignments, the absence of alignment accuracy evaluation is a major concern. The paper could benefit from basic alignment diagnostics (e.g., confidence scores, visual inspection tools, or comparison to manually annotated excerpts).
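One of the suggested diagnostics, comparison to manually annotated excerpts, could be as simple as the sketch below. The function name `alignment_report`, the 50 ms tolerance, and the example onset values are hypothetical and chosen only for illustration; no such evaluation is reported in the paper.

```python
def alignment_report(aligned_sec, manual_sec, tolerance=0.05):
    """Compare automatically aligned onsets with manual annotations.

    aligned_sec : onset times (s) produced by the alignment
    manual_sec  : onset times (s) annotated by hand for the same notes
    tolerance   : error (s) below which an onset counts as correct
    """
    if not aligned_sec or len(aligned_sec) != len(manual_sec):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(a - m) for a, m in zip(aligned_sec, manual_sec)]
    within = sum(1 for e in abs_errors if e <= tolerance)
    return {
        "mean_abs_error": sum(abs_errors) / len(abs_errors),
        "max_abs_error": max(abs_errors),
        "within_tolerance": within / len(abs_errors),
    }


# Made-up onsets for a three-note excerpt (illustrative values only).
report = alignment_report([0.00, 0.52, 1.10], [0.00, 0.50, 1.00])
```

Reporting such figures on even a few hand-annotated excerpts per recording would indicate whether the observed timing deviations exceed the alignment error.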

No handling of rests and ornamentation in alignment: The alignment process does not account for ornamentation or rests, even though these are prevalent in Baroque performance and can lead to onset mismatches. This raises concerns about the validity of timing deviation metrics, especially in slow or ornamented passages. While DTW is a standard and reasonable alignment approach for monophonic music, the paper would benefit from a broader discussion of alternative alignment tools. For instance, SyncToolbox

Insufficient detail on musicological resources: Although the paper claims to incorporate annotations from 11 musicological sources, it does not adequately describe:

How these sources were selected and weighted.

Whether conflicting annotations were reconciled.

How movement boundaries, tempo indications, or voice labels were standardized.

More transparency is needed here, especially since these annotations form the basis for the voice detection and timing analysis.

Formatting and minor presentation issues: There are a few formatting and typographical issues that should be corrected:

Line 227: “alignment” is misspelled.

Line 257: an unmatched bracket (“[”) makes the formula hard to read.

Review 3:

Q2 ( I am an expert on the topic of the paper.)

Disagree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

This paper clearly presents its methods and openly distributes the data, code, and a web application, which nicely enhance the reusability of the work. The proposed methodology for measuring timing deviations can also be applied to or tested on other datasets to facilitate the development of performance analysis techniques.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This paper investigates micro-timing deviations in solo flute performances by constructing a dataset, proposing rule-based methods, and analyzing the data accordingly.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper analyzes the micro-timing deviations in solo flute performances, investigates their relationship with musical elements such as implied polyphony and interleaved voices, and assesses the consistency across performers. To achieve this, the authors build a corpus with annotations and propose rule-based methods based on Davis's work [32]. The ideas and methods are well presented, and the figures and tables are easy to understand. Moreover, the web application and embedded audio links in the paper are also valuable resources.

The main reasons I did not give this paper a strong accept are as follows:

I am not very familiar with flute music or performance analysis. While I learned some solid methods for analyzing micro-timing from this paper, I am not able to judge how novel, robust, or limited they are.

Similarly, the statistical analyses and results (e.g., Tables 5 and 6) make sense to me, but I could not see a broader picture or clear objective arising from them—perhaps due to my limited background in performance analysis. For instance, after identifying consistency or inconsistency across performers, what motivates further investigation? Would the observed patterns or conclusions hold with a larger pool of performers?

P3-14: When Voices Interleave: Timing Deviations in Six Performances of Telemann's Fantasias for Solo Flute

Patrice Thibaud, Mathieu Giraud, Yann Teytaut

Presented In-person

4-minute short-format presentation