Abstract:

In recent years, the guitar has received increased attention from the music information retrieval (MIR) community, driven by the challenges posed by its diverse playing techniques and sonic characteristics. Progress, mainly fueled by deep learning approaches, has been limited by the scarcity and limited annotations of datasets. To address this, we present the Guitar On Audio and Tablatures (GOAT) dataset, comprising 5.9 hours of unique high-quality direct input audio recordings of electric guitars from a variety of guitars and players. We also present an effective data augmentation strategy using guitar amplifiers, which delivers near-unlimited tonal variety and of which we provide a starting 29.5 hours of audio. Each recording is annotated using guitar tablatures, a guitar-specific symbolic format supporting string and fret numbers, as well as numerous playing techniques. For this, we utilise both the format of Guitar Pro, a software for tablature playback and editing, and a text-like token encoding. Furthermore, we present competitive results using GOAT for MIDI transcription and preliminary results for a novel approach to automatic guitar tablature transcription. We hope that GOAT opens up the possibility of training novel models on a wide variety of guitar-related MIR tasks, from synthesis to transcription to playing technique detection.

Meta Review:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly disagree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Disagree

Q15 (Please explain your assessment of reusable insights in the paper.)

As this is a dataset paper, the focus right now is on dataset documentation rather than on offering broader reusable insights; I would expect these to emerge later, once the dataset is in use.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

GOAT offers a new, rich dataset of guitar audio, associated tablatures, and various augmentations that are realistic for guitar players (such as different amplifier renderings).

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This dataset paper offers new, rich data on guitar playing, including guitar audio by different players, associated aligned tablature, and different amplifier renderings. I appreciate the care that seems to have gone into the creation of the dataset, and the way in which the authors really seem to consider actual playing considerations specific to the guitar.

While I think this article would fit well at ISMIR and benefit the ISMIR community, I do have a few questions and remarks. Generally, as also acknowledged in the ethics statement and in the paper itself, usage permissions (and copyright/licensing issues) may be a problem for data like this. As mitigation, the authors indicate that the data will only be released for research purposes, and that metadata removal/anonymization was performed as described in Section 3.2. I wonder to what extent that would be sufficient to avoid identifiability and, relating to this, what informed the choice of data in the first place.

The choice of data/songs is less clearly documented in the paper, which I again assume is for protection against possible copyright challenges, but some further justification would be welcome: why covers of existing pieces, rather than e.g. new material that is not yet under protection, or test samples that represent playing in relation to tablature but would be less controversial, such as arpeggios, scales or other exercise fragments? What would generally be considered a comprehensive dataset in the eyes of its creators, and to what extent does GOAT meet this in its choice of material?

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

This paper presents a new dataset that will clearly have value to the ISMIR community. While the reviews were not unanimous in their verdicts, no strong objections were raised against acceptance in the discussion phase; rather, the positive contributions of the paper were further emphasized. As such, I consider it well above the acceptance bar, and recommend inclusion at ISMIR.

Review 1:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Disagree

Q15 (Please explain your assessment of reusable insights in the paper.)

The paper is a "dataset paper", whose main contribution lies in a novel dataset for an established MIR task (guitar transcription).

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

The GOAT dataset uniquely combines real guitar audio with digital tablature annotations, including effective data augmentation strategies and promising results in guitar transcription.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This is an interesting paper on a novel guitar transcription dataset, which provides both MIDI and tab representations and is therefore useful for several tasks. The variety of different file formats and the active attempt to ensure compatibility with other research, as well as the critical discussion in Sec. 4.3, are appreciated. I like the hypothesis-driven experimental design. The paper is well structured and easy to read.

I have the following comments / suggestions:

Introduction
  • "(3) an evaluation of results" -> "evaluation of methods"
  • "preliminary results" -> how strong a contribution is this? Explain why the results are preliminary (maybe this term can be avoided to make a stronger point, even if the method is improved later).

Sec. 2.1
  • The list "[12][5][6][18][19]" would benefit from a short list of general approaches for tab generation, to be a bit more specific.

Sec. 3.1
  • "community-created tablatures" -> how are copyright concerns addressed? The same question applies to the first sentence of Sec. 3.2.

Sec. 3.2
  • Give some insight into which method was used for fine-aligning the MIDI notes to the performance.
  • It is not clear why the tuning was added as an additional annotation.

Sec. 3.3
  • To what level of detail are the re-amping model parameters stored as well?

Fig. 3
  • Change the y-axis to a logarithmic scale (left three subplots).
  • In the right subplot, add the count for "no playing technique" as a reference for comparison.

Sec. 4.1
  • You might consider adding subsubsections for better structuring.

Sec. 5.1
  • "Following [4] [2], we finetune ..." -> give some more details about the network architecture.
  • "its zero-shot learning capabilities" -> I think this needs some justification: why is the same task (transcription) on different instrument timbres really zero-shot learning?

Sec. 5.2.1
  • "The transcription results ... do show some promise" -> which metric values would you consider satisfactory for the task?
  • "...it is possible for the model to simply learn ..." -> did you check whether that was the case?
  • "The model tends to overfit ..." -> can you share some quantitative evidence for this?

Sec. 8
  • "...we intend to make the dataset ... upon request" -> is this allowed? Would you not need consent from the original artists / their labels?

References
  • Revise for consistent labeling of conference names ([5], [6] "The ...").
  • [8] has been published at ICASSP 2023; that version should replace the arXiv pre-print.
  • [15, 16] -> add page range.
  • [17] -> give the full name for PMLR.
  • Remove the publisher for IEEE.

Review 2:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Strongly agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Strongly agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Strongly agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Strongly Agree (Very novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Strongly agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The GOAT dataset's design decisions (e.g., raw DI recording with massive amplifier augmentation) and the attempt at a novel audio-to-DadaGP transcription task offer highly reusable insights.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

GOAT is a large, diverse, and uniquely annotated guitar dataset that enables a range of new MIR tasks by pairing real electric guitar recordings with detailed tablatures, amplifiable across endless tonal variations.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Strongly agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

  • GOAT fills a major gap in guitar MIR by offering real-world, expressive annotations beyond basic MIDI.
  • Careful collection, alignment, tone augmentation, and documentation practices make the dataset credible.
  • Proposing DadaGP-based audio-to-text transcription expands the MIR frontier, aligning with current trends in large language/audio models.
  • Clear structure, accessible explanations, thorough statistical analysis of the dataset.

Minor Suggestions:
  • Since GOAT primarily uses covers and popular songs, it would be helpful to provide a short paragraph or table summarizing the genre/style diversity (e.g., proportion of rock, metal, pop, etc.) to give future users a clearer idea of dataset biases.
  • Even a tiny preliminary human evaluation (e.g., 5–10 samples rated for tablature transcription quality) could have illustrated the qualitative potential of the Whisper fine-tuning approach, alongside WER scores.
  • Showing a few sample outputs of the DadaGP transcription (both successful and failed examples) would help readers better understand the kinds of errors and structure the model is learning.
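For readers less familiar with the WER metric mentioned in these suggestions, the following is a minimal sketch of a token-level word error rate, assuming that reference and predicted tablatures are whitespace-separated token strings. The function name and the token strings in the usage note are hypothetical illustrations; they are not taken from the paper, and the paper's actual evaluation code may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level WER: edit distance between token lists / reference length.

    Assumption: tokens are separated by whitespace. This is an illustrative
    sketch, not the evaluation procedure used by the authors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # Levenshtein distance via dynamic programming, keeping one row at a time.
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens to match an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1       # reference token missing in hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

As a usage example, `word_error_rate("n1 w n2 w", "n1 x n2")` counts one substitution and one deletion against four reference tokens. Since WER treats every token equally, a wrong fret and a wrong duration cost the same, which is one reason why sample outputs or a small human evaluation would complement it.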

Review 3:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Disagree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The authors have already shared the re-amping code anonymously, and they plan to release the trained models and training code.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

A well-documented dataset of paired guitar audio and tablature is introduced, along with an augmentation pipeline and baseline experiments for transcription tasks.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak reject

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This paper introduces GOAT, a large dataset of paired electric guitar audio and tablature annotations, created using real performances and extended through a comprehensive amplifier-based data augmentation pipeline. It provides real DI (direct input) recordings with symbolic Guitar Pro annotations and offers experiments for guitar MIDI transcription and automatic guitar tablature transcription (AGTT) using Whisper.

Strengths: The introductory sections and dataset description are clearly written with detail, statistics, and metadata provided.

The dataset is richly annotated: the inclusion of expressive guitar techniques (e.g., bends, mutes, legato) in the tablatures is a valuable addition over typical MIDI-only datasets.

Suggestions:

The real value lies in the dataset creation tool and pipeline, not necessarily in the data itself. This distinction should be emphasized. Many existing datasets (e.g., GuitarSet, GAPS) could be transformed similarly. For all the MIDI transcription experiments, it would be nice to also include ablation studies with the pipeline applied to other datasets, assessing the actual performance improvements from this synthetic tonal variety.

For the MIDI transcription experiments, while the test set from GuitarSet is fair, I would suggest comparing with a separately created test set (recorded with real amps and effects, if possible), or at least processed with different effects and not the same pipeline. Even with proper splitting, having test data created using the same synthetic procedure as the training data might bias results, as the model could have learned artefacts specific to that pipeline. For example, in the case of AMP-XL, you might have seen more meaningful improvements if it had been tested on real data or data created using a different amping plugin.
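One partial way to probe this concern within the existing pipeline is to hold out both recordings and amplifiers, so that no test clip shares a performance or an amplifier rendering with the training data. The sketch below assumes each augmented clip is identified by a recording ID and an amplifier name; `Clip`, `split_held_out`, and all field names are hypothetical and do not come from the paper or its code. This does not replace a test set recorded with real amplifiers or rendered with a different plugin, as artefacts common to the whole pipeline would remain.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Clip(NamedTuple):
    """Hypothetical descriptor of one augmented audio clip."""
    recording_id: str  # which DI recording the clip was rendered from
    amp: str           # which amplifier rendering was applied


def split_held_out(
    clips: Iterable[Clip],
    test_recordings: Set[str],
    test_amps: Set[str],
) -> Tuple[List[Clip], List[Clip]]:
    """Split clips so that train and test share neither recordings nor amps.

    Clips that mix a training recording with a test amplifier (or the
    reverse) are discarded, since they would leak information either way.
    """
    train: List[Clip] = []
    test: List[Clip] = []
    for clip in clips:
        is_test_recording = clip.recording_id in test_recordings
        is_test_amp = clip.amp in test_amps
        if is_test_recording and is_test_amp:
            test.append(clip)
        elif not is_test_recording and not is_test_amp:
            train.append(clip)
        # Mixed cases fall through and are dropped on purpose.
    return train, test
```

The cost of this design is that the mixed clips are lost to both splits, which may be acceptable when additional amplifier renderings are cheap to generate.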

The AGTT task is underspecified. The audio-to-text formulation of AGTT (automatic guitar tablature transcription) is novel, but the paper does not explain well what this task is. If this is a first introduction of AGTT as audio-to-text, more explanation and justification are needed. For example, what makes DadaGP suited to Whisper? How are timing, rhythm, or note grouping handled in tokenization? What is the ultimate downstream application? As written, this approach may confuse readers unfamiliar with Whisper, token-based encodings, or tablature formats.