Abstract:

As artificial intelligence (AI) continues to shape creative practices, understanding its role in human-AI songwriting remains crucial. This paper expands the Human-AI Songwriting Processes (HAISP) dataset by incorporating data from the 2024 AI Song Contest, building upon the original 2023 dataset. By analyzing new submissions, we provide further insights into AI's evolving impact on songwriting workflows, creative decision-making, and control. A comparative study of AI tool usage and participant strategies between the 2023 and 2024 contests reveals shifts in collaboration patterns and tool effectiveness. Additionally, we assess the differences between general-purpose AI systems and personalized, fine-tuned tools, highlighting their impact on creative agency. Our findings offer key design implications for AI-assisted songwriting tools, providing actionable insights for AI developers and music practitioners seeking to enhance co-creative experiences.

Meta Review:

Q2 (I am an expert on the topic of the paper.)

Strongly agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Disagree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Disagree

Q15 (Please explain your assessment of reusable insights in the paper.)

There are useful insights in the paper, but they would be more reusable if the connection between the results and the design principles were clearer.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Process reports from selected 2024 AI Song Contest participants elucidate AI usage and inform how system design could further support these users.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak reject

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

This is the meta-reviewer's independent review for paper 77: Expanding the HAISP Dataset: AI’s Impact on Songwriting Across Two AI Song Contests

This paper provides an update and extension of the Human-AI Songwriting Processes (HAISP) dataset with new data from 2024. The authors describe how the new data were collected and coded, discuss main themes emerging across the two years, and offer design principles.

This paper was well-written and easy to read throughout. The authors do a great job of providing high-level context in the Introduction, which is elaborated upon in the Background, and the relevance to ISMIR is clear. Overall, I find this to be a solid paper on a timely topic, but I am unsure about some aspects of it. Below I provide four main points of feedback, followed by minor feedback.

** Main feedback 1: Concerns around incremental contributions ** It's exciting to see the HAISP dataset effort continue, and the reported comparisons between 2023 and 2024 make sense given the framing of the paper. However, I am not sure whether I agree with the authors' claims of interpretable changes over time given only a 1-year sampling interval. For example, "an evolving discourse" (p. 4, line 390) and demographic trends in submissions (such as the noted increase in non-academic teams) would be more strongly supported by a longer longitudinal timeframe. Incremental research is fine, of course, but as the authors mention "more concrete frameworks for future entries" (line 507), I am left wondering whether the frameworks might be better concretized across less frequent, more impactful papers that span larger time frames.

** Main feedback 2: Drawing stronger connections between design principles and results ** The design principles provided in Section 5 are all reasonable and well articulated. However, they would be reasonable even if proposed separately from this study's results, and as the paper is written, it is hard to see how they arose from the present results. The authors are advised to show more clearly how these principles arose from the specific findings -- in other words, to show how the results uniquely inform these principles. For example, this could be done by referencing the derived themes, or even by including additional participant responses from the relevant themes for each principle. In addition to tying the principles more clearly to the results, it could also be helpful to begin Section 5 by stating this causal relationship outright.

** Main feedback 3: Seeking more information on coding process ** I was also curious to know more about the codes and coding process. Were these codes used in the original dataset paper (it seems not), or were new codes derived this year and then applied to both years of data in a new analysis? Can some examples of the codes themselves be given? Can the authors offer more detail on how the coding turned out -- is there a reliability metric or statistic that can be reported to summarize the extent of agreement before/after discussion, as well as the number of tiebreaks needed? Finally, how were the codes translated into the themes reported in Section 4?
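
(For reference, on the kind of statistic meant here: with two coders assigning one code per item, inter-rater agreement is commonly summarized with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch in Python follows -- the code names and per-item labels below are hypothetical, purely to illustrate the computation:)

    # Hypothetical illustration: two coders' labels for the same 8 process reports.
    # Cohen's kappa corrects observed agreement for agreement expected by chance
    # (1.0 = perfect agreement, 0.0 = chance-level agreement).
    from sklearn.metrics import cohen_kappa_score

    coder_a = ["control", "workflow", "ethics", "control", "tooling", "ethics", "workflow", "control"]
    coder_b = ["control", "workflow", "ethics", "tooling", "tooling", "ethics", "workflow", "ethics"]

    print(f"kappa = {cohen_kappa_score(coder_a, coder_b):.2f}")  # kappa = 0.67 for these labels

Reporting kappa (or a similar statistic, such as Krippendorff's alpha) before and after the discussion phase, together with the number of tiebreaks, would make the coding process much easier to assess.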

** Main feedback 4: Provide more context on commonality of results ** In Section 4, there were many qualitative mentions of how common a theme was (e.g., "Many users", line 255; "teams frequently encountered", line 278; "much more", line 298), but as a reader I found it hard to translate these descriptions into tangible quantities. Can the authors ground these claims by providing quantitative backing for the main results?

Minor feedback:

  • Line 48: Based on ref #10, is it more accurate to say HAISP was introduced in 2024?
  • Section 3: Suggest removing "Methodology" from the section title, since Section 3.2 is more results than methodology.
  • In the Background section, it could be useful to provide more context on the AI Song Contest for readers who are learning about it for the first time.
  • In Section 3.1, I was curious at this point why Udio/Suno submissions were disqualified. The authors go into this in detail in Section 4.3, but it could be helpful to provide more context upfront (and perhaps reference back to it).
  • Related to the above two points: the content in lines 360-365 could also go in a Background subsection on the AI Song Contest, which would also help the reader understand the omission of Udio/Suno entries.

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Weak reject

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

This is the meta-review for paper 77: Expanding the HAISP Dataset: AI’s Impact on Songwriting Across Two AI Song Contests. The meta-reviewer and 3 other reviewers independently reviewed the paper. While we did not achieve consensus in our individual reviews, we reached an understanding during the discussion phase regarding the direction of the overall recommendation.

All reviewers praise the paper for its clear writing, relevance to MIR, useful information, and the timeliness of the topic. Across the reviews, several suggestions for improving the paper also arose. First, there was uncertainty around the paper's impact -- for example, whether the contribution provides enough depth or longitudinal value. On a related point, while the design principles made sense, their novelty and connection to the results were unclear. Finally, more details on the coding process, the dataset, and the longitudinal aspect of the results would have been helpful. A longitudinal contribution spanning more years could provide heightened impact; alternatively, the authors could do a deeper dive into reporting on the present methods and data -- and connect the results more strongly to the design principles -- which would serve as a faster (and hence timely) contribution to a fast-moving research topic.

Review 1:

Q2 (I am an expert on the topic of the paper.)

Strongly agree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The findings, and in particular the comparative analysis, make for a useful read for newcomers to ISMIR or for those staying up to date with the field.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

This paper updates the HAISP dataset with data from the 2024 AI Song Contest, and shares findings from common experiences of the participants.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

  • This is a well-written paper, and I think the dataset and the findings will be useful for the community. I think the trends among the participants and the year-to-year differences are really interesting to see.

  • The design principles emerging from analyzing this dataset don't seem to break new ground. I'm not sure whether there's more to uncover here, but one interpretation of this finding could be an updated understanding of what the AI Song Contest itself provides to the community - this year, it offered valuable insights about the changing attitudes and backgrounds of participants, but revealed less about the usability of music AI tools themselves. Maybe this is something for the contest organizers to consider as they choose their directions for the future.

Review 2:

Q2 (I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The methodology of this paper is very successful, specifically:

  1. Clearly presenting the dataset to be analyzed and quantitative statistics regarding it
  2. Analyzing general trends (and reinforcing them with qualitative quotes from the dataset, where useful)
  3. Taking these trends and making them into actionable guidelines for AI tool producers

It is this methodology that I think would be reusable, for example, for analyzing the 2025 contest in the same way.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Between 2023 and 2024, how songwriters utilized AI in the context of songwriting changed: more of those involved were from industry (versus mostly from academia in 2023) and fewer bespoke tools were used; however, issues regarding control, the ethical transparency of AI tools, and clumsy integration of AI tools into standard workflows (e.g., DAWs) were still present.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak reject

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

I think this paper brings up interesting (and quantitatively derived) trends regarding the use of AI in songwriting, including how it is shifting out of academia and into industry (now that more tools are mature) and how this proliferation is leading to more pressure regarding ease of use, ethical transparency, and flexibility; the paper backs all this up with quotes from real-world composers and songwriters, which I appreciated.

I also find the methodology of this paper to be successful, specifically:

  1. Clearly presenting the dataset to be analyzed and quantitative statistics regarding it
  2. Analyzing general trends (and reinforcing them with qualitative quotes from the dataset, where useful, which I very much appreciated)
  3. Taking these trends and making them into actionable guidelines for AI tool producers: including interactivity and adaptivity, enhancing transparency and explainability, supporting style and genre customization, and integrating tools into existing creative workflows.

The paper is easy to read, and the authors clearly convey their main takeaways, summarizing them into a set of conclusive "Design Principles for Music Generation AI" (see above) which could certainly be acted upon as design principles by GenAI software development teams and companies.

Review 3:

Q2 (I am an expert on the topic of the paper.)

Strongly agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Disagree

Q5 (Please justify the previous choice (Required if “Strongly Disagree” or “Disagree” is chosen, otherwise write "n/a"))

The paper discusses at length what is desirable of an AI system that caters to human creativity, but fails to cite: https://arxiv.org/abs/2304.03407

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Strongly disagree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The paper discusses some guidelines for designing AI tools rooted in practice-based observation. This is very useful for designers of such tools; however, it is nothing that has not been discussed before.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

The paper extends the HAISP dataset with submissions from the 2024 AI Song Contest and discusses the observed differences in participant preferences and practices across the two years.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak reject

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Strengths: The HAISP dataset is extended with submissions from the 2024 AI Song Contest, providing an opportunity for the community to study changes in creative practices and participant behaviours around AI in various aspects of the song production process.

Weaknesses: The paper could have presented in-depth details about the dataset, such as the distribution of participants in terms of skills and professional practices. The authors could also have listed the tools used by participants, with a clear distinction between tools for different tasks. The most important results from the paper highlight increased participation from the creative industry in 2024 and participants preferring assistive systems that are ethically trained. This is interesting, but not interesting enough. The paper seems a bit underdeveloped. I would encourage the authors to describe the dataset in more detail and present a detailed and comprehensive longitudinal comparison across the years. The authors say that a journal paper is underway, and I look forward to it. Lastly, the authors propose guidelines for developing AI tools, which again is interesting but nothing novel. Similar content has been discussed in several papers and presentations before, some of which the authors fail to mention.