P5-6: Enhancing Music Recommender Systems with Multimedia Content: A Context-Aware Approach

Oleg Lesota, Veronica Clavijo, Attia Rizwani, Markus Schedl, Bruce Ferwerda

Subjects: Open Review ; Music interfaces and services ; Personalization ; Music recommendation and playlist generation ; Applications ; Human-centered MIR ; Music videos, multimodal music systems

Presented In-person

4-minute short-format presentation

Abstract:

The evolution of the music industry has introduced multimedia elements—such as video, text, and images—into music consumption. However, current Music Recommender Systems (MRSs) remain predominantly audio-focused, requiring explicit user interaction to access additional media. This study explores the integration of multimedia content into MRSs, considering the role of contextual activities and the Uses and Gratifications (U&Gs) framework in enhancing personalization and engagement. A diary study with 26 participants over one week identified nine key activities, with Household Chores, Workout, and Focusing being the most relevant. These activities revealed novel U&Gs such as "For Preference", "For Convenience", "For Discovery", and "To Get Distracted". A subsequent user study compared a Basic Music App (audio-only) with a Modified Music App (multimedia-enhanced). Results showed that participants preferred the Modified Music App across five constructs: novelty, ease of use, usefulness, satisfaction, and intention to use. These findings suggest that multimedia-enhanced recommendations can improve user experience by aligning with activity-specific preferences. The study contributes to research on personalized MRSs and offers insights for developing context-aware, multimedia-driven recommendations.

Meta Review:

Q2 ( I am an expert on the topic of the paper.)

Disagree

Q3 ( The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work.)

Disagree

Q5 ( Please justify the previous choice (Required if “Strongly Disagree” or “Disagree” is chosen, otherwise write "n/a"))

There are several related works on using contextual features in recommender systems. Some that were presented at previous ISMIR conferences include:

  • Hu, Y., and Ogihara, M. "NextOne Player: A Music Recommendation System Based on User Behavior." ISMIR. Vol. 11. 2011.
  • Schedl, M., and Flexer, A. "Putting the User in the Center of Music Information Retrieval." ISMIR. 2012.
  • Vigliensoni, G., and Fujinaga, I. "Automatic Music Recommendation Systems: Do Demographic, Profiling, and Contextual Features Improve Their Performance?" ISMIR. 2016.

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated “Strongly Agree” and “Agree” can be highlighted, but please do not penalize papers rated “Disagree” or “Strongly Disagree”. Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

  • The ability to combine qualitative studies (diary study) with quantitative studies (user study) provides a framework to better understand user preferences, and can be applied to other user-based studies.
  • The use of UGT in a case study of music recommender systems can identify newer motivations that users might have while interacting with them, such as "preference", "convenience", "discovery", and "distraction".

Q16 ( Write ONE line (in your own words) with the main take-home message from the paper.)

Users of music recommender systems might enjoy additional multimedia (such as images, videos or lyrics) while listening to music, depending on their context.

Q17 (This paper is of award-winning quality.)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation (to be completed before the discussion phase): Please first evaluate before the discussion phase. Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines.)

Weak accept

Q21 (Main review and comments for the authors (to be completed before the discussion phase). Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

The paper presents a two-step case study on the effects of context awareness and multimedia content (namely video, text, and image) on a music recommendation system. In the first study, the authors attempt to capture participants’ choice of the combination of an activity and a music-related medium, and the reasons for choosing that medium, framed according to the Uses and Gratifications Theory (UGT). In the second study, the authors use the output of the first study to inform the development of two music app prototypes, one audio-only and one with different types of media, and measure different aspects of the user experience via a survey based on two frameworks (ResQue and UEQ-S).

Strengths:

  • Comprehensive two-step study of the relationship between contextual aspects (described through activities and Uses and Gratifications) and the choice of multimedia content in the user experience of a music recommender system.
  • Statistical analysis of the results of both studies.

Weaknesses:

  • Repetition of content in different sections. For example, the activities in the Diary Study are described in both Section 4.1 and Section 5.1.
  • Missing results in Section 4.1.1. It would have been very useful to see the full results of the MANOVA tests instead of a cherry-picked couple of them, especially since there is still space left in the paper.
  • Incomplete details in the User Study, such as the participant distribution (Section 4.2) and the specific questions asked to the participants (Section 3.2). How did the listeners interact with the app, especially the modified app? Did they need to switch between media types? Were they able to listen to any of the songs? 10-minute sessions sound pretty short; why and how was that duration decided? What activities were analyzed in the UEQ-S findings?

Other aspects:

  • The authors claim in Section 2 that: “Despite these added functionalities, platforms generally leave it to listeners to initiate any deeper engagement, such as clicking on the lyrics tab or opting to watch a video. Consequently, the service does not dynamically recommend multimedia formats that might enhance an individual’s specific context or motivation at the moment.” In order to dynamically recommend multimedia, music streaming services need to capture the listener’s intent in one way or another. If the focus is on a music streaming app, these services can use information such as the time of day, the location, and whether the listener is actively interacting with the app or listening in the background. The paper did not address these aspects, although they were mentioned in the Limitations section. Location information would also pose some legal hurdles. Later in the discussion the authors also claim: “These responses reinforce the notion that while additional media can be helpful or entertaining, contextual appropriateness remains vital, and users often want the freedom to choose how much visual or textual content accompanies the audio.”

Comments and typos:

  • Section 1, 2nd paragraph: “Since MTV’s launch in 1981” → “Since the launch of MTV in 1981”.
  • Section 1, 2nd paragraph: Links to the music streaming services are unnecessary.
  • Section 2, 1st paragraph: I would not necessarily say that there is a shift in music recommender systems research, but more of an increase in emphasis.
  • Section 3.2.1: Did any of the participants in the user study overlap with the diary study?
  • Section 4.2.1 - 1: What were the idle chores in Household Chores?
  • Section 4.2.1 - 2: What about the findings in the relaxing activities?
  • Remove “Citeseer” from Reference [25].

Q22 (Final recommendation (to be completed after the discussion phase) Please give a final recommendation after the discussion phase. In the final recommendation, please do not simply average the scores of the reviewers. Note that the number of recommendation options for reviewers is different from the number of options here. We encourage you to take a stand, and preferably avoid “weak accepts” or “weak rejects” if possible.)

Weak accept

Q23 (Meta-review and final comments for authors (to be completed after the discussion phase))

This is a meta-review. The reviewers have provided detailed reviews and I encourage the authors to carefully read them and address any comments, questions or suggestions. Below is a summarization of the feedback provided by the reviews.

The reviewers agreed on the following strengths of the paper:

  1) Well-motivated and robust methodology, grounded in sociological theories (UGT).
  2) Clear and well organized, except for some repetitions in Section 5 and the dispersion of future work across several sections (Reviewer #3).
  3) The authors thoughtfully recognize the limitations of their work (Reviewer #2).

The reviewers also pointed out a list of weaknesses, including:

  1) Lack of details in the Diary Study results section, specifically subsection 4.1.1 about the co-occurrence matrix of the activities and uses and gratifications (Reviewer #2).
  2) Lack of details in the User Study as well, specifically the demographic details (Reviewer #1), the task description and questions asked to the participants, how they interacted with the modified app, and whether they were able to listen to the songs (Reviewer #3). The design choices for the music apps were also not described (Reviewer #3).
  3) Repetition of content, especially the activities in the Diary Study, which are described in both Section 4.1 and Section 5.1. By removing duplicate content, the authors could free up space to address the other weaknesses by adding more details.
  4) Gender disparity in the User Study might lead to bias (Reviewer #2).
  5) Ethical considerations are not addressed, neither ethical approval nor the ethical implications of the work's suggestions (Reviewers #1 & #3).
  6) Some incongruence in the authors’ claims. On one hand, they state the following: “...Despite these added functionalities, platforms generally leave it to listeners to initiate any deeper engagement, such as clicking on the lyrics tab or opting to watch a video. Consequently, the service does not dynamically recommend multimedia formats that might enhance an individual’s specific context or motivation at the moment.”

On the other hand, they also recognize that: “Open-ended responses illustrated the desire for control over media formats, with participants expressing interest in turning off extra content if it became distracting or did not fit their ongoing activity. These responses reinforce the notion that while additional media can be helpful or entertaining, contextual appropriateness remains vital, and users often want the freedom to choose how much visual or textual content accompanies the audio.”

Streaming services already provide these additional media formats, and users are already able to choose which one to use when they feel it is appropriate. So it seems that the only benefit here is that these streaming services could anticipate the intent of the user and show the additional media when needed. In other words, instead of opting in, users would opt out if they did not feel it was appropriate. Furthermore, predicting user intent requires additional signals. Some of these can be “easily” obtained, such as the time or whether users are actively using the device or leaning back, but others, such as the location of the listener (or the use of camera features), raise concerns about privacy and tracking, as mentioned by Reviewer #3 as well. This tension between human agency and surveillance, two sides of the same coin, is an important and common topic in recommender systems user research, and the authors should at least provide some reflection on it, perhaps in the ethical considerations section at the end of the paper.

After a discussion among the reviewers, we came to the conclusion that this paper, despite the number of weaknesses that were reflected in the reviews and summarized and expanded on above, deserves to be accepted to the conference. It is accepted on the condition that the authors carefully read and follow all the feedback provided by the reviewers, and make the required corrections and modifications to improve the manuscript.

Review 1:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The presented insights can be helpful in improving recommender systems, both in the music field and beyond.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Different activities and motivations may drive music listeners to explore different kinds of music-related media.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Strengths

  1. Clear structure and writing
  2. Thoughtful methodology
  3. Theoretical framing

Weaknesses

  1. Privacy and ethical considerations not addressed
  2. Redundant or unnecessary information
  3. Lack of sample demographic details

This paper presents a two-stage study, consisting of a diary study followed by a user study, aimed at understanding the relationship between engagement with music-related media and contextual activities. The study is grounded in the Uses and Gratifications Theory (UGT), which is used to identify the motivations behind why certain activities or media types may be preferred in specific contexts. The second phase involves a controlled experiment in which participants interact with two music application prototypes, one audio-only and one enriched with multimedia information. The results suggest that different activities can influence the type of content users engage with.

The paper is well-written, provides sufficient detail to understand each step of the experimental procedure, and is structured in a way that is easy to follow. I particularly appreciate the decision to build the user study upon the initial diary study, which adds coherence and depth to the research design. While the findings are exploratory, I believe the paper offers a nuanced understanding of the phenomenon under investigation and am therefore inclined to recommend its acceptance. Below, I outline a few minor comments:

  • Section 1, 2nd paragraph: The URLs of the platforms could be omitted or moved to a footnote.
  • Section 2, 3rd paragraph: The list of streaming services (“Spotify, Apple Music, and YouTube Music”) may be unnecessary, as similar examples are already provided in the previous paragraph.
  • Ethical approval: Please specify whether the study received IRB or ethical board approval.
  • Section 3.1.1, 1st paragraph: In addition to reporting gender (“19 female, 7 male”), please include the average age of participants.
  • Figure 1: Images (c) and (d) appear blurred; consider improving their resolution.
  • Section 5.1: There is no need to repeat the list of activities and motivations, as they are already presented in the previous section.
  • Discussion: I would be interested to see a reflection on the implications of context-aware recommender systems. While they may offer more relevant suggestions, they also raise concerns regarding privacy, data tracking, and potential misuse of user surveillance, even within music streaming platforms.

Review 2:

Q2 ( I am an expert on the topic of the paper.)

Disagree

Q3 (The title and abstract reflect the content of the paper.)

Strongly agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Strongly agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Strongly agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Strongly agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Strongly agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Agree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Agree (Novel topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Strongly agree

Q15 (Please explain your assessment of reusable insights in the paper.)

  1. Users prefer different media formats (e.g., text, video, images) depending on the activity they are engaged in (e.g., relaxing vs. working out). This insight can inform the design of context-aware multimedia recommendation strategies in broader content platforms.
  2. The combination of qualitative insights from the Diary Study with quantitative measures from the User Study, grounded in Uses and Gratifications Theory (UGT), underscores the value of hybrid research methods. This approach can serve as a reusable framework for analyzing user needs in other content-rich recommender systems.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

Music listening experience can be significantly improved by aligning supplementary content (e.g., videos, text, images) with users' activities and preferences, as long as they offer flexibility and allow users to control or opt out of additional media content.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Agree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Strong accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

Brief Summary

In this paper, the authors use Uses and Gratifications Theory (UGT) to explore users’ motivations for engaging with various music-related media formats (e.g., text, video, images) and how different contextual activities (such as doing household chores, working out, or relaxing) influence these choices. To investigate this, they conducted two studies. The first, a diary study, examined how individuals’ U&G motivations align with their daily activities. Building on these findings, the second user study assessed the impact of recommending supplementary media formats (video, text, or images) tailored to users’ context and motivations. The results indicate that context-aware, multimedia-enhanced music recommendations can significantly enhance user experience, as long as they offer flexibility and allow users to control or opt out of additional media content.

Strengths

  1. The research is grounded in a compelling motivation.
  2. The methodology is robust, framed effectively through the lens of Uses and Gratifications Theory (UGT).
  3. The manuscript is well-organized and clearly written.
  4. The authors thoughtfully acknowledge the limitations of their study and propose several avenues for future exploration that build on their findings.

Minor concerns

  1. The authors could include the co-occurrence matrix of the nine activities and thirteen uses and gratifications (U&G) in Section 4.1 to provide a more detailed view of the distribution of Diary Study entries.
  2. Similarly, the authors could include the distribution of activities for the 59 valid data points in the User Study.
  3. The Diary Study had a predominantly female participant group (19 out of 26), but the gender distribution for the 63 participants in the User Study is not reported. Was the male/female distribution similar across both studies? If not, could this discrepancy introduce bias into the results?
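To make the first concern concrete, the kind of co-occurrence table meant here could be tabulated roughly as follows. This is only a minimal sketch: the activity and U&G labels are taken from the abstract, but the `entries` list, its pairings and counts, and the `cooccurrence` helper are hypothetical placeholders, not the authors' data or code.

```python
from collections import Counter

# Hypothetical diary entries as (activity, U&G motivation) pairs.
# Labels appear in the abstract; the pairings and counts are invented.
entries = [
    ("Household Chores", "For Convenience"),
    ("Household Chores", "To Get Distracted"),
    ("Workout", "For Preference"),
    ("Workout", "For Preference"),
    ("Focusing", "For Discovery"),
    ("Household Chores", "For Convenience"),
]


def cooccurrence(pairs):
    """Count how often each activity co-occurs with each motivation."""
    counts = Counter(pairs)
    rows = sorted({activity for activity, _ in pairs})
    cols = sorted({motivation for _, motivation in pairs})
    # Counter returns 0 for pairs that never occur, so empty cells are zeros.
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


rows, cols, matrix = cooccurrence(entries)
print("columns:", cols)
for activity, row in zip(rows, matrix):
    print(f"{activity:<18}", row)
```

With the real diary entries, the same tabulation would yield the nine-by-thirteen matrix, and its cell totals would show how evenly the entries are spread across activities and motivations.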

Review 3:

Q2 ( I am an expert on the topic of the paper.)

Agree

Q3 (The title and abstract reflect the content of the paper.)

Agree

Q4 (The paper discusses, cites and compares with all relevant related work)

Agree

Q6 (Readability and paper organization: The writing and language are clear and structured in a logical manner.)

Strongly agree

Q7 (The paper adheres to ISMIR 2025 submission guidelines (uses the ISMIR 2025 template, has at most 6 pages of technical content followed by “n” pages of references or ethical considerations, references are well formatted). If you selected “No”, please explain the issue in your comments.)

Yes

Q8 (Relevance of the topic to ISMIR: The topic of the paper is relevant to the ISMIR community. Note that submissions of novel music-related topics, tasks, and applications are highly encouraged. If you think that the paper has merit but does not exactly match the topics of ISMIR, please do not simply reject the paper but instead communicate this to the Program Committee Chairs. Please do not penalize the paper when the proposed method can also be applied to non-music domains if it is shown to be useful in music domains.)

Agree

Q9 (Scholarly/scientific quality: The content is scientifically correct.)

Agree

Q11 (Novelty of the paper: The paper provides novel methods, applications, findings or results. Please do not narrowly view "novelty" as only new methods or theories. Papers proposing novel musical applications of existing methods from other research fields are considered novel at ISMIR conferences.)

Agree

Q12 (The paper provides all the necessary details or material to reproduce the results described in the paper. Keep in mind that ISMIR respects the diversity of academic disciplines, backgrounds, and approaches. Although ISMIR has a tradition of publishing open datasets and open-source projects to enhance the scientific reproducibility, ISMIR accepts submissions using proprietary datasets and implementations that are not sharable. Please do not simply reject the paper when proprietary datasets or implementations are used.)

Disagree

Q13 (Pioneering proposals: This paper proposes a novel topic, task or application. Since this is intended to encourage brave new ideas and challenges, papers rated "Strongly Agree" and "Agree" can be highlighted, but please do not penalize papers rated "Disagree" or "Strongly Disagree". Keep in mind that it is often difficult to provide baseline comparisons for novel topics, tasks, or applications. If you think that the novelty is high but the evaluation is weak, please do not simply reject the paper but carefully assess the value of the paper for the community.)

Disagree (Standard topic, task, or application)

Q14 (Reusable insights: The paper provides reusable insights (i.e. the capacity to gain an accurate and deep understanding). Such insights may go beyond the scope of the paper, domain or application, in order to build up consistent knowledge across the MIR community.)

Agree

Q15 (Please explain your assessment of reusable insights in the paper.)

The methodology as applied in the paper could, with some adaptation, be applied to explore other types of user context. The specific insights may also be used to inform platforms that employ music recommender systems in how and when to best incorporate media types beyond audio.

Q16 (Write ONE line (in your own words) with the main take-home message from the paper.)

In this paper, the authors give insight into which type of media users prefer during which type of activities, and show that as the activity type impacts user preferences on multimedia content, it is important to take it into account in a music recommendation context.

Q17 (Would you recommend this paper for an award?)

No

Q19 (Potential to generate discourse: The paper will generate discourse at the ISMIR conference or have a large influence/impact on the future of the ISMIR community.)

Disagree

Q20 (Overall evaluation: Keep in mind that minor flaws can be corrected, and should not be a reason to reject a paper. Please familiarize yourself with the reviewer guidelines at https://ismir.net/reviewer-guidelines)

Weak accept

Q21 (Main review and comments for the authors. Please summarize strengths and weaknesses of the paper. It is essential that you justify the reason for the overall evaluation score in detail. Keep in mind that belittling or sarcastic comments are not appropriate.)

In this paper, the authors explore the impact that user activities have on their preference for multimedia content types in a music recommender system context. They conducted two user studies: 1) a diary study in which users wrote down their music-listening behavior during one week, and noted their activity, type of music-related media consumed, and reasons why, and 2) a user study in which two music applications were evaluated, one with and one without additional media types beyond audio. The authors (unsurprisingly) find that there is indeed a connection between user activities and their preference for additional media types.

Strong points of this work are the following:
- The motivation for this work is presented very clearly in the introduction, and previous works are outlined so that the novelty of this work is evident. The insights help progress the research field of music recommendation and personalization by taking into account contextual factors (activities) and media types (audio, image, text, video). Therefore, it is a good fit for the ISMIR conference.
- The two user studies fit the research questions and nicely complement each other. Results are clearly described and give some new insights into user preferences.
- The paper is generally well-written and easy to follow, with a clear structure and very few grammatical errors.

Still, there is room for improvement in the following aspects, with my main point of criticism concerning the clarity and presentation of the method.

Method
- While the diary study procedure is clear enough, for the second user study, information is missing that would be needed for full reproducibility. For example, the song selection process is only described at a very high level, and the final song selection is not shared. Moreover, the design choices for the two music apps are not described. How could users interact with the recommendations and media types? Were the recommended songs the same in both apps and how were they shown (Figure 1 prominently shows recommended artists but not different songs)? Could the participants listen to the songs, and why (not)? Also, the task description users received and the exact questions are not shared (were they 100% the same as in ref 26 and 27?), and the code for the apps is not shared. It is therefore not clear how participants were encouraged to consider the activity context, and how they were invited to share feedback after interacting with the apps. As these details are not available, it is difficult to gauge the impact of the method on the results.
- Even though this is a user study, the authors do not mention any ethical considerations or approval of an ethical review board.

Results
- Some of the screenshots in Figure 1 are of low quality.
- Section 3.2 mentions that participants were informed their task was to assess alignment between content and scenario, but in Section 4.2.2, some points (e.g., accuracy and novelty) do seem to be focused more on the evaluation of the song recommendations. This would be highly influenced by song preference which was not accounted for in the study design, and therefore I wonder how useful these insights are.
- If a selection criterion is that participants needed to listen to music every day, and they had to note all music consumption episodes, what could explain the discrepancy with some participants having only recorded 3 diary entries total?

Discussion & conclusion
- The majority of Section 5 consists of additional insights rather than interpretation of insights, and would therefore fit better in the Results section. Perhaps more discussion could be dedicated to why users might consume more music in certain situations or consume it differently, and how this can be better accounted for in the streaming services they use.
- It would be worthwhile to mention the limitation of the second user study that the interaction environment was artificial, and participants could (and likely would) behave and/or respond differently in a real-life setting. Impact of the user study design choices on the results is insufficiently described.

Overall
- There is some repetition across sections, like repetition of the method in the results section. The manuscript could be condensed somewhat.
- Future work is now spread out over the last 3 sections. It would be better to collect all of it in one place.
- The first sentence of the last page’s right column is incomplete.