Interpretable Modeling of Music Aesthetics: Insights from Elastic Net and GAMs on the SongEval Dataset
Yiting Xia
Primary Subject: Early Research
We fit interpretable models, including Elastic Net and Generalized Additive Models (GAMs), to investigate which audio features align with human aesthetic ratings of music. Our analysis focuses on a subset of the SongEval dataset, which provides subjective annotations along five perceptual dimensions: overall coherence, memorability, vocal naturalness, structural clarity, and overall musicality. Using summary statistics of pitch, spectral, energy, and rhythmic features, we train the models on a balanced sample of high- and low-scoring songs. Both models consistently identify zero-crossing rate and root-mean-square (RMS) energy as the dominant predictors, followed by the onset envelope and other spectral attributes. These findings offer promising, although not definitive, insights into the perceptual dimensions of music aesthetics.
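The feature-summary and Elastic Net steps described above can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' code. It computes only two of the named features, zero-crossing rate and RMS energy, from a mono waveform. Each feature is summarized by its mean and standard deviation across frames, and an elastic-net-penalized logistic regression is then fit by proximal gradient descent. The helper names (`frame_signal`, `zero_crossing_rate`, `rms_energy`, `summarize`, `fit_elastic_net_logistic`), the 2048-sample frame, the 512-sample hop, and the penalty settings are all assumptions. The binary high/low target is also an assumption, since the models could equally be fit to the continuous ratings. The GAM is not shown.

```python
import numpy as np

# Hypothetical sketch: frame/hop sizes and penalty values are assumed, not from the paper.

def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a mono signal into overlapping frames (one frame per row)."""
    if len(y) < frame_length:
        y = np.pad(y, (0, frame_length - len(y)))  # zero-pad very short clips
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign bit changes, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def rms_energy(frames):
    """Root-mean-square amplitude per frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def summarize(y):
    """Clip-level vector: [mean ZCR, std ZCR, mean RMS, std RMS] across frames."""
    frames = frame_signal(y)
    zcr, rms = zero_crossing_rate(frames), rms_energy(frames)
    return np.array([zcr.mean(), zcr.std(), rms.mean(), rms.std()])

def fit_elastic_net_logistic(X, t, lam=0.01, l1_ratio=0.5, lr=0.1, n_iter=2000):
    """Logistic regression with penalty lam * (l1_ratio*|w|_1 + (1-l1_ratio)/2*|w|^2),
    fit by proximal gradient descent (ISTA). X should be standardized; t holds 0/1 labels."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of label 1
        # gradient of mean log-loss plus the smooth (L2) part of the penalty
        grad_w = X.T @ (p - t) / n + lam * (1 - l1_ratio) * w
        grad_b = np.mean(p - t)  # intercept is not penalized
        w = w - lr * grad_w
        b = b - lr * grad_b
        # soft-thresholding: proximal step for the L1 part of the penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam * l1_ratio, 0.0)
    return w, b
```

Standardizing each feature column before fitting puts all coefficients on a common scale for the penalty. The magnitudes of the nonzero entries of `w` can then be compared to rank predictors, which is one plausible way to obtain a feature ranking of the kind reported above. Dedicated toolkits such as librosa also provide these features, but the abstract does not say which extraction tools were used.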