Generating Video Soundtracks with Contextual Visual Thumbnails
Mina Huh, Ailie Fraser, Dingzeyu Li, Mira Dontcheva, Bryan Wang
Primary Subject: Early Research
Selecting a soundtrack is a critical step in video editing. However, evaluating music is a slow, sequential process. Creators must listen to tracks one by one, which makes direct comparison difficult and forces them to rely on auditory memory to predict a track's impact on their video. We present VidTune, a system that supports exploring and comparing generated soundtracks through visual thumbnails. We introduce a technique for generating contextual visual thumbnails that translate a music track's character into a stylized preview of the user's own video. Our method maps analyzed musical attributes, such as valence and energy, to visual parameters, such as color and brightness, which are then applied to a keyframe. This approach turns soundtrack selection from a slow process of sequential listening into a rapid act of parallel visual comparison, helping creators imagine more intuitively how each track will shape their final video.
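The abstract does not specify the exact mapping from musical attributes to visual parameters. The following is a minimal, hypothetical sketch of the idea. It assumes that valence and energy are normalized to [0, 1], that the keyframe is a grid of 8-bit RGB tuples, that valence drives a warm/cool color tint, and that energy drives brightness and saturation. The function names (`visual_params`, `stylize_pixel`, `stylize_keyframe`) and all constants are illustrative assumptions. They are not VidTune's implementation.

```python
import colorsys
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB (assumed keyframe representation)


def clamp01(x: float) -> float:
    """Clamp a float to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def visual_params(valence: float, energy: float) -> Dict[str, float]:
    """Hypothetical mapping from music attributes (each in [0, 1]) to visual parameters."""
    valence, energy = clamp01(valence), clamp01(energy)
    return {
        # -1 (cool tint) for negative valence up to +1 (warm tint) for positive valence.
        "warmth": 2.0 * valence - 1.0,
        # Brightness multiplier: 0.6x for calm tracks up to 1.2x for energetic ones.
        "brightness": 0.6 + 0.6 * energy,
        # Saturation multiplier: 0.5x up to 1.5x (an assumed extension of "color").
        "saturation": 0.5 + energy,
    }


def stylize_pixel(px: Pixel, params: Dict[str, float], tint_strength: float = 0.15) -> Pixel:
    """Apply the tint, then scale saturation and brightness in HSV space."""
    r, g, b = (c / 255.0 for c in px)
    warmth = params["warmth"]
    # Warm tints raise red and lower blue; cool tints do the reverse.
    r = clamp01(r + tint_strength * warmth)
    b = clamp01(b - tint_strength * warmth)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Scaling S and V (rather than raw RGB) keeps the hue stable when values clip.
    s = clamp01(s * params["saturation"])
    v = clamp01(v * params["brightness"])
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return (round(r * 255), round(g * 255), round(b * 255))


def stylize_keyframe(frame: List[List[Pixel]], valence: float, energy: float) -> List[List[Pixel]]:
    """Produce one track's thumbnail by stylizing every pixel of the keyframe."""
    params = visual_params(valence, energy)
    return [[stylize_pixel(px, params) for px in row] for row in frame]


if __name__ == "__main__":
    keyframe = [[(128, 128, 128), (40, 160, 90)]]
    # One thumbnail per candidate track, so the user can compare them side by side.
    print("upbeat track:", stylize_keyframe(keyframe, valence=0.9, energy=0.9))
    print("somber track:", stylize_keyframe(keyframe, valence=0.1, energy=0.1))
```

Rendering the same keyframe once per candidate track places the previews side by side. An upbeat track yields a warmer, brighter frame, and a somber track yields a cooler, dimmer one. The per-pixel loop is only for readability. A practical version would likely vectorize these operations with an image library.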