A Comparative Study of Transfer Learning for Drum Audio Style Classification

Trent Eriksen, Edwin VanderHeide, Robert Saunders

Primary Subject: Early Research

Abstract:

Research in drum classification tends to follow two directions: (1) audio-based single-instrument classification and (2) automatic drum transcription. This leaves drum audio style comparatively unexplored and makes it a compelling target for study. This study compares a CNN trained on the Groove MIDI Dataset (GMD) with PaSST (Patchout Audio Spectrogram Transformer), a pretrained transformer-based model used with frozen general audio embeddings from AudioSet. The comparison reveals how general audio knowledge affects drum audio style classification. Experiments with model depth, augmentation, and padding show that PaSST with these frozen embeddings achieves lower accuracy than the CNN but yields a robust feature representation distinct from the CNN's.