Real-Time Drum-to-Vocal Percussion Sound Conversion System

Rinka Nobukawa, Tomohiko Nakamura, Shinnosuke Takamichi, Hiroshi Saruwatari

Primary Subject: Software/Library Demo

Abstract:

Vocal percussion (VP) is a vocal technique that emulates drum sounds and plays a crucial rhythmic role in contemporary a cappella music. Despite its importance, synthesizing VP sounds remains difficult due to their noisy, non-linguistic characteristics, which conventional speech and singing voice synthesis methods fail to handle effectively. Our previous work framed VP sound synthesis as a timbre transfer task from drum to VP sounds, leveraging their functional correspondence. It also introduced an offline method using a variational autoencoder-based model called RAVE. In this paper, we propose a real-time drum-to-VP sound conversion system based on this offline method. The system processes input audio in chunks of 46 ms, enabling online operation. We demonstrate that the proposed system operates in real time on the central processing unit of a modern laptop computer.
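
The chunk-based operation described above can be illustrated with a minimal sketch. It is not the authors' implementation. The sample rate (44.1 kHz) and chunk size (2048 samples, about 46 ms at that rate) are assumptions chosen to match the stated chunk duration, and `placeholder_convert` is a hypothetical stand-in for the conversion model that simply returns its input. The sketch processes audio one chunk at a time and reports the slowest chunk's processing time relative to the chunk duration; a ratio below 1 would indicate that each chunk finishes before the next one arrives.

```python
import time

import numpy as np

SAMPLE_RATE = 44_100  # assumed; the abstract does not state a sample rate
CHUNK_SIZE = 2_048    # assumed; 2048 / 44100 s is roughly 46 ms


def chunk_duration_ms(chunk_size: int, sample_rate: int) -> float:
    """Duration of one chunk in milliseconds."""
    return 1000.0 * chunk_size / sample_rate


def placeholder_convert(chunk: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the drum-to-VP model (identity mapping)."""
    return chunk.copy()


def stream_convert(audio, chunk_size, sample_rate, convert=placeholder_convert):
    """Convert audio chunk by chunk and measure the worst-case timing ratio.

    Returns the converted signal (same length as the input) and the ratio of
    the slowest chunk's processing time to the chunk duration.
    """
    n_chunks = -(-len(audio) // chunk_size)  # ceiling division
    # Zero-pad so the final chunk is full length.
    padded = np.zeros(n_chunks * chunk_size, dtype=audio.dtype)
    padded[: len(audio)] = audio
    out = np.empty_like(padded)

    worst_seconds = 0.0
    for i in range(n_chunks):
        start = i * chunk_size
        t0 = time.perf_counter()
        out[start : start + chunk_size] = convert(padded[start : start + chunk_size])
        worst_seconds = max(worst_seconds, time.perf_counter() - t0)

    ratio = worst_seconds / (chunk_size / sample_rate)
    return out[: len(audio)], ratio


if __name__ == "__main__":
    drums = np.random.default_rng(0).standard_normal(SAMPLE_RATE).astype(np.float32)
    converted, ratio = stream_convert(drums, CHUNK_SIZE, SAMPLE_RATE)
    print(f"chunk duration: {chunk_duration_ms(CHUNK_SIZE, SAMPLE_RATE):.2f} ms")
    print(f"worst-case processing time / chunk duration: {ratio:.4f}")
```

Replacing `placeholder_convert` with an actual model call would give a rough check of whether a given machine keeps up with the input; the measured ratio depends on the hardware and the model, so no specific value is implied here.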