A neural voice editing tool for transforming pitch, subharmonics, and structural noise

Frederik Bous, Jacob Morton, Joon Byun, Yechan Yu, Jinhyeok Yang, Hyeongju Kim, Juheon Lee

Primary Subject: Software/Library Demo

Abstract:

We present a voice editing tool based on our neural analysis and synthesis (NANSY) framework that allows modification of pitch parameters such as F0, subharmonics, jitter, and shimmer. The NANSY framework splits the voice into pitch parameters, linguistic embeddings, and speaker embeddings. In this work we propose an extension of NANSY's pitch encoder that includes subharmonic and structural noise parameters by predicting a joint note and phonation-mode distribution. Furthermore, we extend the self-supervised training procedure to include supervision on synthetic data generated on the fly, and we integrate Viterbi smoothing into the training process. We provide a web-based UI that allows editing the voice on a computer or tablet by redrawing the control parameter curves.
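
As a rough illustration of what Viterbi smoothing does to a frame-wise class distribution, the following minimal sketch decodes the most likely state sequence under a simple penalty for changing state between frames. It is not the authors' implementation: the function name `viterbi_smooth`, the `switch_penalty` parameter, and the uniform transition model are assumptions made for this example, and the abstract does not specify how the smoothing is integrated into training.

```python
import numpy as np

def viterbi_smooth(log_probs, switch_penalty=2.0):
    """Return the most likely state path for frame-wise log-probabilities.

    log_probs: array of shape (n_frames, n_states), for example one state per
        joint note/phonation-mode class (hypothetical layout).
    switch_penalty: assumed cost subtracted whenever the state changes
        between consecutive frames.
    """
    n_frames, n_states = log_probs.shape
    # Transition scores: 0 for staying in a state, -switch_penalty for switching.
    trans = -switch_penalty * (1.0 - np.eye(n_states))
    score = log_probs[0].copy()
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j]: best score of a path ending in state i, then moving to j.
        cand = score[:, None] + trans
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(n_states)] + log_probs[t]
    # Backtrack from the best final state.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path

# Toy posterior with a one-frame glitch in the third frame.
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]])
smoothed = viterbi_smooth(np.log(probs), switch_penalty=2.0)
```

In this toy case the frame-wise argmax would jump to the second state for a single frame, whereas the decoded path stays in the first state throughout, because two state switches cost more than the glitch gains.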