Toward Score-Informed Music Audio Editing System using Differentiable Digital Signal Processing Mixture Model

Kengo Takemoto; Tomohiko Nakamura; Hiroshi Saruwatari

Abstract:

Differentiable digital signal processing (DDSP) incorporates signal processing modules into neural networks to enable gradient-based learning. The DDSP mixture model (DDSP-MM) builds on this idea by representing an audio mixture as the sum of source signals generated by multiple differentiable synthesizers. Each source is controlled independently and may represent a distinct musical line. By estimating the control parameters so that the synthesized mixture matches the input audio, DDSP-MM allows flexible editing and re-synthesis of individual sources. However, it requires time-aligned score information for initialization, which is often unavailable in practice. In this paper, we address this limitation by applying dynamic time warping (DTW) in a preprocessing step to align the unaligned score with the mixture. We also present a music audio editing system that demonstrates this workflow and enables source-level editing and re-synthesis.
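The alignment step can be illustrated with classic DTW. The sketch below is a minimal NumPy version and is not the authors' implementation. It rests on several assumptions: the score and the mixture are both represented as per-frame 12-bin chroma vectors (the abstract does not say which features are used), the local cost is cosine distance, and the step pattern is the standard (1,0), (0,1), (1,1). The names `cosine_cost`, `dtw_path`, `warp_onsets` and `one_hot_chroma` are hypothetical.

```python
import numpy as np

def cosine_cost(score_feats, audio_feats):
    """Pairwise cosine distance between score frames (rows) and audio frames (columns)."""
    a = score_feats / (np.linalg.norm(score_feats, axis=1, keepdims=True) + 1e-9)
    b = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + 1e-9)
    return 1.0 - a @ b.T

def dtw_path(cost):
    """Classic DTW with steps (1,0), (0,1), (1,1); returns (score_frame, audio_frame) pairs."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with an inf border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # both sequences advance
                acc[i - 1, j],      # score advances, audio frame repeats
                acc[i, j - 1],      # audio advances, score frame repeats
            )
    # Backtrack from the final cell to (1, 1), always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        preds = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(preds, key=lambda p: acc[p])
        path.append((i - 1, j - 1))
    return path[::-1]

def warp_onsets(path, onset_frames):
    """Map each score onset frame to the first audio frame it is aligned with."""
    first_match = {}
    for s, a in path:
        first_match.setdefault(s, a)
    return [first_match[f] for f in onset_frames]

def one_hot_chroma(pitch_classes):
    """Toy 12-bin chroma with exactly one active pitch class per frame."""
    feats = np.zeros((len(pitch_classes), 12))
    feats[np.arange(len(pitch_classes)), pitch_classes] = 1.0
    return feats

# Toy example: the unaligned score lists C, E, G, B with one nominal frame each,
# while the "audio" holds the same notes for 2, 3, 1 and 4 frames.
score = one_hot_chroma([0, 4, 7, 11])
audio = one_hot_chroma([0, 0, 4, 4, 4, 7, 11, 11, 11, 11])
path = dtw_path(cosine_cost(score, audio))
onsets = warp_onsets(path, onset_frames=[0, 1, 2, 3])
print(onsets)  # [0, 2, 5, 6]
```

The recovered onsets are the audio frames where each score note actually begins. Under these assumptions, such aligned note timings could then serve as the time-aligned score information needed to initialize the per-source synthesizer controls. A practical system would likely compute real chroma from audio and from the score's MIDI rendering, and would use an optimized DTW routine (for example `librosa.sequence.dtw`) instead of this O(NM) Python loop.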