Improving oscillator frequency estimation using differentiable DSP and spectral optimal transport

By Bernardo Torres

Bernardo Torres will give a talk on "Improving oscillator frequency estimation using differentiable DSP and spectral optimal transport".

Abstract

In neural audio signal processing, incorporating pitch information is essential for training synthesizers. However, jointly training pitch estimators and synthesizers using standard audio-to-audio reconstruction losses remains a significant challenge, often resulting in the use of external pitch trackers.

In this seminar, I will present our recent work, in which we propose a novel spectral-domain loss function for training neural frequency estimators without supervision. Inspired by optimal transport theory, this loss minimizes the displacement of spectral energy and provides more effective gradients for estimating oscillator frequencies than standard L1/L2 losses.
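As a rough illustration of the gradient argument (a minimal sketch, not the speaker's implementation), the NumPy snippet below compares a bin-wise L2 loss with a one-dimensional Wasserstein-1 distance on the normalized magnitude spectra of single sinusoids. The sample rate, FFT size, Hann window, test frequencies, and helper names are all assumptions chosen for the example.

```python
import numpy as np

SR = 16000      # assumed sample rate in Hz
N_FFT = 2048    # assumed analysis length in samples
BIN_HZ = SR / N_FFT


def magnitude_spectrum(freq_hz):
    """Magnitude spectrum of a Hann-windowed sinusoid, normalized to sum to 1."""
    t = np.arange(N_FFT) / SR
    x = np.sin(2 * np.pi * freq_hz * t) * np.hanning(N_FFT)
    mag = np.abs(np.fft.rfft(x))
    return mag / mag.sum()  # treat the spectrum as a probability distribution


def l2_loss(p, q):
    """Bin-wise squared error between two spectra."""
    return float(np.sum((p - q) ** 2))


def wasserstein1(p, q):
    """1-D Wasserstein-1 distance: area between the two cumulative spectra."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * BIN_HZ)


target = magnitude_spectrum(440.0)
estimates_hz = [500.0, 1000.0, 2000.0]  # progressively worse frequency guesses

l2_values = [l2_loss(target, magnitude_spectrum(f)) for f in estimates_hz]
w1_values = [wasserstein1(target, magnitude_spectrum(f)) for f in estimates_hz]

for f, l2, w1 in zip(estimates_hz, l2_values, w1_values):
    print(f"estimate {f:7.1f} Hz   L2 = {l2:.4f}   W1 = {w1:.1f} Hz")
```

Once the two spectral peaks no longer overlap, the L2 value barely changes with the size of the frequency error, so it carries little information about which direction to move the estimate. The transport distance keeps growing with the error, which is the property the abstract refers to.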

I will begin with an overview of differentiable digital signal processing (differentiable DSP), which involves using neural networks to estimate the parameters of signal processors, such as synthesizers. Following this, I will introduce our method for comparing audio signals by employing the closed-form solution to the Wasserstein distance in the one-dimensional case as a training objective. Finally, I will discuss our experiments on synthetic data and address the limitations and future perspectives of this work.
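For reference, the closed-form solution mentioned above is the standard result for probability distributions on the real line. The statement below is the general textbook form; the abstract does not specify the order p or the normalization used in the talk, so those are left generic here. The symbols μ and ν stand for the two spectra, normalized to unit mass, with cumulative distribution functions F_μ and F_ν.

```latex
% p-Wasserstein distance between one-dimensional distributions \mu and \nu,
% expressed through their quantile functions (inverse CDFs):
W_p(\mu, \nu) = \left( \int_0^1 \left| F_\mu^{-1}(q) - F_\nu^{-1}(q) \right|^p \, dq \right)^{1/p},
\qquad p \ge 1.

% For p = 1 this reduces to the area between the two CDFs:
W_1(\mu, \nu) = \int_{-\infty}^{\infty} \left| F_\mu(x) - F_\nu(x) \right| \, dx.
```

Because both expressions only require cumulative sums (and, for general p, sorting or interpolation), they can be evaluated directly on discrete spectra without solving an optimization problem.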

Biography

Bernardo Torres is a PhD student at Telecom Paris, working under the supervision of Prof. Gaël Richard and Prof. Geoffroy Peeters, as part of the ADASP group and the HI-Audio project.

Previously, he completed a research internship at Sony CSL Paris during his Master’s program.

He holds a Bachelor’s degree in Electrical Engineering with a focus on Computer Engineering from Universidade Federal de Minas Gerais in Brazil, and an engineering degree from Telecom Paris in France. In 2022, he also earned a Master’s degree from the MVA program in Applied Mathematics, Machine Learning, and Artificial Intelligence at École Normale Supérieure Paris-Saclay.

His current research focuses on topics such as music source separation using analysis-by-synthesis, signal processing-informed deep learning, and differentiable digital signal processing. Bernardo’s broader research interests span audio signal processing, machine learning, self-supervised learning, music information retrieval, timbre, optimal transport, and generative models.