Perceptual Metric Learning for Speech Quality Assessment
By Alessandro Ragano

Alessandro Ragano will give a talk on Perceptual Metric Learning for Speech Quality Assessment.

Abstract

Building artificial intelligence systems that evaluate the perceived quality of speech and audio applications is a challenging task. These systems depend on large amounts of labelled data, which are obtained through expensive and time-consuming listening tests. As research in generative speech processing, including codecs, enhancement, and synthesis, advances rapidly, collecting training labels becomes increasingly impractical. This seminar presents new methods based on perceptual metric learning, with the goal of creating speech quality models that reduce both domain mismatch and the need for listening tests to obtain training labels. We will explore the principles of perceptual metric learning, investigate its application as a loss function for speech enhancement, and address the complexities associated with representation learning in regression tasks.

Biography

Alessandro Ragano is a Postdoctoral Research Fellow in the School of Computer Science at University College Dublin. He holds a PhD in Computer Science from University College Dublin (2022), and an MSc in Computer Science and Engineering from the Polytechnic of Milan (2018). During his PhD, he was a visiting student at Queen Mary University of London and an enrichment student at the Alan Turing Institute. His research focuses on developing deep learning models to estimate the quality of experience of music and speech applications. He is also interested in performance evaluation of media quality models, music representation models, synthetic speech, self-supervised speech models, and immersive media. His current project focuses on representation learning for robust speech quality models.