Advancing Human-Machine Interactive Communication and Creation: Generative AI for Audio and Beyond
By Xiaoyu Bie

Xiaoyu Bie will give a talk on Advancing Human-Machine Interactive Communication and Creation: Generative AI for Audio and Beyond

Abstract

Interactive human-machine communication and creation have been greatly advanced by recent developments in generative models and deep learning techniques. Leveraging data-driven end-to-end training, neural models can efficiently extract and understand information from the real world, generating rich, diverse content from given conditions. Despite this progress, applying generative AI in practice remains challenging. Systems need to effectively capture the variety of information conveyed by humans, while the generative models must be expressive enough to handle intricate data modeling and high-quality content creation. Meanwhile, there is a growing need for interpretable insights into the underlying mechanisms of the generative process to allow for controlled interaction and creation. In this talk, I will present my research on addressing these challenges, with a specific focus on audio data. I will share my findings and conclude by discussing future directions in this field.

Biography

Xiaoyu Bie is currently a postdoctoral researcher at Télécom Paris, Institut Polytechnique de Paris, working on generative models and their applications to speech and audio. Previously, he completed his PhD at INRIA and Université Grenoble-Alpes, where he focused on generative models for multimedia data processing. He regularly serves as a reviewer for international conferences and journals such as ICASSP, Interspeech, NeurIPS, CVPR, ICCV, ECCV, ACM MM, IEEE TASLP, and IJCV.