TBD
By Paul Caillon
Paul Caillon will give a talk on efficient optimisation of …
Abstract
Over the past decades, deep learning (DL) has emerged as a prominent tool for addressing a wide range of problems in computer vision (CV), natural language processing (NLP), reinforcement learning and more. Leveraging the backpropagation (BP) algorithm, DL has made it possible to train deep neural networks with millions of parameters on large datasets, leading to significant performance improvements across domains. Yet the growing amount of resources needed to train and deploy state-of-the-art deep neural models raises serious concerns about the environmental footprint of DL, as well as about the potential AI oligopoly that could arise from the prohibitive cost of training large models in flagship domains such as speech recognition, CV and NLP. A dramatic turn was taken in 2017 with the introduction of the Transformer architecture, which underpins models with billions of parameters trained on very large amounts of data, such as PaLM and GPT-4. With the rise of such large models, it is becoming increasingly important to develop efficient and frugal learning algorithms and architectures that achieve high performance at a reduced computational cost. Through a theoretical analysis of today’s DL models, we aim to identify key principles for designing such efficient models and algorithms. In this talk, we will discuss two main directions to address the computational cost of DL models. First, we will analyze some of the proposed alternatives to BP, namely the Direct Feedback Alignment (DFA) algorithm and the Forward-Only paradigm, and propose improvements to bridge the performance gap with BP. Second, we will exhibit a theoretical constraint of the Transformer architecture and propose a simple yet efficient enhancement to the standard attention mechanism to address this issue.
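To give a concrete flavour of the first direction, the sketch below (an illustration of ours, not the speaker's code) contrasts the hidden-layer error signal used by backpropagation with the fixed random feedback projection used by Direct Feedback Alignment; the layer sizes, activation function and learning rate are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the speaker's implementation):
# hidden-layer error signal under backpropagation (BP) versus
# Direct Feedback Alignment (DFA) for a two-layer network y = W2 @ tanh(W1 @ x).

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 4

W1 = 0.1 * rng.standard_normal((n_hidden, n_in))
W2 = 0.1 * rng.standard_normal((n_out, n_hidden))
B1 = rng.standard_normal((n_hidden, n_out))   # fixed random feedback matrix (DFA)

x = rng.standard_normal(n_in)
target = rng.standard_normal(n_out)

# Forward pass
a1 = W1 @ x
h = np.tanh(a1)
y = W2 @ h
e = y - target                                # output error (squared-loss gradient)

# BP: the error travels back through the transpose of the forward weights.
delta_bp = (W2.T @ e) * (1.0 - np.tanh(a1) ** 2)

# DFA: the output error is projected through the fixed random matrix B1 instead,
# removing the need for a backward pass through W2.
delta_dfa = (B1 @ e) * (1.0 - np.tanh(a1) ** 2)

# Weight updates (learning rate chosen arbitrarily); the output layer
# is updated identically in both schemes.
lr = 0.01
W1 -= lr * np.outer(delta_dfa, x)
W2 -= lr * np.outer(e, h)
```

Because the feedback matrix is fixed and random, DFA avoids reusing the forward weights in the backward pass, which is one reason such alternatives are studied as potentially cheaper training schemes than BP.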
Biography
Paul Caillon is a postdoctoral fellow under the supervision of Alexandre Allauzen at Université Paris Dauphine-PSL. He obtained his Ph.D. at Université de Lorraine under the supervision of Christophe Cerisara, where he worked on dynamic architectures for deep neural networks and weakly supervised deep learning for natural language processing. His current work focuses on frugality in AI, that is, the development of AI systems that are resource-efficient and less environmentally demanding. He is currently involved in the PEPR Sharp project, working more specifically on frugal learning principles.