From Theory to Intervention: Mechanistic Tools for Understanding Foundation Models
By Quentin Bouniot

Quentin Bouniot will give a talk entitled "From Theory to Intervention: Mechanistic Tools for Understanding Foundation Models".

Abstract

Foundation models (FMs), large-scale deep neural networks (e.g., GPT, LLaMA, or LLaVA) trained on massive datasets, have achieved remarkable generative capabilities, yet their internal mechanisms often remain opaque. In this talk, we present three advances toward a mechanistic understanding of FMs at multiple scales, enabling prediction, interpretation, and control. First, we introduce the non-linearity signature, a quantification of the intrinsic non-linearity of deep neural architectures derived from affine optimal transport theory. Unlike traditional measures of expressive power such as depth or width, the non-linearity signature explains behavior across models by capturing how they transform input distributions. Next, we address the challenge of interpreting unsupervised concept representations in large-scale vision models. Here, we propose a method that maps learned concept activations into the latent space of pretrained generative models, enabling high-fidelity visualization and interactive exploration. Finally, we extend sparse autoencoders (SAEs) to vision-language models (VLMs). We show that SAEs significantly improve the monosemanticity of vision representations and demonstrate their application as a scalable, unsupervised tool for mechanistic control in VLMs.
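For readers unfamiliar with sparse autoencoders, the sketch below illustrates the general idea in minimal PyTorch form: an overcomplete linear encoder with a ReLU and an L1 sparsity penalty is trained to reconstruct frozen model activations, so that individual latent units tend toward monosemantic features. The dimensions, loss coefficient, and training loop are illustrative assumptions for exposition, not the specific formulation presented in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over frozen model activations.

    Encodes d_model-dimensional activations into an overcomplete,
    non-negative d_latent-dimensional code, then reconstructs the input.
    """
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        z = F.relu(self.encoder(x))   # sparse, non-negative code
        x_hat = self.decoder(z)       # reconstruction of the activation
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty encouraging few active latents.
    recon = F.mse_loss(x_hat, x)
    sparsity = z.abs().mean()
    return recon + l1_coeff * sparsity

if __name__ == "__main__":
    # Hypothetical sizes: 768-dim activations, 8x overcomplete dictionary.
    d_model, d_latent = 768, 8 * 768
    sae = SparseAutoencoder(d_model, d_latent)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    acts = torch.randn(256, d_model)  # stand-in for cached model activations
    for _ in range(10):
        x_hat, z = sae(acts)
        loss = sae_loss(acts, x_hat, z)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In practice, the activations would be collected from a frozen vision or vision-language model rather than sampled at random, and the sparse codes would then be inspected or intervened on.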

Biography

Quentin Bouniot is a Postdoctoral Researcher in the Explainable Machine Learning group at TUM & Helmholtz Munich, where he studies mechanistic interpretability and representational alignment for vision-language models. Previously, he was a postdoc at Telecom Paris (LTCI), working on uncertainty and interpretability in deep learning. He earned his PhD from CEA-List and Université Jean-Monnet Saint-Étienne on the topic of few-shot learning and meta-learning for computer vision. His research bridges representation learning and mechanistic interpretability, using foundation models as a case study. Originally from Tahiti, French Polynesia, he advocates for safe, responsible, and sustainable AI applications.