Ranking-based Approaches for AI Challenges based on Modern Data.
By Myrto Limnios
Myrto Limnios will give a talk entitled “Ranking-based Approaches for AI Challenges based on Modern Data”
Abstract
Comparing modern data generated by unknown mechanisms has become essential to many data analysis problems in the context of generative AI. For example, large language models are trained to compare and aggregate multimodal and multisource datasets, and data augmentation algorithms enlarge real-world datasets by synthetically generating new, carefully tailored samples. However, to use AI models in real-world applications, it is important to characterize and detect undesirable (generated) values that lie in low-probability regions, while controlling their effects on the generalization ability of the models.
In this talk, we will discuss a nonparametric approach, based on rank statistics, for comparing two modern samples taking values in (non-)Euclidean spaces. In particular, we will elaborate on a distribution-free generalization of rank statistics based on bipartite ranking methods. This new class encompasses the classic univariate framework, used notably for testing statistical homogeneity or independence, while naturally mitigating the effects of undesirable values. We will relate those problems to Receiver Operating Characteristic (ROC) analysis, and provide probabilistic finite-sample uniform guarantees on the testing errors. Convincing experimental studies will illustrate the advantages of this approach compared to state-of-the-art methods.
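As a rough illustration of the idea, a ranking-based two-sample test can be sketched in two steps: learn a real-valued scoring function on one half of the data, then apply a univariate rank statistic to the scores of the held-out half. The sketch below is not the speaker's method. It assumes a deliberately simple scorer (projection onto the difference of sample means) as a stand-in for a bipartite ranking algorithm, uses the normalized Mann-Whitney statistic (the empirical area under the ROC curve), and calibrates it by permutation. All function names and parameters are hypothetical.

```python
import random


def gaussian_sample(n, d, shift, rng):
    """Toy data (assumption): n points in R^d with i.i.d. N(shift, 1) coordinates."""
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]


def mean_vector(points):
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def fit_scorer(xs, ys):
    """Stand-in for a bipartite ranking step (assumption, not the talk's algorithm):
    score a point by its projection onto the difference of the two sample means,
    so that points resembling the first sample tend to receive higher scores."""
    mx, my = mean_vector(xs), mean_vector(ys)
    w = [a - b for a, b in zip(mx, my)]
    return lambda p: sum(wj * pj for wj, pj in zip(w, p))


def mann_whitney_u(sx, sy):
    """Normalized Mann-Whitney statistic, i.e. the empirical AUC:
    fraction of pairs with s(x) > s(y), ties counted as 1/2."""
    wins = 0.0
    for a in sx:
        for b in sy:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(sx) * len(sy))


def rank_two_sample_test(xs, ys, n_perm=200, seed=0):
    """Two-step test: fit the scorer on the first halves, then run a
    one-sided permutation test on the scores of the second halves."""
    rng = random.Random(seed)
    hx, hy = len(xs) // 2, len(ys) // 2
    scorer = fit_scorer(xs[:hx], ys[:hy])
    sx = [scorer(p) for p in xs[hx:]]
    sy = [scorer(p) for p in ys[hy:]]
    observed = mann_whitney_u(sx, sy)

    # Under the null hypothesis the held-out scores are exchangeable,
    # so shuffling the pooled scores reproduces the null distribution.
    pooled = sx + sy
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mann_whitney_u(pooled[:len(sx)], pooled[len(sx):]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    xs = gaussian_sample(200, 5, 1.0, rng)  # shifted sample
    ys = gaussian_sample(200, 5, 0.0, rng)  # reference sample
    auc, p = rank_two_sample_test(xs, ys)
    print(f"empirical AUC = {auc:.3f}, permutation p-value = {p:.3f}")
```

Splitting the data keeps the permutation step valid: the scorer is fixed before the held-out scores are compared, so an empirical AUC close to 1/2 is what one would expect when the two samples share the same distribution.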
We will end this talk with important open questions for advancing generative models, including the comparison of observational data using causal methods. This talk is based on joint work with Stephan Clémençon (Télécom Paris) and Nicolas Vayatis (ENS Paris-Saclay).
Biography
Myrto Limnios is a Bernoulli Instructor at Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Her research is primarily driven by the development of new nonparametric learning methods for comparing datasets and understanding causal relations using real-world data, with a focus on proving nonasymptotic generalization properties. She specializes in statistical learning theory, stochastic modeling, hypothesis testing methods and causal reasoning, motivated by challenging multidisciplinary applications.
Myrto grew up in France and earned two Master's degrees in applied mathematics, from Université Paris Diderot (M2MO) and Ecole des Mines Nancy. She obtained her Ph.D. from Ecole Normale Supérieure (ENS) Paris-Saclay with a thesis entitled “Rank processes and statistical applications in high dimension”, supervised by Prof. Nicolas Vayatis. Before joining EPFL, she spent two years as a Postdoctoral Fellow in the Department of Mathematics at Copenhagen University, where she worked with Prof. Niels R. Hansen on causal methods for continuous-time high-dimensional data.