New internship position on transformers for modeling time series
By Mathieu Fontaine, François Roueff
Our group is hiring an intern to work on transformers for modeling time series. The research component will focus on investigating stochastic positional encoding.
Important information
- Date: Starting in April 2025
- Duration: 4 to 6 months
- Place of work: Palaiseau (Paris outskirts), France
- Supervisors: Mathieu Fontaine, François Roueff
- Contact: mathieu.fontaine; francois.roueff@telecom-paris.fr
Problem statement and context
A key component of the transformer architecture is the positional encoding (@vaswani2017attention), which encodes the order of tokens in sequence data. This encoding can be implemented as either absolute (@vaswani2017attention; @kenton2019bert) or relative (@dai2019transformer; @liutkus2021relative). Absolute positional encoding is effective for fixed-length sequences but tends to struggle with long-range dependencies. In contrast, relative positional encoding handles long-range dependencies more effectively by focusing on the relative position, or lag, between tokens, although it generally incurs a higher computational cost. An exception is the stochastic positional encoding proposed by @liutkus2021relative, which maintains linear complexity. However, this stochastic positional encoding relies on a Gaussian kernel, which may not be optimal for capturing long-range dependencies, particularly in applications such as financial time series.
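For reference, the absolute sinusoidal encoding of @vaswani2017attention can be written in a few lines. The sketch below is a plain NumPy illustration (the function name and shapes are ours): even dimensions hold sin(pos / 10000^(2i/d)) and odd dimensions the matching cosine.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Absolute sinusoidal positional encoding of Vaswani et al. (2017).

    Returns an array of shape (seq_len, d_model): even columns hold
    sin(pos / 10000**(2i / d_model)), odd columns the matching cosine.
    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]     # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]    # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Example: 128 positions encoded in a 64-dimensional model.
print(sinusoidal_positional_encoding(128, 64).shape)  # (128, 64)
```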
The first goal of the internship is an in-depth study of @liutkus2021relative, which proposes new positional encoding schemes for modeling various time series, in particular their possibly long-range dependence structure.
We intend to build on this approach by better understanding how to relate the positional encoding to the statistical properties of the time series at hand: not only the second-order dependence but also non-Gaussianity, non-linearity, or non-stationarity.
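To give a flavour of the kernel viewpoint behind stochastic positional encoding, the sketch below is our own illustration, not the construction of @liutkus2021relative: by Bochner's theorem, drawing random frequencies from a Gaussian spectral density yields random features whose inner products approximate a Gaussian kernel in the positional lag, and this kernel decays quickly, which is one way to see why it may give little weight to distant positions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel_features(positions: np.ndarray, num_feats: int, freq_scale: float) -> np.ndarray:
    """Random Fourier features whose inner products approximate the Gaussian
    positional kernel k(m - n) = exp(-freq_scale**2 * (m - n)**2 / 2).

    Frequencies are drawn from the kernel's spectral density, a Gaussian with
    standard deviation freq_scale (Bochner's theorem); random phases make the
    estimator unbiased.
    """
    omegas = rng.normal(0.0, freq_scale, size=num_feats)     # random frequencies
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_feats)   # random phases
    return np.sqrt(2.0 / num_feats) * np.cos(np.outer(positions, omegas) + phases)

positions = np.arange(200)
feats = gaussian_kernel_features(positions, num_feats=4096, freq_scale=0.1)
approx_kernel = feats @ feats.T  # approx_kernel[m, n] ~ exp(-0.01 * (m - n)**2 / 2)

# The kernel is essentially zero beyond a few tens of lags, illustrating why a
# Gaussian choice may under-weight long-range dependence.
print(np.round(approx_kernel[0, [0, 10, 50, 150]], 3))
```

In stochastic positional encoding, random features of this kind are what keep the attention computation linear in the sequence length, with the number of features controlling the quality of the kernel approximation.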
This internship may lead to a CIFRE PhD thesis in collaboration with BNP Paribas/LTCI, working on the joint research axis "Transformers for Modeling Financial Time Series".
References
Dai, Zihang, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context.” arXiv preprint arXiv:1901.02860.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of NAACL-HLT. Minneapolis, Minnesota.
Liutkus, Antoine, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, and Gaël Richard. 2021. “Relative Positional Encoding for Transformers with Linear Complexity.” In International Conference on Machine Learning, 7067–79. PMLR.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems 30.
Candidate profile
- Required: deep learning models, stochastic processes.
- Preferred: transformers, time series.