I completed my PhD in December 2025 at KTH Royal Institute of Technology under the supervision of Mikael Skoglund and Tobias Oechtering, supported by the WASP Graduate School. My thesis is titled An Information-Theoretic Approach to Bandits and Reinforcement Learning [explainer].
My research focuses on decision-making under uncertainty and on establishing performance guarantees for learning algorithms, studying how efficiently agents acquire and exploit information in bandit and reinforcement-learning settings. I am particularly interested in Thompson Sampling, regret analysis, and information-directed exploration. My work combines theoretical analysis with empirical experimentation; it has been presented at ICML, ISIT, and NeurIPS, and published in TMLR.
In 2024, I was a Visiting Student Researcher at Stanford University working with Professor Benjamin Van Roy [note]. In 2025, I did a research internship at Lynx Asset Management, developing reinforcement-learning methods for optimal trade execution.
I welcome discussions with anyone interested in reinforcement learning, information theory, or sequential decision-making. Feel free to reach out!
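For readers unfamiliar with Thompson Sampling, here is a minimal toy sketch (not code from any of the papers below): a Bernoulli bandit with independent Beta(1, 1) priors, where each round the algorithm samples a mean from each arm's posterior and plays the arm with the largest sample.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Toy Thompson Sampling for a Bernoulli bandit with Beta(1,1) priors."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1 for each arm
    beta = [1] * k   # posterior failures + 1 for each arm
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior,
        # then act greedily with respect to the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Bayesian regret here is measured against always playing the best arm:
horizon = 2000
reward = thompson_sampling([0.3, 0.5, 0.7], horizon)
regret = 0.7 * horizon - reward
```

Over 2000 rounds the realized regret stays far below that of uniform random play (which would be about 0.2 per round here), illustrating the exploration–exploitation trade-off that the regret bounds below quantify.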
Publications
Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandit Problems
TMLR 2026 |
pdf |
note
Resolves a conjecture on logistic bandits by removing exponential dependence on the logistic slope.
Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
Refined PAC-Bayes Bounds for Offline Bandits
ISIT 2025 |
pdf
Derives state-of-the-art PAC-Bayes bounds for offline bandits via a new optimization technique.
Raghav Bongole, Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning Problems
Submitted to ITW 2025 |
pdf
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces
ICASSP 2025 |
pdf
Extends information-theoretic regret analysis of Thompson Sampling to continuous action spaces.
Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
ICASSP 2025 |
pdf
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Chained Information-Theoretic Bounds and Tight Regret Rate for Linear Bandit Problems
ICML 2024 (FoRLaC Workshop) |
arXiv |
pdf
Achieves the optimal regret rate for linear bandits using a chaining-based analysis.
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian Rewards
ISIT 2023 |
arXiv |
pdf |
conference pdf
Extends the Russo–Van Roy information-theoretic framework to contextual bandit problems.
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Bayesian Reinforcement Learning
Allerton 2022 |
arXiv |
pdf |
conference pdf
Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, and Benoit Macq,
Optimal Intermittent Particle Filter
IEEE Transactions on Signal Processing 2022 |
arXiv |
pdf |
journal pdf
Amaury Gouverneur,
Optimal Measurement Times for Particle Filtering and its Application in Mobile Tumor Tracking
Master's Thesis 2022, supervised by Benoit Macq |
dial |
pdf
Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, and Benoit Macq,
Optimal Measurement Budget Allocation for Particle Filtering
ICIP 2020 |
arXiv |
pdf |
conference pdf
Selected Notes
A small selection of notes and longer write-ups that give more context than a publication list. See all notes.
The Logistic Bandit Problem
Why logistic bandits were not well understood theoretically, and how our recent results help explain the empirical performance of learning algorithms in this setting.
An Information-Theoretic Lens on Bandits and RL
A non-technical introduction to the information-theoretic framework behind my thesis: bandits, regret, Thompson Sampling, and the information ratio.
Chess-GPT: Human-Like Chess with a Fine-Tuned Language Model
How a transformer trained on human chess games emergently learns the rules, strategy, and its own distinct playing style.
Teaching
Project in Multimedia Processing and Analysis, EQ2445 at KTH, 2024
Machine Learning and Data Science, EQ2415 at KTH, 2024
Pattern Recognition and Machine Learning, EQ2341 at KTH, 2020–2024
Deep Neural Networks, EP232U at KTH, 2022
Service
Reviewer for ICML, ICLR, ISIT, ICASSP, and EUSIPCO.
WASP cluster leader for the Mathematical Foundations of AI (other than ML) cluster (2020–2024) and the Sequential Decision-Making and Reinforcement Learning cluster (2025).
Supervision
Bachelor theses
- Reza Qorbani, Kevin Pettersson: Investigation of Information-Theoretic Bounds on Generalization Error
- Edwin Östlund, Aron Malmborg: Fine-Tuning a GPT Model for Human-Like Chess Playing [demo | note]
- Tim Persson, Markus Palmheden: Researching GPT Model for Human-Like Chess Playing
Master theses
- Zhen Tian: Anomaly Detection in Application Logs
- Guangze Shi: Privacy Leaks from Deep Linear Networks
- Daniel Pérez: Improving Recommender Engines for Video Streaming Platforms with RNNs