I completed my PhD in December 2025 at KTH Royal Institute of Technology under the supervision of Mikael Skoglund and Tobias Oechtering, supported by the WASP Graduate School. My thesis is titled An Information-Theoretic Approach to Bandits and Reinforcement Learning [explainer].
My research focuses on decision-making under uncertainty and on establishing performance guarantees for learning algorithms, studying how efficiently agents acquire and exploit information in bandit and reinforcement-learning settings. I am particularly interested in Thompson Sampling, regret analysis, and information-directed exploration. My work combines theoretical analysis with empirical experimentation; it has been presented at ICML, ISIT, and NeurIPS, and published in TMLR.
In 2024, I was a Visiting Student Researcher at Stanford University working with Professor Benjamin Van Roy [note]. In 2025, I did a research internship at Lynx Asset Management, developing reinforcement-learning methods for optimal trade execution.
I welcome discussions with anyone interested in reinforcement learning, information theory, or sequential decision-making. Feel free to reach out!
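For readers unfamiliar with Thompson Sampling, here is a minimal toy sketch (not code from any of the papers below): a Bernoulli bandit with independent Beta(1, 1) priors, where each round the algorithm samples a mean from each arm's posterior and plays the arm with the largest sample.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Toy Thompson Sampling for a Bernoulli bandit with Beta(1,1) priors."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1 for each arm
    beta = [1] * k   # posterior failures + 1 for each arm
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior,
        # then act greedily with respect to the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Bayesian regret here is measured against always playing the best arm:
horizon = 2000
reward = thompson_sampling([0.3, 0.5, 0.7], horizon)
regret = 0.7 * horizon - reward
```

Over 2000 rounds the realized regret stays far below that of uniform random play (which would be about 0.2 per round here), illustrating the exploration–exploitation trade-off that the regret bounds below quantify.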
Publications
Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandit Problems
TMLR 2026 |
pdf |
note
Resolves a conjecture on logistic bandits by removing exponential dependence on the logistic slope.
Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
Refined PAC-Bayes Bounds for Offline Bandits
ISIT 2025 |
pdf
Derives state-of-the-art PAC-Bayes bounds for offline bandits via a new optimization technique.
Raghav Bongole, Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning Problems
Submitted to ITW 2025 |
pdf
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces
ICASSP 2025 |
pdf
Extends information-theoretic regret analysis of Thompson Sampling to continuous action spaces.
Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
ICASSP 2025 |
pdf
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Chained Information-Theoretic Bounds and Tight Regret Rate for Linear Bandit Problems
ICML 2024 (FoRLaC Workshop) |
arXiv |
pdf
Achieves the optimal regret rate for linear bandits using a chaining-based analysis.
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian Rewards
ISIT 2023 |
arXiv |
pdf |
conference pdf
Extends the Russo–Van Roy information-theoretic framework to contextual bandit problems.
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Bayesian Reinforcement Learning
Allerton 2022 |
arXiv |
pdf |
conference pdf
Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, and Benoit Macq,
Optimal Intermittent Particle Filter
IEEE Transactions on Signal Processing 2022 |
arXiv |
pdf |
journal pdf
Amaury Gouverneur,
Optimal Measurement Times for Particle Filtering and its Application in Mobile Tumor Tracking
Master's Thesis 2022, supervised by Benoit Macq |
dial |
pdf
Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, and Benoit Macq,
Optimal Measurement Budget Allocation for Particle Filtering
ICIP 2020 |
arXiv |
pdf |
conference pdf
Selected Notes
A small selection of notes and longer write-ups that give more context than a publication list. See all notes.
The Logistic Bandit Problem
Why logistic bandits were not well understood theoretically, and how our recent results help explain the empirical performance of learning algorithms in this setting.
An Information-Theoretic Lens on Bandits and RL
A non-technical introduction to the information-theoretic framework behind my thesis: bandits, regret, Thompson Sampling, and the information ratio.
Chess-GPT: Human-Like Chess with a Fine-Tuned Language Model
How a transformer trained on human chess games emergently learns the rules, strategy, and its own distinct playing style.
Teaching
Project in Multimedia Processing and Analysis, EQ2445 at KTH, 2024
Machine Learning and Data Science, EQ2415 at KTH, 2024
Pattern Recognition and Machine Learning, EQ2341 at KTH, 2020–2024
Deep Neural Networks, EP232U at KTH, 2022
Service
Reviewer for ICML, ICLR, ISIT, ICASSP, and EUSIPCO.
WASP cluster leader for the Mathematical Foundations of AI (other than ML) cluster (2020–2024) and the Sequential Decision-Making and Reinforcement Learning cluster (2025).
Supervision
Bachelor theses
- Reza Qorbani, Kevin Pettersson: Investigation of Information-Theoretic Bounds on Generalization Error
- Edwin Östlund, Aron Malmborg: Fine-Tuning a GPT Model for Human-Like Chess Playing [demo | note]
- Tim Persson, Markus Palmheden: Researching GPT Model for Human-Like Chess Playing
Master theses
- Zhen Tian: Anomaly Detection in Application Logs
- Guangze Shi: Privacy Leaks from Deep Linear Networks
- Daniel Pérez: Improving Recommender Engines for Video Streaming Platforms with RNNs