I am a researcher at KTH Royal Institute of Technology in Stockholm, working on reinforcement learning and information theory. I completed my PhD in December 2025 under the supervision of Mikael Skoglund and Tobias Oechtering, supported by the WASP Graduate School. My thesis is titled "An Information-Theoretic Approach to Bandits and Reinforcement Learning".
My research focuses on decision problems under uncertainty and on establishing performance guarantees for learning algorithms, in particular how efficiently agents acquire and exploit information in bandit and RL settings. I am particularly interested in Thompson Sampling, regret analysis, and information-directed exploration. My work combines theoretical analysis with empirical experimentation and has been presented at ICML, ISIT, and NeurIPS, and published in TMLR.
In 2024, I spent four months as a Visiting Student Researcher at Stanford University working with Professor Benjamin Van Roy. In Spring 2025, I did a research internship at Lynx Asset Management, developing reinforcement-learning methods for optimal trade execution.
I welcome discussions with anyone interested in reinforcement learning, information theory, or sequential decision-making. Feel free to reach out!
Publications
Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandit Problems
TMLR 2026 | pdf
Resolves a conjecture on logistic bandits by removing exponential dependence on the logistic slope.
Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
Refined PAC-Bayes Bounds for Offline Bandits
ISIT 2025 | pdf
Derives state-of-the-art PAC-Bayes bounds for offline bandits via a new optimization technique.
Raghav Bongole, Amaury Gouverneur, Tobias J. Oechtering, and Mikael Skoglund,
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning Problems
Submitted to ITW 2025 | pdf
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces
ICASSP 2025 | pdf
Extends information-theoretic regret analysis of Thompson Sampling to continuous action spaces.
Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
ICASSP 2025 | pdf
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Chained Information-Theoretic Bounds and Tight Regret Rate for Linear Bandit Problems
ICML 2024 (FoRLaC Workshop) | arXiv | pdf
Achieves the optimal regret rate for linear bandits using a chaining-based analysis.
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian Rewards
ISIT 2023 | arXiv | pdf | conference pdf
Extends the Russo–Van Roy information-theoretic framework to contextual bandit problems.
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund,
An Information-Theoretic Analysis of Bayesian Reinforcement Learning
Allerton 2022 | arXiv | pdf | conference pdf
Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, and Benoit Macq,
Optimal Intermittent Particle Filter
IEEE Transactions on Signal Processing 2022 | arXiv | pdf | journal pdf
Amaury Gouverneur,
Optimal Measurement Times for Particle Filtering and its Application in Mobile Tumor Tracking
Master Thesis 2022, supervised by Benoit Macq | dial | pdf
Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, and Benoit Macq,
Optimal Measurement Budget Allocation for Particle Filtering
ICIP 2020 | arXiv | pdf | conference pdf
Teaching
Project in Multimedia Processing and Analysis, EQ2445 at KTH — 2024
Machine Learning and Data Science, EQ2415 at KTH — 2024
Pattern Recognition and Machine Learning, EQ2341 at KTH — 2020–2024
Deep Neural Networks, EP232U at KTH — Spring 2022
Service
Reviewer for ICML, ICLR, ISIT, ICASSP, and EUSIPCO.
WASP cluster leader for the "Mathematical Foundations of AI other than ML" cluster (2020–2024) and the "Sequential Decision-Making and Reinforcement Learning" cluster (current).
Supervision
Bachelor theses
- Reza Qorbani, Kevin Pettersson: Investigation of Information-Theoretic Bounds on Generalization Error
- Edwin Östlund, Aron Malmborg: Fine-Tuning a GPT Model for Human-Like Chess Playing [demo]
- Tim Persson, Markus Palmheden: Researching GPT Model for Human-Like Chess Playing
Master theses
- Zhen Tian: Anomaly Detection in Application Logs
- Guangze Shi: Privacy Leaks from Deep Linear Networks
- Daniel Pérez: Improving Recommender Engines for Video Streaming Platforms with RNNs