Pedro A. Santos

Pedro A. Santos, Instituto Superior Técnico and INESC-ID
Two-time scale stochastic approximation for reinforcement learning with linear function approximation

In this presentation, I will introduce some traditional Reinforcement Learning problems and algorithms, and show how a two-time-scale variant of the usual stochastic approximation approach avoids some of these problems and yields convergence results.

This variant was inspired by the practical success of Deep Q-Learning, with which DeepMind's research team attained superhuman performance on some classic Atari games in 2015. Practical successes in Machine Learning like this one often lack a theory that explains them, and the work presented here aims to contribute to such a theory.
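To make the two-time-scale idea concrete, here is a minimal sketch built on assumptions of our own. It is not the algorithm from the talk or the slides. Fast parameters `w` take a Q-learning semi-gradient step whose bootstrap target is computed from slow parameters `theta`, in the spirit of the target network used in Deep Q-Learning. Meanwhile `theta` moves toward `w` with a much smaller step size. The toy MDP (`NEXT`, `REWARD`), the one-hot `features`, and the constants `alpha=0.5`, `beta=0.01` are hypothetical. Convergence analyses of two-time-scale stochastic approximation typically use diminishing step sizes with beta_t/alpha_t -> 0. Constant steps with beta much smaller than alpha are used here only to keep a deterministic toy example short.

```python
import random

# Hypothetical deterministic toy MDP: 3 states, 2 actions.
# NEXT[s][a] is the successor state, REWARD[s][a] the immediate reward.
NEXT = [[1, 2], [2, 0], [0, 1]]
REWARD = [[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]]
N_S, N_A, GAMMA = 3, 2, 0.9


def features(s, a):
    """One-hot features: linear approximation then coincides with the tabular case."""
    phi = [0.0] * (N_S * N_A)
    phi[s * N_A + a] = 1.0
    return phi


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def two_timescale_q(steps=50_000, alpha=0.5, beta=0.01, seed=0):
    """Q-learning with a slow target parameter theta that tracks the fast w.

    Fast timescale: w takes semi-gradient TD steps toward targets built from theta.
    Slow timescale: theta moves toward w with the much smaller step beta.
    """
    rng = random.Random(seed)
    dim = N_S * N_A
    w = [0.0] * dim      # fast parameters
    theta = [0.0] * dim  # slow (target) parameters
    for _ in range(steps):
        # Uniform exploration over state-action pairs.
        s, a = rng.randrange(N_S), rng.randrange(N_A)
        s2, r = NEXT[s][a], REWARD[s][a]
        phi = features(s, a)
        # Bootstrap target uses the slow parameters, not the fast ones.
        target = r + GAMMA * max(dot(features(s2, b), theta) for b in range(N_A))
        delta = target - dot(phi, w)
        w = [wi + alpha * delta * pi for wi, pi in zip(w, phi)]
        theta = [ti + beta * (wi - ti) for ti, wi in zip(theta, w)]
    return theta


def value_iteration(iters=500):
    """Reference Q* for the toy MDP, used only to check the sketch."""
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(iters):
        q = [[REWARD[s][a] + GAMMA * max(q[NEXT[s][a]]) for a in range(N_A)]
             for s in range(N_S)]
    return q


if __name__ == "__main__":
    theta = two_timescale_q()
    q_star = value_iteration()
    err = max(abs(theta[s * N_A + a] - q_star[s][a])
              for s in range(N_S) for a in range(N_A))
    print(f"max |theta - Q*| = {err:.2e}")
```

With one-hot features, linear function approximation reduces to the tabular case, so the result can be checked against value iteration. Replacing `features` with a lower-dimensional map gives the genuinely linear setting the talk addresses. In that setting, plain Q-learning is known to be able to diverge, and this toy sketch makes no claim about how the presented method handles that case.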

Joint work with Diogo Carvalho and Francisco Melo from INESC-ID.

Additional file


Santos PA slides.pdf