Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities

1 DECISION LIP6 - Laboratoire d'Informatique de Paris 6
2 Equipe MAD - Laboratoire GREYC - UMR6072 GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
3 SYSU-CMU Joint Institute of Engineering
4 SYSU-CMU Shunde International Joint Research Institute
5 Cordon Electronics DS2i

Abstract: In reinforcement learning, policies are typically evaluated according to the expectation of cumulated rewards. Researchers in decision theory have argued that more sophisticated decision criteria can better model the preferences of a decision maker. In particular, Skew-Symmetric Bilinear (SSB) utility functions generalize von Neumann and Morgenstern's expected utility (EU) theory to encompass rational decision behaviors that EU cannot accommodate. In this paper, we adopt an SSB utility function to compare policies in the reinforcement learning setting. We provide a model-free SSB reinforcement learning algorithm, SSB Q-learning, and prove its convergence towards a policy that is ε-optimal according to SSB. The proposed algorithm is an adaptation of fictitious play (Brown, 1951) combined with techniques from stochastic approximation (Borkar, 1997). We also present experimental results that evaluate our approach in a variety of settings.
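As an illustration only (this is not the authors' SSB Q-learning, which is model-free and compares policies in a sequential setting), the sketch below shows two ingredients named in the abstract on a toy problem with finitely many sure outcomes. It assumes an SSB utility given as a skew-symmetric matrix `PHI` (a hypothetical name), where `PHI[i][j] > 0` means outcome i is preferred to outcome j. The utility extends bilinearly to lotteries, and a lottery p is taken to be ε-optimal when phi(p, q) >= -ε for every lottery q. Because phi is skew-symmetric, an SSB-optimal lottery is an equilibrium strategy of the symmetric zero-sum game with payoff matrix phi, so plain fictitious play (repeatedly best-responding to the empirical frequency of past choices) can approximate one. The helper names `ssb_value`, `exploitability` and `fictitious_play` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: phi[i][j] > 0 means sure outcome i is preferred to j.
Matrix = List[List[float]]


def is_skew_symmetric(phi: Matrix) -> bool:
    """SSB requires phi(x, y) = -phi(y, x) for all outcomes x, y."""
    n = len(phi)
    return all(phi[i][j] == -phi[j][i] for i in range(n) for j in range(n))


def ssb_value(p: Sequence[float], q: Sequence[float], phi: Matrix) -> float:
    """Bilinear extension to lotteries: phi(p, q) = sum_ij p[i] * q[j] * phi[i][j].
    Lottery p is preferred to lottery q iff phi(p, q) > 0."""
    return sum(p[i] * q[j] * phi[i][j]
               for i in range(len(p)) for j in range(len(q)))


def exploitability(p: Sequence[float], phi: Matrix) -> float:
    """Smallest eps such that phi(p, q) >= -eps for every lottery q.
    By skew-symmetry this equals max_q phi(q, p); by linearity in q the
    maximum is reached at a sure outcome, so only pure q need checking."""
    n = len(phi)
    sure = [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    return max(ssb_value(e, p, phi) for e in sure)


def fictitious_play(phi: Matrix, iterations: int) -> List[float]:
    """Single-population fictitious play on the symmetric zero-sum game phi.
    Each step picks a best response to the empirical frequency of earlier
    picks; the returned empirical frequency approximates an SSB-optimal lottery."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    n = len(phi)
    counts = [0] * n
    # adv[j] = sum_k phi[j][k] * counts[k]: cumulative SSB advantage of outcome j
    # over the history, i.e. (steps so far) * phi(e_j, empirical frequency).
    adv = [0.0] * n
    for _ in range(iterations):
        j = max(range(n), key=lambda a: adv[a])  # best response; ties go to the lowest index
        counts[j] += 1
        for i in range(n):
            adv[i] += phi[i][j]
    return [c / iterations for c in counts]


# Toy preference cycle over sure outcomes a, b, c (b over a, c over b, a over c).
# No expected-utility ranking can represent it, because EU rankings are transitive.
PHI: Matrix = [[0.0, -1.0, 1.0],
               [1.0, 0.0, -1.0],
               [-1.0, 1.0, 0.0]]

if __name__ == "__main__":
    p = fictitious_play(PHI, 10_000)
    print("empirical lottery:", [round(x, 3) for x in p])
    print("exploitability:", round(exploitability(p, PHI), 4))
```

On this cyclic example the empirical lottery approaches the uniform one, which is exactly SSB-optimal here, while every sure outcome has exploitability 1 because another outcome beats it. A single population suffices because, in a symmetric zero-sum game, both players' best responses to the same history coincide.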

Authors: Hugo Gilbert, Bruno Zanuttini, Paolo Viappiani, Paul Weng, Esther Nicart


