Combining policies: the best of human expertise and neurocontrol





1 LRI - Laboratoire de Recherche en Informatique
2 TAO - Machine Learning and Optimisation, LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique: UMR8623

Abstract: We consider sequential decision making in the case where a generative model and a parametric policy are available. Such a framework is naturally tackled with Direct Policy Search, i.e. parametric optimisation over simulations. We propose a simple method that combines this parametric policy with a more generic neural network, where all parameters are trained simultaneously. As such, our approach doesn't require any computational overhead. We show that the resulting policy significantly outperforms both the domain-specific policies and the neural network on a unit commitment test problem.
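
The abstract does not say how the two policies are combined or which optimiser is used, so the following is only a minimal sketch under stated assumptions: the combined action is taken to be the sum of a hand-crafted parametric policy and a small neural network, all parameters live in one vector, and that vector is tuned by a simple (1+1) evolution strategy over simulations. The toy simulator, the linear "expert" rule, the network size, and every function name here are hypothetical and are not taken from the paper.

```python
import math
import random

N_EXPERT = 2           # parameters of the hypothetical expert rule
HIDDEN = 4             # hidden units of the hypothetical network
N_PARAMS = N_EXPERT + 3 * HIDDEN


def expert_policy(theta, state):
    # Hypothetical domain-specific rule: linear feedback with gain and offset.
    gain, offset = theta
    return gain * state + offset


def neural_policy(weights, state):
    # One-hidden-layer tanh network mapping a scalar state to a scalar action.
    w1 = weights[:HIDDEN]
    b1 = weights[HIDDEN:2 * HIDDEN]
    w2 = weights[2 * HIDDEN:3 * HIDDEN]
    return sum(w2[i] * math.tanh(w1[i] * state + b1[i]) for i in range(HIDDEN))


def combined_policy(params, state):
    # Assumed combination: the network output is added to the expert action.
    return (expert_policy(params[:N_EXPERT], state)
            + neural_policy(params[N_EXPERT:], state))


def simulate(params, seed, horizon=20):
    # Toy generative model (an assumption): noisy scalar dynamics,
    # cost is the sum of squared states along the trajectory.
    rng = random.Random(seed)
    state, cost = 1.0, 0.0
    for _ in range(horizon):
        state = state + combined_policy(params, state) + rng.gauss(0.0, 0.05)
        cost += state * state
    return cost


def average_cost(params, seeds):
    # Average over a fixed set of simulation seeds (common random numbers).
    return sum(simulate(params, s) for s in seeds) / len(seeds)


def direct_policy_search(iterations=300, sigma=0.1, seed=0):
    # (1+1) evolution strategy over the joint parameter vector:
    # expert and network parameters are perturbed and selected together.
    rng = random.Random(seed)
    seeds = range(8)
    best = [0.0] * N_PARAMS
    best_cost = average_cost(best, seeds)
    for _ in range(iterations):
        candidate = [p + sigma * rng.gauss(0.0, 1.0) for p in best]
        cost = average_cost(candidate, seeds)
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    params, cost = direct_policy_search()
    print("average simulated cost after search:", cost)
```

Because both parameter groups sit in one vector, each candidate costs the same number of simulations as tuning either policy alone; only the per-step evaluation of the policy becomes slightly more expensive.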





Authors: Vincent Berthier, Adrien Couëtoux, Olivier Teytaud

Source: https://hal.archives-ouvertes.fr/






