Abstract: This paper introduces a new approach to solving sensor management problems. Classically, sensor management problems can be formalized as Partially Observed Markov Decision Processes (POMDPs). The original approach developed here consists in deriving the optimal parameterized policy based on a stochastic gradient estimation. We assume in this work that it is possible to learn the optimal policy off-line in simulation, using models of the environment and of the sensors. The learned policy can then be used to manage the sensors. To approximate the gradient in a stochastic context, we introduce a new method based on Infinitesimal Perturbation Approximation (IPA). The effectiveness of this general framework is illustrated by the management of an Electronically Scanned Array Radar. First simulation results are finally presented.
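
To illustrate the general idea behind an IPA-style gradient estimate, the following minimal sketch differentiates a simulated cost along each sample path while holding the random noise fixed. It is a toy example only, not the method developed in the paper: the cost function, the exponential noise model, and the names `ipa_gradient`, `theta`, and `xi` are assumptions introduced here for illustration.

```python
import random


def ipa_gradient(theta, n_samples=100_000, seed=0):
    """Pathwise (IPA-style) estimate of d/dtheta E[(theta * xi)^2], xi ~ Exp(1).

    Toy model chosen for illustration; it is not taken from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi = rng.expovariate(1.0)      # noise drawn independently of theta
        x = theta * xi                 # sample path as a function of theta
        dx_dtheta = xi                 # derivative of the path, noise held fixed
        total += 2.0 * x * dx_dtheta   # chain rule applied to the cost x^2
    return total / n_samples


theta = 1.5
estimate = ipa_gradient(theta)
exact = 4.0 * theta  # E[xi^2] = 2 for Exp(1), so the gradient is 4 * theta
print(f"IPA estimate: {estimate:.3f}, closed form: {exact:.3f}")
```

In this toy case the closed-form gradient is available, so the estimate can be checked directly. In a policy-learning setting such an estimate would typically feed a stochastic gradient ascent step on the policy parameters.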

Author: Thomas Bréhard INRIA Futurs, Emmanuel Duflos INRIA Futurs, LAGIS, Philippe Vanheeghe LAGIS, Pierre-Arnaud Coquelin INRIA Futurs

Source: https://arxiv.org/


