Nora Al-Naami

Imitation Learning using Reward-Guided DAgger

Abstract

End-to-end autonomous driving can be approached by finding a policy function that maps observations, such as a view of the driving scene, to driving actions by imitating an expert driver. This approach can be implemented with supervised learning, where the policy function is tuned to minimize the difference between the expert's ground-truth actions and the predicted actions. However, this method leads to poor performance, since the policy function is trained only on the states reached by the expert. An imitation learning algorithm that addresses this problem is DAgger. The main idea of DAgger is to train a policy iteratively on data collected from both the expert and the policy function itself, which requires a rule for deciding how the expert and the policy function interact during data collection. Current DAgger variants require querying the expert for long periods and do not explore the state space both safely and efficiently. In this thesis, we present an extension to DAgger that represents the safety of the state space as a probability measure in order to minimize expert queries and to guide exploration during training. We evaluate the proposed algorithm, RG-DAgger, in the Virtual Battlespace simulator (VBS) and show that it trains a better-performing policy function while requiring fewer expert queries than other DAgger variants.
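To make the classic DAgger loop concrete, the sketch below shows one minimal, illustrative version of it, not the RG-DAgger extension proposed in this thesis. The environment, expert, and learner used here (step, expert_action, and fit) are hypothetical toy stand-ins for the driving simulator, the expert driver, and the policy function, and the geometric mixing schedule beta is one common choice of interaction rule rather than the rule developed in this work.

    # Minimal, illustrative DAgger loop under toy assumptions;
    # all names below are hypothetical placeholders, not the thesis's code.
    import numpy as np

    rng = np.random.default_rng(0)

    def step(state, action):
        # Toy 1-D dynamics standing in for the driving simulator.
        return state + 0.1 * action + rng.normal(scale=0.01)

    def expert_action(state):
        # Toy expert: steer the state back toward zero.
        return -2.0 * state

    def fit(states, actions):
        # Least-squares linear policy: action = w * state.
        X = np.asarray(states).reshape(-1, 1)
        y = np.asarray(actions)
        w = np.linalg.lstsq(X, y, rcond=None)[0]
        return lambda s: float(w[0] * s)

    def dagger(n_iters=10, horizon=50, decay=0.5):
        states, actions = [], []
        policy = expert_action            # initial policy: defer to the expert
        for i in range(n_iters):
            beta = decay ** i             # probability of acting with the expert
            s = rng.normal(scale=0.5)     # random start state for this rollout
            for _ in range(horizon):
                # Mixture policy chooses who drives; the expert labels
                # every visited state, so the dataset covers states the
                # learned policy reaches, not only the expert's trajectory.
                a = expert_action(s) if rng.random() < beta else policy(s)
                states.append(s)
                actions.append(expert_action(s))
                s = step(s, a)
            policy = fit(states, actions)  # retrain on the aggregated dataset
        return policy

    pi = dagger()
    print("learned gain:", pi(1.0))  # close to the expert's gain of -2.0

The expert-query cost that RG-DAgger targets is visible in this loop: the expert is asked for a label at every visited state, regardless of how safe or familiar that state already is.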