Emil Widham
Scaling up Maximum Entropy Deep Inverse Reinforcement Learning with Transfer Learning
Abstract
In this thesis, an issue with common inverse reinforcement learning algorithms is identified that causes them to be computationally heavy. A solution is proposed that attempts to address this issue and that can be built upon in future work.
The complexity of inverse reinforcement learning algorithms is increased because at each iteration a reinforcement learning step is performed to evaluate the result of the previous iteration and to guide future learning. This step is slow for problems with large state spaces and for problems where many iterations are required. It has been observed that the problem solved in this step is in many cases very similar to that of the previous iteration. The suggested solution is therefore to use transfer learning to retain some of the learned information and thereby improve the speed of subsequent steps. In this thesis, different forms of transfer are evaluated for common reinforcement learning algorithms applied to this problem.
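As an illustration of the transfer idea, the sketch below shows a warm-started value iteration inside a simplified inverse reinforcement learning loop: the value function from the previous iteration is reused as the starting point of the next reinforcement learning step. The function names, the tabular MDP representation, and the use of hard (rather than soft) value iteration are assumptions made for this example and are not taken from the thesis implementation.

```python
import numpy as np

def value_iteration(P, reward, gamma=0.95, tol=1e-6, V_init=None):
    """Solve a tabular MDP by value iteration.

    P      : transition tensor of shape (A, S, S), P[a, s, s'] = p(s' | s, a)
    reward : state reward vector of shape (S,)
    V_init : optional warm start; passing the value function from the
             previous IRL iteration is the "transfer" discussed above.
    """
    V = np.zeros(reward.shape[0]) if V_init is None else V_init.copy()
    while True:
        # Q[a, s] = r(s) + gamma * sum_s' P[a, s, s'] * V[s']
        Q = reward[None, :] + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def irl_outer_loop(P, update_reward, n_iterations=50):
    """Hypothetical outer IRL loop: the reward estimate changes only slightly
    between iterations, so reusing the previous value function as a warm start
    lets the reinforcement learning step converge in far fewer sweeps."""
    reward = np.zeros(P.shape[-1])
    V = None                                        # no transfer on the first iteration
    for _ in range(n_iterations):
        V = value_iteration(P, reward, V_init=V)    # warm-started RL step
        reward = update_reward(V)                   # placeholder for the reward update
    return reward
```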
Experiments are run using value iteration and Q-learning as the algorithms for the reinforcement learning step. The algorithms are applied to two route-planning problems, and in both cases a transfer is found to be useful for reducing computation times. For value iteration the transfer is easy to understand and implement and shows large improvements in speed compared to the basic method. For Q-learning the implementation involves more variables, and while it shows an improvement, it is not as dramatic as that for value iteration. The conclusion drawn is that for inverse reinforcement learning implementations using value iteration a transfer is always recommended, while for implementations using other algorithms for the reinforcement learning step a transfer is most likely recommended, but more experimentation needs to be conducted.
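For comparison, the sketch below shows what the corresponding transfer could look like for tabular Q-learning: the Q-table from the previous inverse reinforcement learning iteration is reused instead of being reset to zeros. The environment interface (env_step), the episode settings, and the hyperparameter values are assumptions made for illustration, not the thesis implementation.

```python
import numpy as np

def q_learning(env_step, reward, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, max_steps=200, Q_init=None):
    """Tabular Q-learning for the current reward estimate.

    Q_init : optional Q-table carried over from the previous IRL iteration;
             this is the Q-learning analogue of the value-iteration transfer,
             although its benefit depends on more tuning choices.
    env_step(s, a) -> (next_state, done) is a hypothetical environment interface.
    """
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions)) if Q_init is None else Q_init.copy()
    for _ in range(episodes):
        s = 0                                        # assumed fixed start state
        for _ in range(max_steps):
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next, done = env_step(s, a)
            target = reward[s_next] + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
            if done:
                break
    return Q
```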