Changing the Random Behavior of a Q-Learning Agent Over Time

Peter Boström & Anna Maria Modée

Abstract

Q-learning is a reinforcement learning technique in which an AI agent learns from experience. It is commonly used together with the so-called epsilon-greedy policy. The goal of this thesis was to determine how a few different random-behavior policies affect the learning rate of a Q-learning agent. To test this, our agent played a reduced instance of the board game Blokus on a 5 by 5 board, primarily against a randomly playing opponent. Two new policies were tested, both of which start with the agent preferring random moves and gradually shift toward trusting its previous experiences. Both new policies converged to a win rate close to 100%. The study proved inconclusive, however, since the game instance was very limited and, with our implementation, the agent was able to beat it without any random behavior at all. Their performance, comparable to that of a relatively non-exploring strategy, together with the theoretical motivation for their use, indicates that further research on them is warranted.
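
For illustration, the sketch below shows epsilon-greedy action selection with a linearly decaying exploration rate, i.e. a policy that starts out preferring random moves and gradually shifts toward exploiting learned Q-values. The schedule and the parameter names (epsilon_start, epsilon_end, decay_episodes) are illustrative assumptions for this example, not the exact policies evaluated in the thesis.

```python
import random

def decayed_epsilon(episode, epsilon_start=1.0, epsilon_end=0.05, decay_episodes=1000):
    # Linearly decay epsilon from epsilon_start to epsilon_end over decay_episodes.
    # (Illustrative schedule; the thesis's actual decay policies may differ.)
    fraction = min(episode / decay_episodes, 1.0)
    return epsilon_start + fraction * (epsilon_end - epsilon_start)

def epsilon_greedy_action(q_values, legal_moves, epsilon):
    # With probability epsilon, explore by picking a random legal move;
    # otherwise exploit by picking the move with the highest learned Q-value.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    return max(legal_moves, key=lambda move: q_values.get(move, 0.0))

# Example: early episodes use a high epsilon (mostly random moves); as episodes
# accumulate, epsilon shrinks and the agent relies more on its experience.
q_values = {"move_a": 0.2, "move_b": 0.7}
action = epsilon_greedy_action(q_values, ["move_a", "move_b"],
                               decayed_epsilon(episode=500))
```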