Abstract

This paper investigates the application of the Q-learning reinforcement learning algorithm to Nim, an impartial combinatorial game. As part of our analysis of impartial games and Nim, we present the established optimal strategy for playing Nim. This strategy is then used as an exact benchmark for evaluating the learning process.

We show that the Q-learning algorithm converges to the optimal strategy under certain assumptions. We also carry out an analysis of the algorithm's parameters and discuss the implications of the results. We argue that Q-learning is highly likely to be effective in learning the optimal strategy for any impartial game.
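
As a concrete illustration of the kind of exact benchmark referred to above, the following is a minimal sketch of the classical nim-sum strategy for Nim, assuming the normal play convention (the player who takes the last object wins). The function names `nim_sum` and `optimal_move` are illustrative choices and are not taken from the paper.

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """Bitwise XOR of all heap sizes (the nim-sum of the position)."""
    return reduce(xor, heaps, 0)


def optimal_move(heaps):
    """Return (heap_index, new_size) for a move that leaves nim-sum zero.

    Returns None when the nim-sum is already zero, i.e. when no winning
    move exists under normal play (assumed convention).
    """
    total = nim_sum(heaps)
    if total == 0:
        return None
    for i, h in enumerate(heaps):
        target = h ^ total
        # A legal move must strictly reduce the chosen heap.
        if target < h:
            return i, target
    return None  # unreachable when total != 0, kept for safety
```

A learned policy could be compared against this benchmark by checking, for each position with a non-zero nim-sum, whether the move it selects leaves a position whose nim-sum is zero.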