Self-learning Dots and Boxes Player
by Andreas Pettersson
Abstract
This report is about the reinforcement-learning algorithm Q-Learning.
The purpose of this work is to implement a self-learning Dots and Boxes
player which, after training, is evaluated against two pre-programmed
players. I have investigated how the length of the training period
affects the strength of the self-learning player by varying how long it
spends exploring the possible states the game can be in. The results are
presented in graphs that are analyzed throughout the report. The
self-learning player and the Q-Learning algorithm are also analyzed to
find out what the player has learned and how it acquired its strategies
during the training period.
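To make the two ingredients mentioned above concrete (the Q-Learning update and an exploration period whose length can be varied), here is a minimal tabular sketch. It is not the report's implementation. The names `ALPHA`, `GAMMA`, `epsilon_for`, `choose_action` and `q_update`, the parameter values and the linear decay of the exploration rate are all hypothetical. The sketch also ignores game-specific details, such as the Dots and Boxes rule that a player who completes a box moves again, and how the opponent's turn is handled during self-play.

```python
import random
from collections import defaultdict

# Hypothetical parameter values; the report does not state the ones it used.
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

# Q-table: maps (state, action) -> estimated value, defaulting to 0.0.
Q = defaultdict(float)

def epsilon_for(game_index, exploration_games):
    """Assumed schedule: decay the exploration rate linearly from 1.0 to 0.0
    over the exploration period, then play purely greedily."""
    if game_index >= exploration_games:
        return 0.0
    return 1.0 - game_index / exploration_games

def choose_action(state, legal_actions, epsilon):
    """Epsilon-greedy: random move with probability epsilon, else best-known move."""
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """One-step Q-Learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    An empty next_actions list means next_state is terminal."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

Lengthening the exploration period (a larger `exploration_games`) lets the player try more random moves, and so visit more states, before it starts relying on its Q-table. This is the kind of trade-off the training-period experiments vary.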
My conclusion is that the self-learning player needs to play against
itself for several hundred thousand games before it stops learning. In
all of the tests, the self-learning player became better than my
pre-programmed players; it even became good enough to beat me in the
majority of the games I played against it.