by Andreas Pettersson

Self-learning Dots and Boxes Player

Abstract

This report is about the reinforcement learning algorithm Q-Learning. The purpose of this work is to implement a self-learning Dots and Boxes player which, after training, is evaluated against two pre-programmed players. I have investigated how the training period affects how good the self-learning player becomes by varying how long it explores the possible states of the game. The results are presented in graphs which are analyzed throughout the work. The self-learning player and the Q-Learning algorithm are analyzed to find out what the player has learned and how it acquired its strategies during the training period.

The result was that the self-learning player needs to play against itself for several hundred thousand games before it stops learning. In all of the tests the self-learning player became better than my pre-programmed players; it even became good enough to beat me in the majority of the games I played against it.