Olle Hassel, Petter Janse

Q-learning in Connect Four

Abstract

Q-learning is a self-learning algorithm in which the learner is rewarded for desired behaviour and punished for unwanted behaviour. This report investigates how many matches of Connect Four a player trained through Q-learning needs in order to win, on average, 90 % of all matches against a random player, a pattern-matching player and a calculating player. It also investigates whether the Q-learning player can be combined with other algorithms to create an improved player. Within a realistic time on a personal computer, the Q-learner beats the pattern-matching and calculating players. The random player cannot be beaten in reasonable time, and doing so would also require an unreasonable amount of memory. An improved Q-learning player that combines Q-learning with pattern matching and calculated start values can be created, and it beats all three players after a very short learning period.
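As a rough illustration of the reward-and-punishment idea, the following is a minimal sketch of the standard tabular Q-learning update. It is not necessarily the formulation used in this report: the names `q_update`, `ALPHA` and `GAMMA`, the parameter values, the state labels and the rewards of +1 for a win and -1 for a loss are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical parameters, chosen only for illustration.
ALPHA = 0.5  # learning rate: how far each update moves the old estimate
GAMMA = 0.9  # discount factor: how much future rewards count

def q_update(q, state, action, reward, next_state, next_actions):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Best value reachable from the next state (0.0 if the game is over).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]

# Q-table mapping (state, action) pairs to values; unseen pairs start at 0.
q = defaultdict(float)

# A winning move (assumed reward +1) ends the game: no next actions.
win = q_update(q, "s1", 3, 1.0, "terminal", [])
# A losing move (assumed reward -1) is punished in the same way.
loss = q_update(q, "s2", 0, -1.0, "terminal", [])
```

With these assumed values the winning move's estimate rises from 0 to 0.5 and the losing move's falls to -0.5. Because the table holds one entry per visited state and action pair, its size grows with the number of distinct board positions encountered during training.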