Johan Litsfeldt

E-mail classification

A study of supervised classification methods.

Abstract

This report discusses methods for the automatic classification of e-mail, i.e. the categorization of messages according to their content. The classification methods are described for an arbitrary number of categories, with particular attention to the binary case. The algorithms are also analyzed through the implementation and evaluation of a spam filter based on these methods. In addition to classification algorithms, the report covers language analysis of e-mails, weighting principles, and a review of modern spam techniques.