Henry Eklind

Hierarchical categorization of transactions with heavy class imbalance and subjective data

Abstract

In this thesis we explore various models for predicting transaction categories and investigate whether it is possible to outperform a previous model on the same task. Our approach consists of evaluating multiple feed-forward neural networks with character n-gram tokenization, class imbalance adjustments, and a new hierarchical loss function. The results show that feed-forward network models in general outperform both the baseline SVM model and the previous model. The class imbalance adjustment techniques generally perform worse, auxiliary features did not show any discernible change in performance, and the hierarchical loss function does not affect performance in any significant way. The best-performing model, which uses only transaction descriptions and a categorical cross-entropy loss function, improves on the previous model by 0.05 in both macro and micro F1-score, and by 0.08 and 0.06 in micro and macro PRC AUC, respectively.