Erik Björck

Using embeddings to find similarities between hierarchically related Thema Subject Categories

Abstract

Recommender systems are widely used to keep users engaged with an application by suggesting further items to consume. Implementing a recommender system requires finding similarities between items and users, for instance by capturing user preferences through ratings or by using relevant features of the items. To capture these similarities, mathematical objects known as embeddings can be used to map categorical data into real-valued vectors that reflect the relationships in the data. The similarity between two data points can then be measured by how close their vectors are in the embedding space. In this thesis, embeddings were used to find similarities between hierarchically related Thema Subject Categories (Thema codes), which are short alphanumeric sequences commonly used to categorize books. More specifically, the graph embedding approach known as DeepWalk was applied to three different models to learn similarities between Thema codes. The data consisted of pairs of Thema codes gathered from books in the Swedish online book application Storytel. By constructing graphs from Thema codes and their pairwise occurrences, high-dimensional similarities between Thema codes could be learned. To evaluate the models, three offline evaluation methods and one online evaluation method were used. In the online evaluation, one week of A/B testing showed that the click-through rate increased in two recommendation lists in the Storytel application when the embeddings were used for Thema code similarities between books. The results showed that it is possible to use DeepWalk to learn embeddings of Thema codes for the task of recommendation. Valuable future research could thus include investigating more advanced embedding approaches for Thema codes.
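
As an illustration of the graph construction and random-walk stage that DeepWalk relies on, the following is a minimal sketch and not the implementation used in the thesis. The code pairs, the function names (`build_graph`, `random_walk`, `generate_walks`), and the parameter values are all assumptions chosen for the example. The sketch builds an undirected graph from pairs of codes and samples uniform random walks from every node. In DeepWalk, such walks are then treated as sentences and passed to a skip-gram model to learn the embedding vectors; that training step is omitted here.

```python
import random
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected graph with one edge per pair of codes seen together."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-pairs
            graph[a].add(b)
            graph[b].add(a)
    return graph


def random_walk(graph, start, length, rng):
    """Sample one uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = sorted(graph[walk[-1]])  # sorted for reproducibility
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


# Hypothetical sample pairs, used only to make the sketch runnable.
pairs = [("FM", "FMB"), ("FM", "FMW"), ("FMB", "FMW"), ("FF", "FH"), ("FF", "FM")]
graph = build_graph(pairs)
walks = generate_walks(graph, walks_per_node=2, length=5)
```

Codes that co-occur often, directly or through shared neighbours, tend to appear close together in the sampled walks, which is what lets the subsequent skip-gram step place them near each other in the embedding space.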