Camilla Ahlenius

Automatic Pronoun Resolution for Swedish

This report describes a quantitative analysis comparing two methods for pronoun resolution in Swedish. The first method, a reimplementation of Mitkov's algorithm, is heuristic-based: the resolution is determined by a number of manually engineered rules that draw on both syntactic and semantic information. The second method is data-driven: a Support Vector Machine (SVM) using dependency trees and word embeddings as features. Both methods are evaluated on an annotated corpus of Swedish news articles that was created as part of this thesis.
The SVM-based methods significantly outperformed the reimplementation of Mitkov's algorithm. The best-performing SVM model relies on tree kernels applied to dependency trees. It achieved an F1-score of 0.76 for the positive class and 0.9 for the negative class, where positives are pairs of a pronoun and a noun phrase that corefer, and negatives are pairs that do not.
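
To make the per-class F1-scores concrete, the following minimal sketch shows how such scores can be computed from pairwise predictions. The labels below are invented toy data, not results from the thesis, and the function name f1_for_class is a hypothetical helper introduced only for illustration. Each item stands for one (pronoun, noun phrase) pair, with 1 meaning the pair corefers and 0 meaning it does not.

```python
def f1_for_class(gold, predicted, target):
    """Return the F1-score for one class, treating `target` as the class of interest."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == target and p == target)
    false_pos = sum(1 for g, p in pairs if g != target and p == target)
    false_neg = sum(1 for g, p in pairs if g == target and p != target)

    # Guard against division by zero when a class is never predicted or never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (assumption): 1 = pair corefers, 0 = pair does not corefer.
gold = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

f1_positive = f1_for_class(gold, predicted, target=1)
f1_negative = f1_for_class(gold, predicted, target=0)
print(f"F1 (positive class): {f1_positive:.2f}")
print(f"F1 (negative class): {f1_negative:.2f}")
```

Reporting the two classes separately matters here because non-coreferring pairs typically outnumber coreferring ones, so a single aggregate score could hide weak performance on the positive class.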