Evaluation of Sentence Representations in Semantic Text Similarity Tasks

This thesis explores methods of constructing sentence representations from word embeddings for semantic text similarity and benchmarks them on sentence-level evaluation test sets. Two tasks were used to evaluate the representations: the STS Benchmark and the STS Benchmark converted to a binary similarity task. The results show that preprocessing the word vectors can significantly boost performance in both tasks and indicate that word embeddings still provide an acceptable solution for specific applications. They also raise the question of whether the dataset used is truly ideal for this type of evaluation.
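
As a rough illustration of the kind of pipeline evaluated here, the sketch below builds a sentence vector by averaging word vectors, scores a sentence pair with cosine similarity, and converts a gold similarity score on the 0 to 5 STS scale into a binary label. Averaging is only one common way to compose word embeddings and is not necessarily the method used in the thesis; the toy vectors, the function names, and the threshold of 3.0 are all assumptions made for the example.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real experiment would load pretrained embeddings instead.
TOY_VECTORS = {
    "a": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sleeps": [0.0, 0.9, 0.2],
    "rests": [0.1, 0.8, 0.3],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in a whitespace-tokenized sentence."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("sentence contains no known words")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def to_binary_label(gold_score, threshold=3.0):
    """Map a 0-5 similarity score to 1 (similar) or 0 (not similar).

    The threshold is a hypothetical choice for this sketch.
    """
    return 1 if gold_score >= threshold else 0


if __name__ == "__main__":
    u = sentence_vector("a cat sleeps", TOY_VECTORS)
    v = sentence_vector("a dog rests", TOY_VECTORS)
    print(cosine_similarity(u, v))
    print(to_binary_label(4.2), to_binary_label(1.0))
```

In the graded task, predicted similarities of this kind would be compared with the gold scores by correlation; in the binary variant they would be compared with the thresholded labels.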