School of Electrical Engineering and Computer Science

GSLT course in Clustering (Level 2)

Fall semester 2008

This course is intended for students in language technology who want to know how text clustering works, what it can be used for, and how to use clustering tools in language technology applications. There is a description of this course here.

There is an evaluation of this course for the fall semester 2008 here.
There is an analysis (in Swedish) of this course for the fall semester 2008 here.

Schedule and Content

  • Thursday September 11, 2008. 10.00-17.00 in Göteborg. Lecture notes.
  • Monday October 20, 2008. 10.00-17.00 in Göteborg. Lecture notes.
  • Wednesday December 10, 2008. Afternoon. Closing seminar at KTH, Stockholm. Talks.
Preliminary content of the lectures.

Course Requirements

There are four course requirements: In the two laborations you will use a clustering tool called Infomat, developed by us. It is written in Java and has a GUI. To do your work you will need access to a computer with Java (SE 6). As the GUI is rather complex, you will need a screen with fairly high resolution to run it properly, and you would probably benefit from having a Windows computer, since they often have better graphics engines. The best solution is to bring your own laptop to the meetings; however, there will be computers available during the scheduled laboration times.

You may do the laborations in groups of two, but the project must be done individually.

Individual Project

At the third meeting you will present the results of your individual project. The project should typically consist of a practical experiment on some aspect of clustering linguistic data, described in a course report. The course focuses on clustering of texts, but most methods are applicable to other objects as well.

One option is to submit your work to the Workshop on Unsupervised Methods at SLTC 2008 (The Swedish Language Technology Conference).


Here you can download the latest (unofficial) Infomat version. If there is a link here, it is newer than the one on the Infomat homepage. These versions only contain small bug fixes.
  • No newer versions than the one on the Infomat homepage!


A list of some clustering resources.



Published by: Magnus Rosell
Updated 2009-08-31