Tools for Downloading

This page links to some of the tools created in the various projects within the Human Language Technology Group at KTH CSC. All tools distributed with source code are distributed under the GNU General Public Licence. More information on the GNU project is available here. The code comes with no warranty. Use at your own risk.

Grim: An Interactive Environment with Focus on Swedish

Grim is an interactive language learning environment with a focus on learners of Swedish. It is primarily useful for writing, revising and experimenting with texts, but it can also assist in reading. Grim has its own homepage here.

Stava: A Spell Checker and Morphological Analyzer for Swedish

Stava is a fast and robust spell checker and spelling corrector for Swedish. It can also analyze compounds, lemmatize, and POS-tag words without context. Stava has no explicit word list; instead, it uses a probabilistic method based on a Bloom filter. For more information, see the articles at the bottom of the Stava manual page. Note: All documentation is in Swedish.
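To illustrate how a Bloom filter can stand in for an explicit word list, here is a minimal sketch in Python. It is not Stava's implementation: the class name, the hash construction, and the sizes are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: a bit array plus k hash functions.

    Illustrative only; the sizes and the hashing scheme are assumptions,
    not what Stava actually uses.
    """

    def __init__(self, num_bits=10_000, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive k bit positions by salting one hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))


lexicon = BloomFilter()
for w in ["hus", "bil", "språk"]:
    lexicon.add(w)

print("språk" in lexicon)  # True: words that were added are always found
```

The filter never stores the words themselves, only bits. A lookup for a word that was never added usually returns False, but can occasionally return True, which is why the method is probabilistic.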

The source code for Stava is available here (last published July 25, 2016). Lexicon Bloom filters and some useful lists of common names etc. are available here.

If you have any questions, please contact Viggo Kann.

JavaSDM: A Java Package for Random Indexing

JavaSDM is a package for Random Indexing, a distributional model of lexical semantics, written in Java. It also contains an implementation of the Porter stemming algorithm and classes for handling connections with our Granska, Decompounder and StatServ servers.
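As a rough illustration of Random Indexing in general (not of the JavaSDM API), the sketch below gives every word a sparse random index vector and builds each word's context vector by summing the index vectors of its neighbours. It is written in Python for brevity; the function names, the dimensionality, and the window size are assumptions.

```python
import random
from collections import defaultdict

DIM = 100      # dimensionality of all vectors (assumed)
NONZERO = 4    # number of +1/-1 entries per index vector (assumed)
WINDOW = 2     # context window on each side of a word (assumed)


def make_index_vector(rng):
    """Create a sparse ternary vector with equally many +1 and -1 entries."""
    vec = [0] * DIM
    for n, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if n % 2 == 0 else -1
    return vec


def train(tokens, seed=0):
    """Return (index vectors, context vectors) for a list of tokens."""
    rng = random.Random(seed)
    index = {}
    for w in tokens:
        if w not in index:
            index[w] = make_index_vector(rng)
    context = defaultdict(lambda: [0] * DIM)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                # Add the neighbour's index vector to w's context vector.
                neighbour = index[tokens[j]]
                ctx = context[w]
                for d in range(DIM):
                    ctx[d] += neighbour[d]
    return index, dict(context)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


tokens = "the cat drinks milk the dog drinks water".split()
index, context = train(tokens)
print(cosine(context["cat"], context["dog"]))
```

Words that occur in similar contexts accumulate similar context vectors, so their cosine similarity tends to be higher. The dimensionality stays fixed no matter how large the vocabulary grows.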

The source code for JavaSDM is available here, and online documentation is available as JavaDoc. Additional information on how to run the random indexing package is available in a Readme file. Furthermore, a Readmemore file explains how to use several of the classes in the JavaSDM package stand-alone, for example for decompounding, lemmatizing, tagging, or looking up term and/or document frequencies calculated over large Swedish corpora. A Java package containing a large number of vector/matrix similarity measures is available here, together with online documentation, also in JavaDoc.

If you have any questions, please contact Martin Hassel.

Stomp: A Part-of-Speech Tagger with a Different View

Stomp is a relatively language-independent tagger that does not rely on n-grams of tags. Instead, it tags a word by finding the longest matching sequence of words in the training data and assigning the word the tag it has there. This different perspective makes Stomp useful in an ensemble of taggers.

The source code for Stomp is available here (last published November 22, 2005). For more information, see this article. Note: In this implementation the handling of numerical values is very limited: Stomp expects all numerical values to have been replaced with the string "4711". It should be quite straightforward to improve this handling, though.
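The following Python sketch is one loose reading of the description above, not Stomp's actual algorithm. For each word it looks for the longest word sequence containing that word that also occurs in the training data, and copies the tag from there. The maximum sequence length, the tie-breaking (first occurrence wins), the "UNKNOWN" fallback, and the number pattern are all assumptions.

```python
import re

MAX_LEN = 5  # longest word sequence that is indexed (assumed limit)


def normalize(word):
    # Stomp expects every numerical value to be replaced by "4711".
    # The pattern used to recognise numbers here is an assumption.
    return "4711" if re.fullmatch(r"\d+([.,]\d+)?", word) else word


def build_index(tagged_sentences):
    """Map every word n-gram (up to MAX_LEN words) to the tags it carried."""
    index = {}
    for sentence in tagged_sentences:
        words = [normalize(w) for w, _ in sentence]
        tags = [t for _, t in sentence]
        for start in range(len(words)):
            stop = min(len(words), start + MAX_LEN)
            for end in range(start + 1, stop + 1):
                # Keep only the first occurrence (assumed tie-breaking).
                index.setdefault(tuple(words[start:end]),
                                 tuple(tags[start:end]))
    return index


def tag(words, index, unknown="UNKNOWN"):
    words = [normalize(w) for w in words]
    result = []
    for i in range(len(words)):
        best = unknown
        # Try the longest spans that contain position i first.
        for length in range(min(MAX_LEN, len(words)), 0, -1):
            first = max(0, i - length + 1)
            last = min(i, len(words) - length)
            match = next((s for s in range(first, last + 1)
                          if tuple(words[s:s + length]) in index), None)
            if match is not None:
                best = index[tuple(words[match:match + length])][i - match]
                break
        result.append(best)
    return result


training = [
    [("I", "PRON"), ("can", "VERB"), ("swim", "VERB")],
    [("a", "DET"), ("can", "NOUN"), ("of", "PREP"), ("soup", "NOUN")],
]
index = build_index(training)
print(tag(["a", "can", "of", "beans"], index))
# ['DET', 'NOUN', 'PREP', 'UNKNOWN']
```

In isolation, "can" would receive the tag of its first occurrence in the toy training data (VERB), but the longer match "a can of" gives it NOUN.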

If you have any questions, please contact Jonas Sjöbergh.

Compound Splitter: A Tool for Splitting Swedish Compound Words

Compound Splitter is a tool for splitting compound words in Swedish.

The source code for a server implementation of Compound Splitter is available here (last published March 11, 2009). For more information, see this article.

If you have any questions, please contact Jonas Sjöbergh.

Granska Tagger: A Part-of-Speech Tagger for Swedish

Granska Tagger is an efficient Hidden Markov Model part-of-speech tagger for Swedish. This is the same tagger used internally by the grammar checker Granska. It has a compound word analysis component for use on unknown words. It can also produce lemma information for the words that are tagged.
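For readers unfamiliar with Hidden Markov Model tagging, the Python sketch below shows Viterbi decoding for a bigram HMM. It is a generic textbook illustration, not Granska Tagger's code: the function name, the toy tag set, and all probabilities are invented for the example, and the source does not say which HMM order or smoothing Granska Tagger uses.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence under a bigram HMM.

    Missing probabilities fall back to a small floor value (assumed
    smoothing, purely for illustration). `words` must be non-empty.
    """
    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-probability of the best path ending in t, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p, (p, t)), p) for p in tags
            )
            column[t] = (score + lp(emit_p, (t, word)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    current = max(tags, key=lambda t: best[-1][t][0])
    path = [current]
    for column in reversed(best[1:]):
        current = column[current][1]
        path.append(current)
    return path[::-1]


# Toy model; every number below is made up.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans_p = {("DT", "NN"): 0.9, ("DT", "VB"): 0.05, ("NN", "VB"): 0.7,
           ("NN", "NN"): 0.2, ("VB", "DT"): 0.5, ("VB", "NN"): 0.3}
emit_p = {("DT", "en"): 0.5, ("NN", "hund"): 0.1,
          ("VB", "springer"): 0.1, ("NN", "springer"): 0.01}

print(viterbi(["en", "hund", "springer"], tags, start_p, trans_p, emit_p))
# ['DT', 'NN', 'VB']
```

Working in log space avoids numerical underflow on long sentences, and the back-pointers make it possible to recover the whole tag sequence in a single pass.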

The source code for Granska Tagger is available here (last published March 10, 2009), and a version for amd64 is also available (last published November 5, 2010, thanks to Robert Östling). For more information, see the following article. To be useful, Granska Tagger requires various lexicon files, which are available here.

Granska: A Grammar Checker for Swedish

Granska is an efficient rule-based grammar checker for Swedish. For more information, see the Granska web page.

The source code for Granska is available in the Git repository under the branch willes. It can be compiled under both Solaris and Linux.

If you have any questions, please contact Viggo Kann.

Inflector: A Simple Word Inflector for Swedish

Inflector performs inflection of Swedish words. It needs Granska Tagger to compile properly. This version is very simple and works only in interactive mode. It should, however, be quite straightforward to adapt it to your own needs.

The source code for Inflector is available here (last published December 30, 2005). To be useful, Inflector requires the same lexicon files as Granska Tagger.

Unfortunately, we are not able to answer any questions regarding this tool.

AutoEval and Missplel: Two Generic Tools for Automatic Evaluation

AutoEval is a tool that greatly simplifies the construction of (NLP system) evaluations. Missplel is a tool that introduces human-like spelling errors into text. For a discussion, we refer to an article on AutoEval and Missplel.
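To give a feel for what introducing human-like spelling errors can mean, the Python sketch below applies common keyboard-style slips (swapped, dropped, doubled, or mistyped letters) to a fraction of the words in a text. It is not Missplel's algorithm or interface: the function names, the error types, the neighbour table, and the error rate are assumptions made for illustration.

```python
import random

# A few neighbouring keys on a QWERTY layout (illustrative subset, assumed).
NEIGHBOURS = {"a": "qsz", "e": "wrd", "o": "ipl", "s": "adw", "t": "ryg"}


def misspell(word, rng):
    """Apply one random single-edit error to the word."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    kind = rng.choice(["swap", "delete", "double", "neighbour"])
    if kind == "swap":       # transpose two adjacent letters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if kind == "delete":     # drop a letter
        return word[:i] + word[i + 1:]
    if kind == "double":     # repeat a letter
        return word[:i] + word[i] + word[i:]
    # Replace a letter with a keyboard neighbour, if one is known.
    options = NEIGHBOURS.get(word[i].lower())
    if not options:
        return word
    return word[:i] + rng.choice(options) + word[i + 1:]


def corrupt(text, rate, seed=0):
    """Misspell roughly `rate` of the whitespace-separated words."""
    rng = random.Random(seed)
    return " ".join(
        misspell(w, rng) if rng.random() < rate else w for w in text.split()
    )


print(corrupt("detta är en enkel text", rate=0.5))
```

Fixing the seed makes the corrupted text reproducible, which matters when the same erroneous input is to be fed to several systems under evaluation.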

The source code for AutoEval and Missplel is available here (last published November 22, 2005). The code may not be used for commercial purposes. The source code is best compiled with gcc/g++ 3.4.4 and requires Xerces, Boost and zlib. There is also a graphical user interface available, which requires Qt. If you don't have access to your own tagged corpus, take a look at the lexicon files (e.g. the file cwtl) used by Granska Tagger; note, however, that these are only useful for producing spelling errors in Swedish.

If you have any questions, please contact Johnny Bigert.

Infomat: A Vector Space Exploration Tool

Infomat is a vector space visualization tool. With it you can browse huge matrices, such as those often used in information retrieval. Infomat is available here.

If you have any questions, please contact Magnus Rosell.


Responsible for this page: Viggo Kann
Latest change March 11, 2009