The source code for Stava is available here (last published November 22, 2005). Lexicon Bloom filters and some useful lists of common names etc. are available here.
If you have any questions, please contact Viggo Kann (viggo@nada.kth.se).
The source code for JavaSDM is available here, and online documentation is available as JavaDoc. Additional information on how to run the random indexing package is available in a Readme file. Furthermore, a Readmemore file explains how to use several of the classes in the JavaSDM package stand-alone, for example for decompounding, lemmatizing, tagging, or looking up term and/or document frequencies calculated over large Swedish corpora. A Java package containing a large number of vector/matrix similarity measures is available here, together with online documentation, also in JavaDoc.
If you have any questions, please contact Martin Hassel (xmartin@nada.kth.se).
The source code for Stomp is available here (last published November 22, 2005). For more information, see this article. Note: This implementation handles numerical values poorly; Stomp expects every numerical value in its input to have been replaced with the string "4711". It should be quite straightforward to improve this handling, though.
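One way to prepare input for Stomp, given the note above, is to replace numbers with "4711" in a preprocessing step. The following Python sketch is not part of the Stomp distribution. The function name `mask_numbers` is hypothetical, and so is the definition of a "numerical value" (a run of digits, optionally with internal "." or "," separators). The format Stomp actually expects may differ.

```python
import re

# Assumption: a "numerical value" is a run of digits, optionally containing
# internal '.' or ',' separators (e.g. 2005, 3.14 or 149,50).
# Stomp's own notion of a numerical value may be different.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(text: str, placeholder: str = "4711") -> str:
    """Replace every numerical value in text with the placeholder string."""
    return NUMBER.sub(placeholder, text)


if __name__ == "__main__":
    print(mask_numbers("Den 22 november 2005 kostade boken 149,50 kronor."))
    # prints: Den 4711 november 4711 kostade boken 4711 kronor.
```

Trailing punctuation is left untouched because a separator is only consumed when more digits follow it. Digits embedded in other tokens (for example "abc123") are also replaced, which may or may not be the desired behavior.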
If you have any questions, please contact Jonas Sjöbergh (jsh@nada.kth.se).
The source code for a server implementation of Compound Splitter is available here (last published March 11, 2009). For more information, see this article.
If you have any questions, please contact Jonas Sjöbergh (jsh@nada.kth.se).
The source code for Granska Tagger is available here (last published March 10, 2009), and a version for amd64 is also available (last published November 5, 2010, thanks to Robert Östling). For more information, see the following article. To be useful, Granska Tagger requires various lexicon files, which are available here.
If you have any questions, please contact Viggo Kann (viggo@nada.kth.se).
The source code for Inflector is available here (last published December 30, 2005). To be useful, Inflector requires the same lexicon files as Granska Tagger.
Unfortunately, we are not able to answer any questions regarding this tool.
The source code for AutoEval and Missplel is available (last published November 22, 2005). The code may not be used for commercial purposes. The source code is best compiled with gcc/g++ 3.4.4 and requires Xerces, Boost and zlib. There is also a graphical user interface available, which requires Qt. If you don't have access to your own tagged corpus, take a look at the lexicon files (e.g. the file cwtl) used by Granska Tagger; note, however, that these are only useful for producing spelling errors in Swedish.
If you have any questions, please contact Johnny Bigert (johnny@kth.se).
Infomat is available here. Infomat is a vector space visualization tool for browsing huge matrices, such as those often used in information retrieval.
If you have any questions, please contact Magnus Rosell (rosell@csc.kth.se).