Nada

Grammar checking and proof reading

6-monthly report September 1998 to March 1999

This is the second 6-monthly report of the KTH part of the language engineering project Integrated language tools for writing and document handling.

Participants at KTH

URL: http://www.nada.kth.se/theory/projects/granska/index.html

Fulfilment of milestones

Before the project started we defined a set of milestones and deliverables. The status of each is given below; almost all of the milestones have been fulfilled.
Tagging
The tagger has been improved in several ways and now tags 97% of words correctly. A report describing the techniques used has been written and accepted for publication [Carlberger and Kann, 1999]. We have also taken the initiative to announce a tagger competition, to be held later this year.
Implementation of rule language
Most of the rule language is now implemented. It has also been extended further, for instance with a more advanced goto function and Stava integration. The interpreter for the rule language now runs under Unix and will soon be incorporated into the new Windows version of Granska.
Grammar checking rules construction
The existing rules for the old version of Granska have been evaluated and rewritten in the syntax of the new rule language. Rules for split compounds have been constructed and evaluated.
Tokenization
The problem with split compounds has been addressed in a report [Öhrman, 1998]. The rules described in the report have been implemented and improved.
Guessing word tags for unknown words
The tagger tags about 95% of unknown compound words correctly, well above our goal of 90%.
User interface design and implementation
The new user interface to Granska has been implemented, and it will soon be connected to the grammar error detection module. The interface components for POS lexicon editing and replacement suggestions have not yet been designed.
Stava/Granska integration
Stava is integrated into Granska as a module that can currently be used only inside grammar checking rules.
Linguistic search and editing
An empirical study of revision patterns is currently under way.
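
The tagging figures quoted above (97% of words overall, about 95% of unknown compound words) are per-word accuracies. As an illustration only, the following minimal sketch shows how such a figure can be computed from a tagged test text. The function name, the tag labels and the sample data are hypothetical and are not taken from the Granska tagger or its evaluation.

```python
def tagging_accuracy(predicted, gold):
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are sequences of tag labels, one per token, in the
    same token order. This is an illustrative helper, not project code.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty text")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical five-token example: the last token is tagged wrongly.
gold = ["dt", "nn", "vb", "pp", "nn"]
predicted = ["dt", "nn", "vb", "pp", "jj"]

accuracy = tagging_accuracy(predicted, gold)
print(f"{accuracy:.0%}")  # prints "80%"
```

To obtain a figure for unknown words only, the same calculation would be restricted to the tokens that are missing from the tagger's lexicon.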

New publications

Implementing an efficient part-of-speech tagger
J. Carlberger, V. Kann
Software Practice and Experience, to appear, 1999.
Felaktigt särskrivna sammansättningar
L. Öhrman
C-level thesis in computational linguistics, Department of Linguistics, Stockholm University, October 1998.



Responsible for this page: Viggo Kann <viggo@nada.kth.se>
Latest change September 24, 1999