Nada

Grammar checking and proof reading

6-monthly report March 1998 to August 1998

This is the first 6-monthly report of the KTH part of the language engineering project Integrated language tools for writing and document handling.

Participants at KTH

URL: http://www.nada.kth.se/theory/projects/granska/index.html

Fulfilment of milestones

Before the project started we defined a set of milestones and deliverables. The list below shows that every milestone has been fulfilled.
Design and implementation of probabilistic model
A probabilistic part-of-speech tagger and tag disambiguator has been constructed in C++. It tags 96% of all words correctly, using the SUC tag set. The tagger is fast: it tags more than 20 000 words per second. Both accuracy and speed exceed our goals. The tagger can be run from the web page of the project. A report has been written and submitted to a journal.
Extended rule language design
The language for describing grammar errors used in the old version of Granska has been extended into a powerful object-oriented language. The language is defined in an internal report.
Implementation of rule language
A parser for the rule language has been implemented using Lex, Yacc and C++.
Grammar checking rules construction
The existing rules for the old version of Granska have been improved, structured and documented in an internal report.
Tokenization
A simple tokenizer is built into the tagger.
Lexicon work
A lexicon containing the SUC words (including their lemmas), SAOL 11, and style words has been created and optimized for fast loading and lookup. During this work many errors in SUC and SAOL were found.
Guessing word tags for unknown words
The tagger tags 88% of unknown words correctly, much better than the 70% that was our goal.
User interface design and implementation
A new user interface to Granska has been designed and implemented, and was demonstrated at the science festival in May. The design and the results of a user study are described in a report.
Stava/Granska integration
The user interface for spell checking and correction is described in the same report as the new Granska user interface.
Linguistic search and editing
A linguistic search function has been designed and implemented in Granska.
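
To give a concrete picture of what the probabilistic tagging milestone above involves, here is a minimal sketch of tag disambiguation. This report does not spell out the tagger's model, so the sketch assumes a standard bigram hidden Markov model with Viterbi decoding; it is not a description of the Granska tagger, which is written in C++. The function name, the tiny tag subset and all probabilities are invented for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unknown_p=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    All tables are hypothetical toy data, not Granska's lexicon. Words that
    are missing for a tag get the small emission probability `unknown_p`.
    """
    if not words:
        return []
    # best[i][t] = (log-probability of the best path ending in tag t at
    #               position i, previous tag on that path)
    best = [{}]
    for t in tags:
        e = emit_p[t].get(words[0], unknown_p)
        best[0][t] = (math.log(start_p[t]) + math.log(e), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = emit_p[t].get(words[i], unknown_p)
            score, prev = max(
                (best[i - 1][p][0] + math.log(trans_p[p][t]), p) for p in tags
            )
            best[i][t] = (score + math.log(e), prev)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path


# Toy model (assumed values). "man" is ambiguous in Swedish:
# noun (NN, "man") or pronoun (PN, "one").
TAGS = ["DT", "NN", "PN", "VB"]
START_P = {"DT": 0.4, "NN": 0.1, "PN": 0.4, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.01, "NN": 0.90, "PN": 0.04, "VB": 0.05},
    "NN": {"DT": 0.10, "NN": 0.10, "PN": 0.10, "VB": 0.70},
    "PN": {"DT": 0.10, "NN": 0.10, "PN": 0.10, "VB": 0.70},
    "VB": {"DT": 0.40, "NN": 0.20, "PN": 0.30, "VB": 0.10},
}
EMIT_P = {
    "DT": {"en": 0.5},
    "NN": {"man": 0.01, "hund": 0.01},
    "PN": {"man": 0.05, "jag": 0.1},
    "VB": {"ser": 0.02},
}

print(viterbi(["en", "man", "ser"], TAGS, START_P, TRANS_P, EMIT_P))
print(viterbi(["man", "ser"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy numbers, "man" is tagged as a noun after the determiner "en" and as a pronoun at the start of a sentence, which is the kind of context-dependent choice a tag disambiguator has to make.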

Publications and documentation

Popular description of Granska
Written for the science festival in Gothenburg 1998-05-09.
Implementing an efficient part-of-speech tagger
J. Carlberger, V. Kann
Submitted, August 1998.
Postscript, PDF.
Granskas nya regelspråk [Granska's new rule language]
O. Knutsson
Internal report, August 1998.
Granskaprojektet: Rapport från arbetet med granskningsregler och kommentarer [The Granska project: Report on the work with checking rules and comments]
R. Domeij, O. Knutsson
Internal report, August 1998.
Postscript, PDF.
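Interaktivitet och användbarhet vid datorstödd språkgranskning och redigering i en integrerad skrivmiljö [Interactivity and usability in computer-supported language checking and editing in an integrated writing environment]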
S. Larsson
Master's thesis TRITA-NA-E9833 (IPLab-150), Nada, June 1998.



Responsible for this page: Viggo Kann <viggo@nada.kth.se>
Latest change September 24, 1999