Grammar checking and proof reading
6-monthly report March 1998 to August 1998
This is the first 6-monthly report of the KTH part of the language
engineering project
Integrated language tools for writing and document handling.
Participants at KTH
URL: http://www.nada.kth.se/theory/projects/granska/index.html
Fulfilment of milestones
Before starting the project we defined a set of
milestones and deliverables. The list below shows
that all milestones have been fulfilled.
- Design and implementation of probabilistic model
- A probabilistic part-of-speech tagger and tag disambiguator has
been constructed in C++. It tags 96% of all
words correctly, using the SUC tag set. The tagger is also fast:
more than 20 000 words per second can be tagged. Its
performance exceeds our goals.
The tagger can be run from the web page of the project.
A report has been written and submitted to a journal.
- Extended rule language design
- The language for describing grammar errors used in the old version of
Granska has been extended to a powerful object-oriented language. The
language is defined in an internal report.
- Implementation of rule language
- A parser for the rule language has been implemented using Lex, Yacc
and C++.
- Grammar checking rules construction
- The existing rules for the old version of Granska have been
improved, structured and documented in an internal report.
- Tokenization
- A simple tokenizer is built into the tagger.
- Lexicon work
- A lexicon containing the SUC words (including the lemma), SAOL 11,
and style words has been created and optimized for fast loading and
lookup. During this work many errors in SUC and SAOL were found.
- Guessing word tags for unknown words
- The tagger tags 88% of unknown words correctly, much better than the
70% that was our goal.
- User interface design and implementation
- A new user interface to Granska has been designed and implemented, and
was demonstrated at the science festival in May. The design and the results
of a user study are described in a report.
- Stava/Granska integration
- The user interface for spell checking and correction is
described in the same report as the new Granska user interface.
- Linguistic search and editing
- A linguistic search function has been designed and implemented in
Granska.
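This report does not describe the internals of the tagger. As a rough illustration of what probabilistic tag disambiguation involves, the following minimal Python sketch decodes a bigram hidden Markov model with the Viterbi algorithm. The tag names, all probabilities, the `UNKNOWN` floor value and the function name `viterbi` are hypothetical and chosen only for this example. They are not taken from the Granska tagger, which is written in C++ and uses the SUC tag set.

```python
import math

# Toy model. Every number below is invented for illustration only.
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}          # P(first tag)
TRANS = {                                          # P(next tag | previous tag)
    "DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
    "NN": {"DT": 0.10, "NN": 0.20, "VB": 0.70},
    "VB": {"DT": 0.50, "NN": 0.40, "VB": 0.10},
}
EMIT = {                                           # P(word | tag)
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.5, "walk": 0.2, "park": 0.3},
    "VB": {"walk": 0.6, "barks": 0.4},
}
UNKNOWN = 1e-6  # small floor probability for words a tag has never emitted


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    if not words:
        return []

    def emit(tag, word):
        return EMIT[tag].get(word, UNKNOWN)

    # Log-probability of the best path ending in each tag after the first word.
    best = {t: math.log(START[t]) + math.log(emit(t, words[0])) for t in TAGS}
    back = []  # one dict of back-pointers per subsequent word

    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in TAGS:
            # Choose the previous tag that maximises the path score into t.
            prev = max(TAGS, key=lambda p: best[p] + math.log(TRANS[p][t]))
            new_best[t] = (best[prev] + math.log(TRANS[prev][t])
                           + math.log(emit(t, word)))
            pointers[t] = prev
        best = new_best
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


print(viterbi(["the", "walk"]))   # "walk" after a determiner
print(viterbi(["dog", "walk"]))   # "walk" after a noun
```

In this toy model the ambiguous word "walk" receives a different tag depending on the preceding tag, which is the essence of disambiguation by context. Working in log space avoids numerical underflow on long sentences.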
Publications and documentation
- Popular description of Granska
- Written for the science festival in Gothenburg 1998-05-09.
- Implementing an efficient part-of-speech tagger
- J. Carlberger, V. Kann
- Submitted, August 1998.
- Postscript, PDF.
- Granskas nya regelspråk
- O. Knutsson
- Internal report, August 1998.
- Granskaprojektet: Rapport från arbetet med granskningsregler och kommentarer
- R. Domeij, O. Knutsson
- Internal report, August 1998.
- Postscript, PDF.
- Interaktivitet och användbarhet vid datorstödd språkgranskning och redigering i en integrerad skrivmiljö
- S. Larsson
- Master's thesis TRITA-NA-E9833 (IPLab-150), Nada, June 1998.
Responsible for this page: Viggo Kann <viggo@nada.kth.se>
Latest change September 24, 1999