Grammar checking and proofreading
6-monthly report October 1999 to March 2000
This is the fourth and last 6-monthly report of the KTH part of the language
engineering project
Integrated language tools for writing and document handling.
Participants at KTH during this period
URL: http://www.nada.kth.se/theory/projects/granska/index.html
Fulfilment of milestones
Before the project started, we wrote down a set of
milestones and deliverables. The list below shows
that almost all of them have been fulfilled.
- Implementation of rule language
- The complete rule language, with the exception of regular expression
rules, has been implemented. An optimization scheme for the rule matching
has been created, which automatically determines which part of a rule
should trigger a match attempt. This makes the rule matching six times
faster than before. Granska now processes 2800 words per second.
The optimization is described in the report [Carlberger, Domeij, Kann, and Knutsson, 2000].
- Replacement proposals, user aspects
- Some user aspects of Granska and its replacement proposals
have been evaluated and presented in [Hansén-Eriksson and Knutsson, 2000].
- Grammar checking rules construction
- A standard set of grammar checking rules has been created, covering
the error types we want Granska to detect
[Domeij and Knutsson, 1999].
In particular we have studied, implemented and evaluated rules for
split compounds and incongruence in nominal phrases, see
[Domeij, Knutsson, and Öhrman, 1999] and
[Domeij, Knutsson, Carlberger, and Kann, 1999].
The rules for NP detection have been evaluated in
[Johansson, 2000].
- Tokenization
- The tokenization has been improved using the lexical analyzer Flex.
- Guessing word tags for unknown words
- The word tag guesser tags 93 % of the unknown words correctly.
The tagger tags 97 % of all words correctly.
A report describing the techniques used for tagging was published
last year [Carlberger and Kann, 1999].
- User interface implementation
- There are now four user interfaces for Granska:
a text-based interface, a web interface
(http://www.nada.kth.se/theory/projects/granska/scrutinizer-web-demo.html),
a stand-alone grammar checking editor with a graphical user interface (Windows),
and an add-in for Word (Windows).
- Linguistic search and editing
- A design specification for a linguistic editing function has not
been written yet.
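The rule matching optimization mentioned under "Implementation of rule language" is described in the cited report, not here. As a rough illustration of the general idea only, the sketch below picks one element of each rule as a trigger and tries the rule only at positions where that element occurs. Everything in it is an assumption made for the example: the rule names, the tag set, the frequency figures, and the choice of the least frequent tag as trigger. It is not Granska's rule language or its actual optimization scheme.

```python
from collections import defaultdict

# Hypothetical rules: each is a sequence of part-of-speech tags that must
# match consecutive tokens. Names and tags are invented for illustration.
RULES = {
    "det_adj_noun": ["DET", "ADJ", "NOUN"],
    "noun_noun": ["NOUN", "NOUN"],
    "prep_det": ["PREP", "DET"],
}

# Hypothetical relative tag frequencies (assumed, not measured).
TAG_FREQ = {"DET": 0.10, "ADJ": 0.07, "NOUN": 0.25, "PREP": 0.12, "VERB": 0.15}


def build_trigger_index(rules, tag_freq):
    """Index each rule by its least frequent tag and that tag's offset."""
    index = defaultdict(list)
    for name, pattern in rules.items():
        offset = min(range(len(pattern)), key=lambda i: tag_freq[pattern[i]])
        index[pattern[offset]].append((name, offset))
    return index


def match(tags, rules, index):
    """Return (rule name, start position) for every match in the tag list."""
    hits = []
    for pos, tag in enumerate(tags):
        # Only rules triggered by the tag at this position are attempted.
        for name, offset in index.get(tag, ()):
            start = pos - offset
            pattern = rules[name]
            if start >= 0 and tags[start:start + len(pattern)] == pattern:
                hits.append((name, start))
    return hits


index = build_trigger_index(RULES, TAG_FREQ)
tags = ["PREP", "DET", "ADJ", "NOUN", "NOUN", "VERB"]
hits = sorted(match(tags, RULES, index))
```

With these assumed frequencies, the three-tag rule is attempted only where an ADJ occurs, so most positions in a text are skipped without trying that rule at all. A speed-up of this kind depends on how rare the chosen triggers are in real text.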
Conferences and presentations
- October 22-23, 1999
- Rickard Domeij, Ola Knutsson and Lena Öhrman presented a paper about error types at Svenskans beskrivning in Linköping.
- December 3-5, 1999
- The project, in cooperation with the Swedish Language Council,
organized a conference at KTH.
- December 9-10, 1999
- Rickard Domeij and Ola Knutsson presented a paper about Granska at
NoDaLiDa-99 in Trondheim.
- August 24-26, 2000
- Johan Carlberger and Viggo Kann will present a paper about applications
of the tagger at Qualico-2000, the 4th Quantitative Linguistics Conference,
in Prague.
Publications
- A Swedish grammar checker
- J. Carlberger, R. Domeij, V. Kann, O. Knutsson
- Submitted to Computational Linguistics, April 2000.
- Some applications of a statistical tagger for Swedish
- J. Carlberger, V. Kann
- Qualico-2000, to appear, August 2000.
- Implementing an efficient part-of-speech tagger
- J. Carlberger, V. Kann
- Software Practice and Experience, 29, 815-832, 1999.
- Granska - an efficient hybrid system for Swedish grammar checking
- R. Domeij, O. Knutsson, J. Carlberger, V. Kann
- NoDaLiDa, December 1999.
- Inkongruens och felaktigt särskrivna sammansättningar - en beskrivning av två feltyper och möjligheten att detektera felen automatiskt
- R. Domeij, O. Knutsson, L. Öhrman
- Svenskans beskrivning, October 1999.
- Explorativ studie av språkgranskningsverktyg
- A. Hansén-Eriksson and O. Knutsson
- March, 2000.
- NP-detektion - utvärdering och förslag till förbättringar av Granskas NP-regler
- V. Johansson
- C-level thesis in computational linguistics, Department of Linguistics, Stockholm University, February 2000.
- Granskas regelspråk
- O. Knutsson
- Internal report, updated October 1999.
Responsible for this page: Viggo Kann <viggo@nada.kth.se>
Latest change April 20, 2000