Nada

Grammar checking and proof reading

6-monthly report April 1999 to September 1999

This is the third 6-monthly report of the KTH part of the language engineering project Integrated language tools for writing and document handling. The 6-monthly report of the Göteborg part of the project is available separately in RTF and Word format.

Participants at KTH during this period

URL: http://www.nada.kth.se/theory/projects/granska/index.html

Fulfilment of milestones

Before starting the project we wrote down a set of milestones and deliverables. The list below shows that almost all milestones have been fulfilled.
Lexicon work
Morphological rules for inflection of any word in the lexicon have been constructed and optimized. We had hoped to use SAOL 12 for this, but after a year Svenska Akademien has still not decided whether we may use it.
Guessing word tags for unknown words
The word tag guesser has been improved in several ways and now tags 91 % of the words correctly, well above our goal of 85 %. Our report describing the techniques used for tagging has recently been published [Carlberger and Kann, 1999].
Replacement proposals
Algorithms for generating replacement proposals have been implemented. Proposals are generated for both spelling errors and grammar errors, but they are not yet ranked in order of precedence when several proposals are given for the same error. Replacement proposal generation rules have been written for almost all grammar-checking rules.
Grammar checking rules construction
A specification of error types that we want Granska to detect has been written [Domeij and Knutsson, 1999]. We have compared the list to similar lists for commercial Swedish grammar checkers and found that our list covers most errors and contains error types that are not detected by other grammar checkers, for example split compounds. In particular we have studied, implemented and evaluated rules for two error types: split compounds and incongruence in nominal phrases. A report on these two error types will be presented [Domeij, Knutsson, and Öhrman, 1999].
User interface design and implementation
Connecting the graphical user interface to the grammar error detection module has been unexpectedly hard, but progress is being made. Design of interface components for POS lexicon editing and replacement suggestions will not be done in this project. A text-based interface to Granska has been implemented and connected to the grammar error detection module. A web interface is under construction, see http://www.nada.kth.se/theory/projects/granska/scrutinizer-web-demo.html
Linguistic search and editing
An empirical study of revision patterns has been performed and a report has been written [Tyndall, 1999].
Swedish language rules and help system
A draft of the new version of Svenska skrivregler has been completed. The design of the help system in HTML has been specified.

Conferences and presentations

March 22
The project organized Temadag om datorstödd språkgranskning (a theme day on computer-aided language checking) at KTH. The Granska project was presented in talks by Kerstin Severinson-Eklundh, Rickard Domeij, Viggo Kann, Ola Knutsson and Johan Carlberger. The audience numbered about 90 people.
April 26
Kerstin Severinson-Eklundh, Rickard Domeij, Viggo Kann and Ola Knutsson presented the Granska project in Lund at the HSFR language technology program meeting.
September 6
Johan Carlberger presented the Granska tagger and grammar checking system at a seminar at Stockholm University.
September 21
Kerstin Severinson-Eklundh and Ola Knutsson presented the project at the Department of Linguistics, University of Göteborg.
October 22-23
Rickard Domeij, Ola Knutsson and Lena Öhrman will present a paper about error types at Svenskans beskrivning in Linköping.
December 3-5
The project, in cooperation with the Swedish Language Council, will organize a conference at KTH.

New publications

Implementing an efficient part-of-speech tagger
J. Carlberger, V. Kann
Software: Practice and Experience, 29, 815-832, 1999.
Specifikation av grammatiska feltyper i Granska
R. Domeij, O. Knutsson
Internal working paper, Nada, September 1999.
HTML.
Inkongruens och felaktigt särskrivna sammansättningar - en beskrivning av två feltyper och möjligheten att detektera felen automatiskt
R. Domeij, O. Knutsson, L. Öhrman
Svenskans beskrivning, October 1999.
HTML.
Granska - ett effektivt hybridsystem för kontroll av svensk grammatik
R. Domeij, O. Knutsson, J. Carlberger, V. Kann
Submitted to NoDaLiDa, December 1999.
HTML.
Datorstöd för lingvistisk redigering - en förstudie
A. Tyndall
Master's thesis, Department of Linguistics, Stockholm University, June 1999.
Postscript, PDF.



Responsible for this page: Viggo Kann <viggo@nada.kth.se>
Latest change September 30, 1999