Milestones and deliverables

Deliverables for each work package are scheduled at four milestone dates: 1998-09-01, 1999-03-01, 1999-09-01 and 2000-03-01.

Design and implementation of probabilistic model

1998-09-01: The probabilistic model will be implemented, and a separate tagging and disambiguation program will be completed. The goal is to tag a test text containing no unknown words with 95% accuracy when every word receives exactly one tag, and with 98% accuracy when some ambiguity is allowed to remain. The possibility of finding errors from tag statistics alone will be investigated. The design and the main implementation ideas will be described in an internal report. (Johan, Viggo)
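To make the two accuracy targets concrete, the following Python sketch scores tagger output against a hand-tagged reference text. It assumes, as an interpretation rather than something the plan states, that "no ambiguity" means exactly one tag per word and that "with ambiguity" means the tagger may keep several candidate tags, a word counting as correct if the reference tag is among them. The function names and toy tags are hypothetical.

```python
from typing import Sequence, Set


def exact_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of words whose single predicted tag equals the reference tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def ambiguous_accuracy(candidates: Sequence[Set[str]], gold: Sequence[str]) -> float:
    """Share of words whose reference tag is among the retained candidate tags."""
    if len(candidates) != len(gold):
        raise ValueError("candidates and gold must have the same length")
    correct = sum(g in c for c, g in zip(candidates, gold))
    return correct / len(gold)


def mean_tags_per_word(candidates: Sequence[Set[str]]) -> float:
    """Average number of tags left per word; 1.0 means fully disambiguated."""
    return sum(len(c) for c in candidates) / len(candidates)


# Toy data with hypothetical, simplified tags.
gold = ["DT", "NN", "VB", "JJ"]
single = ["DT", "NN", "NN", "JJ"]
multi = [{"DT"}, {"NN"}, {"NN", "VB"}, {"JJ"}]

print(exact_accuracy(single, gold))     # 0.75
print(ambiguous_accuracy(multi, gold))  # 1.0
print(mean_tags_per_word(multi))        # 1.25
```

Reporting the mean number of tags per word next to the ambiguous accuracy is worth considering, since a tagger that keeps every possible tag would trivially reach 100% on the second measure.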

Extended rule language design

1998-09-01: The extended rule language will be specified and formalized in an internal report. (Ola, Rickard, Viggo)

Implementation of rule language

1998-09-01: A parser for the rule language will be implemented. (Viggo?)

1999-03-01: Rule priorities, extended regular expressions and new probabilistic methods for handling ambiguity will be implemented in the rule language. (Viggo, Stefan)

2000-03-01: The complete rule language will be implemented and the search will be optimized. A report describing the implementation and the resulting optimizations will be written. (Viggo, ?)

Replacement proposals

1999-03-01: A model for replacement proposals will be specified in an internal report. The model should describe the linguistic and user aspects of replacement proposals: how they should be integrated into the rule language, and how they should be generated and presented to the user. (Ola, Rickard)

1999-09-01:
1. Implementation of algorithms for generating replacement proposals (see also Lexicon work). (Viggo?)
2. The grammar checking rules in the Granska system should be able to give replacement proposals. (Ola, Rickard)

2000-03-01: Evaluation of the replacement proposals with regard to user aspects and linguistic correctness. (Ola, Rickard)

Grammar checking rules construction and error analysis

1998-09-01: The work on the existing grammar checking rules and their comments, aimed at improving usability, should be finished and documented. (Ola, Rickard)

1999-03-01:
1. Error corpora for analysis and evaluation should be collected. (Ola, Rickard)
2. The existing grammar checking rules should be evaluated with regard to user aspects. (Ola, Rickard)
3. The problem of split compounds (compound words incorrectly written as separate words) will be addressed in a report (see also Tokenization). (thesis students)

1999-09-01:
1. Conversion of the existing grammar checking rules into the new extended rule language. (Ola, Rickard)
2. Empirical studies of grammar errors will be presented in a report. (Rickard)
3. A specification of the errors we intend to detect and correct with Granska will be presented in a report. (Ola, Rickard)

2000-03-01:
1. Implementation of new grammar checking rules that use the power of the extended rule language. (Ola, Rickard)
2. Evaluation of grammar checking and proofreading in the Granska system. (Ola, Rickard)

Tokenization

1998-09-01: A simple tokenizer will be implemented. (Johan)

1999-03-01: The problem of split compounds will be addressed in a report. (thesis students)

2000-03-01: An improved tokenizer will be implemented. (?)
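As an illustration of what a simple tokenizer might look like, here is a regular-expression sketch in Python. The token classes and the abbreviation list are assumptions for illustration, not Granska's actual tokenization rules, and the sketch does not attempt sentence segmentation.

```python
import re
from typing import List

# Hypothetical sample of Swedish abbreviations that should stay single tokens.
ABBREVIATIONS = ["t.ex.", "bl.a.", "s.k.", "m.m."]

# Longest abbreviations first so that a shorter one never shadows a longer one.
_abbrev = "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))

TOKEN_RE = re.compile(
    rf"""
      (?<!\w)(?:{_abbrev})   # known abbreviations, kept whole
    | \d+(?:[.,]\d+)*        # numbers such as 3,5 or 1.000
    | \w+(?:-\w+)*           # words, including hyphenated ones; str patterns match å, ä, ö
    | [^\w\s]                # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> List[str]:
    """Split text into abbreviation, number, word and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("Han köpte t.ex. 3,5 kg äpplen."))
```

Because the pattern contains only non-capturing groups, `findall` returns the full matched tokens; whitespace is skipped simply because no alternative matches it.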

Lexicon work

1998-09-01: A POS lexicon containing the words in the SUC material will be implemented, optimized for fast loading and searching. (Johan)

1999-09-01: The POS lexicon will be extended with SAOL 12. It should be possible to construct inflected forms from the lexicon. (Viggo, ?)
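The requirement that inflected forms be constructible from the lexicon suggests storing each lemma with a paradigm identifier instead of listing every form. The Python sketch below assumes that design; the paradigm names and suffix tables are hypothetical and cover only three Swedish noun declensions, not the SAOL 12 data.

```python
from typing import Dict, List, Tuple

FORM_NAMES = ("sg.indef", "sg.def", "pl.indef", "pl.def")

# Hypothetical paradigm table: paradigm id -> (number of final letters to strip
# from the lemma, endings for the four forms in FORM_NAMES order).
PARADIGMS: Dict[str, Tuple[int, Tuple[str, str, str, str]]] = {
    "noun_or": (1, ("a", "an", "or", "orna")),  # flicka, flickan, flickor, flickorna
    "noun_ar": (0, ("", "en", "ar", "arna")),   # bil, bilen, bilar, bilarna
    "noun_neu0": (0, ("", "et", "", "en")),     # hus, huset, hus, husen
}

# Hypothetical lexicon entries: lemma -> paradigm id.
LEXICON: Dict[str, str] = {"flicka": "noun_or", "bil": "noun_ar", "hus": "noun_neu0"}


def inflect(lemma: str, paradigm: str) -> List[Tuple[str, str]]:
    """Generate (form name, word form) pairs for a lemma from its paradigm."""
    strip, endings = PARADIGMS[paradigm]
    stem = lemma[: len(lemma) - strip]
    return [(name, stem + ending) for name, ending in zip(FORM_NAMES, endings)]


def full_form_index(lexicon: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map every generated word form to its (lemma, form name) analyses."""
    index: Dict[str, List[Tuple[str, str]]] = {}
    for lemma, paradigm in lexicon.items():
        for name, form in inflect(lemma, paradigm):
            index.setdefault(form, []).append((lemma, name))
    return index


index = full_form_index(LEXICON)
print(index["hus"])        # [('hus', 'sg.indef'), ('hus', 'pl.indef')]
print(index["flickorna"])  # [('flicka', 'pl.def')]
```

Storing paradigms rather than full forms could keep the lexicon file small, which fits the goal of fast loading; a full-form index like the one above could then be built once at start-up. Note that "hus" gets two analyses, which is exactly the kind of ambiguity the tagger must resolve.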

Guessing word tags for unknown words

1998-09-01: A simple algorithm for guessing the word tags of unknown words will be implemented. It should tag 70% of unknown words correctly. (Johan)

1999-03-01: Tagging of compound words will be implemented. Compound words whose parts are all included in the lexicon should be tagged correctly in 90% of cases. (Viggo)

1999-09-01: An improved algorithm for word tag guessing will be implemented, using parts of the Stava program, and described in an internal report. It should tag 85% of unknown words correctly. (Viggo?, thesis students?)

2000-03-01: The word tag guessing will be evaluated and a report written. (Rickard, Ola, Viggo)
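One common way to approach these goals, offered here only as an assumption about how it might be done, is to tag a compound by splitting it into parts found in the lexicon and taking the tag of the last part (Swedish compounds are right-headed), and otherwise to guess from the word's ending. The Python sketch below is not based on the Stava program; the lexicon, suffix table, tag names and length limits are toy assumptions.

```python
from typing import Dict, List, Optional

# Toy lexicon: word -> tag (hypothetical data, simplified tags).
LEXICON: Dict[str, str] = {
    "fotboll": "NN", "match": "NN", "hus": "NN", "tak": "NN",
    "röd": "JJ", "springa": "VB",
}

# Toy suffix table: ending -> most frequent tag among words with that ending.
SUFFIX_TAGS: Dict[str, str] = {"ning": "NN", "het": "NN", "isk": "JJ", "era": "VB"}

MIN_PART = 2    # shortest allowed compound part (assumption)
MAX_SUFFIX = 5  # longest ending consulted by the suffix guesser (assumption)


def split_compound(word: str) -> Optional[List[str]]:
    """Split word into lexicon words, allowing a linking 's' between parts."""
    if word in LEXICON:
        return [word]
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        head, rest = word[:i], word[i:]
        if head not in LEXICON:
            continue
        # Try the remainder as is, then with a linking 's' dropped (fotboll+s+match).
        options = [rest, rest[1:]] if rest.startswith("s") else [rest]
        for option in options:
            tail = split_compound(option)
            if tail:
                return [head] + tail
    return None


def guess_tag(word: str) -> str:
    """Guess a tag for an unknown word: compound analysis first, then its ending."""
    word = word.lower()
    parts = split_compound(word)
    if parts:
        # Right-headed compounds: the last part decides the tag.
        return LEXICON[parts[-1]]
    for length in range(min(len(word), MAX_SUFFIX), 0, -1):
        tag = SUFFIX_TAGS.get(word[-length:])
        if tag:
            return tag
    return "NN"  # fallback for open-class words (assumption)


for w in ["fotbollsmatch", "hustak", "tidning", "fantastisk", "programmera"]:
    print(w, guess_tag(w))
```

Trying the compound analysis before the suffix guess is a design choice: when all parts are known, the head's lexicon tag is likely more reliable than an ending-based guess. For use with the probabilistic model, a guesser would more naturally return a distribution over tags than a single tag.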

User interface design

1998-09-01:
1. Design of the overall user interface model, especially the components for user interaction, user customization, and Swedish language rules. (Stefan)
2. Study of user dialogue and interaction based on a prototype implementation. (Stefan)

1999-03-01: Design of additional user interface components for POS lexicon editing and replacement suggestions. (Stefan)

User interface implementation

1998-09-01: Implementation of the basic components of the user interface model. (Stefan)

1999-03-01: Implementation of the full user interface model. (Stefan)

1999-09-01: Implementation of additional user interface components. (Stefan)

Stava/Granska integration

1998-09-01: The spell checking and correction user interface will be described and evaluated in a report. (Stefan)

1999-03-01: Spell checking will be implemented in Granska. (Stefan, Viggo)

1999-09-01: Spelling correction will be implemented in Granska. (Stefan)

Linguistic search and editing

1998-09-01: Implementation of a linguistic search function. (Stefan)

1999-03-01: Empirical study of revision patterns, preliminary results. (Py?)

2000-03-01: Design specification for a linguistic editing function. (Rickard, Ola?)

Swedish language rules and help system

No deliverables are listed for this work package.

Finite state grammar for finding grammatical errors in Swedish text

1998-09-01: Preliminary version of the dictionary, based on Lexin. A report on the relationship between the Granska project and the work on a finite-state grammar for Swedish at Umeå.

1999-03-01: A system that finds errors in noun phrases and selection errors, together with a report documenting it.

1999-09-01: Further development of the system to cover verb tense and word order phenomena, together with a report documenting it.

2000-03-01: Evaluation of the system. An attempt to extend it to cover missing sentence boundaries and, if successful, an extension of the system covering these cases. A report documenting either the extended system or the attempts made.
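The noun-phrase part of the finite-state approach can be illustrated with a small automaton over part-of-speech tags: it recognizes the pattern determiner, any number of adjectives, noun, and then checks that gender and number agree across the phrase. This Python sketch is an assumption for illustration; the tag names, feature encoding and toy sentence are hypothetical, and the plan does not specify the Umeå grammar formalism or the Lexin-based dictionary format.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    tag: str               # toy POS tag: DT, JJ, NN, ...
    gender: Optional[str]  # "utr", "neu" or None if unspecified
    number: Optional[str]  # "sg", "pl" or None if unspecified


# Automaton for the pattern DT JJ* NN; "N" is the only accepting state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("S", "DT"): "D",
    ("D", "JJ"): "D",
    ("D", "NN"): "N",
}
ACCEPT = {"N"}


def match_np(tokens: List[Token], start: int) -> Optional[int]:
    """Run the automaton from tokens[start]; return the end index of a match, else None."""
    state: Optional[str] = "S"
    for k in range(start, len(tokens)):
        state = TRANSITIONS.get((state, tokens[k].tag))
        if state is None:
            return None
        if state in ACCEPT:
            return k + 1
    return None


def agrees(phrase: List[Token]) -> bool:
    """True if the specified gender and number values are consistent across the phrase."""
    for feature in ("gender", "number"):
        values = {getattr(t, feature) for t in phrase} - {None}
        if len(values) > 1:
            return False
    return True


def find_np_errors(tokens: List[Token]) -> List[str]:
    """Return the text of every recognized noun phrase with an agreement error."""
    errors: List[str] = []
    i = 0
    while i < len(tokens):
        end = match_np(tokens, i)
        if end is None:
            i += 1
            continue
        phrase = tokens[i:end]
        if not agrees(phrase):
            errors.append(" ".join(t.word for t in phrase))
        i = end
    return errors


# "Jag såg en hus, ett stor hus och ett stort hus." with toy tags and features.
SENTENCE = [
    Token("Jag", "PN", None, "sg"), Token("såg", "VB", None, None),
    Token("en", "DT", "utr", "sg"), Token("hus", "NN", "neu", None),  # "hus" is sg or pl
    Token(",", "PUNCT", None, None),
    Token("ett", "DT", "neu", "sg"), Token("stor", "JJ", "utr", "sg"),
    Token("hus", "NN", "neu", None), Token("och", "KN", None, None),
    Token("ett", "DT", "neu", "sg"), Token("stort", "JJ", "neu", "sg"),
    Token("hus", "NN", "neu", None), Token(".", "PUNCT", None, None),
]

print(find_np_errors(SENTENCE))  # ['en hus', 'ett stor hus']
```

Because the check only runs on phrases the automaton recognizes over the assigned tags, its precision depends on tagging accuracy: a mistagged word can hide an error or produce a false alarm.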