moj.lang.se
Class GranskaConnection

java.lang.Object
  extended by moj.lang.se.GranskaConnection

public class GranskaConnection
extends java.lang.Object

GranskaConnection handles all communication with a Granska server. So far, tokenizing, lemmatizing, tagging, parsing and grammar checking are available only for Swedish text input.

Version:
2007-Nov-02
Author:
Martin Hassel

Field Summary
 java.net.URL granskaURL
          URL to current Granska servlet
 java.net.URL gtaURL
          URL to current GTA servlet
 java.net.URL inflectorURL
          URL to current Inflector servlet
 
Constructor Summary
GranskaConnection()
          Create a new GranskaConnection to the Granska server given in the file Granska.properties.
GranskaConnection(java.lang.String filename)
          Create a new GranskaConnection to the Granska server given in the Properties file filename.
GranskaConnection(java.lang.String host, int port, java.lang.String path)
          Create a new GranskaConnection to the given host at the given port and path.
 
Method Summary
 java.io.Reader granskaConnect(java.lang.String text, java.net.URL servlet)
          Sends the given text to a Granska/GTA/Inflector servlet and returns a Reader "pointing" at the scrutinized text.
 java.lang.String inflect(java.lang.String word)
          Inflects the given word.
 java.lang.String inflect(java.lang.String word, java.lang.String wordclass)
          Inflects the given word according to the paradigm of the given wordclass.
 java.lang.String lemmaTag(java.lang.String text)
          Tokenizes, lemmatizes and PoS-tags the given text.
 java.lang.String lemmatize(java.lang.String text)
          Tokenizes and lemmatizes the given text.
static void main(java.lang.String[] args)
          Usage: moj.lang.se.GranskaConnection <TOKENIZE|LEMMATIZE|LEMMATAG|POSTAG|PARSE|INFLECT|DEMO> <file|word> (<word class>)
<TOKENIZE|LEMMATIZE|LEMMATAG|POSTAG|PARSE|INFLECT|DEMO> : keywords denoting desired function/output, or demo output
<file|word> : file to tag words in, or word to inflect
<word class> : inflection paradigm (if left out, forms are generated for all PoS)
 org.xml.sax.InputSource parse(java.lang.String text)
          Sends the given text to a GTA server and returns an InputSource "pointing" at the parsed text.
 java.lang.String parseIOB(java.lang.String text)
          Partial shallow parsing of the given text.
 java.lang.String posTag(java.lang.String text)
          Tags the given text with morphosyntactic tags.
 org.xml.sax.InputSource scrutinize(java.lang.String text)
          Sends the given text to a Granska server and returns an InputSource "pointing" at the scrutinized text.
 java.lang.String simpleTag(java.lang.String text)
          Deprecated. Use posTag() instead.
 java.lang.String tokenize(java.lang.String text)
          Tokenizes the given text.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inflectorURL

public final java.net.URL inflectorURL
URL to current Inflector servlet


granskaURL

public final java.net.URL granskaURL
URL to current Granska servlet


gtaURL

public final java.net.URL gtaURL
URL to current GTA servlet

Constructor Detail

GranskaConnection

public GranskaConnection()
                  throws java.net.MalformedURLException
Create a new GranskaConnection to the Granska server given in the file Granska.properties.

Throws:
java.net.MalformedURLException

GranskaConnection

public GranskaConnection(java.lang.String host,
                         int port,
                         java.lang.String path)
                  throws java.net.MalformedURLException
Create a new GranskaConnection to the given host at the given port and path.

Parameters:
host - Granska server host to connect to.
port - port on Granska server to connect to.
path - path to the servlets on the Granska server.
Throws:
java.net.MalformedURLException
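
As a rough illustration of how a host, port and path could combine into a servlet URL, the sketch below uses java.net.URL. The http protocol, the class name ServletUrlSketch and the servlet name "granska" are assumptions made for this example only; the names GranskaConnection actually uses are not documented here.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class ServletUrlSketch {
    // Joins a base path and a servlet name with exactly one '/' between them.
    static String join(String path, String servlet) {
        String base = path.endsWith("/") ? path : path + "/";
        return base + servlet;
    }

    // Builds a URL for one servlet. "http" is an assumption of this sketch.
    static URL servletUrl(String host, int port, String path, String servlet)
            throws MalformedURLException {
        return new URL("http", host, port, join(path, servlet));
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical host, port, path and servlet name.
        System.out.println(servletUrl("localhost", 8080, "/servlets", "granska"));
    }
}
```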

GranskaConnection

public GranskaConnection(java.lang.String filename)
                  throws java.net.MalformedURLException
Create a new GranskaConnection to the Granska server given in the Properties file filename.

Throws:
java.net.MalformedURLException
Method Detail

granskaConnect

public java.io.Reader granskaConnect(java.lang.String text,
                                     java.net.URL servlet)
Sends the given text to a Granska/GTA/Inflector servlet and returns a Reader "pointing" at the scrutinized text. The returned XML document contains information about lemmas and PoS-tags of words as well as possible grammar errors.

Parameters:
text - the text that is to be scrutinized by the Granska server.
servlet - the servlet to connect to, i.e. Granska, GTA or Inflector.
Returns:
a Reader, for example to be passed to an XMLReader.
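
When the response is wanted as a plain String rather than as parser input, the Reader can be read to the end. The following is a minimal sketch; the class name ReaderDrainSketch and the sample XML are hypothetical, and a StringReader stands in for the Reader that granskaConnect would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReaderDrainSketch {
    // Reads everything from the Reader into a String and closes the Reader.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the Reader returned by granskaConnect (made-up content).
        Reader response = new StringReader("<root><w>han</w></root>");
        System.out.println(readAll(response));
    }
}
```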

scrutinize

public org.xml.sax.InputSource scrutinize(java.lang.String text)
Sends the given text to a Granska server and returns an InputSource "pointing" at the scrutinized text. The returned XML document contains information about lemmas and PoS-tags of words as well as possible grammar errors.

Parameters:
text - the text that is to be scrutinized by the Granska server.
Returns:
an InputSource, for example to be passed to an XMLReader.
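
One possible way to consume such an InputSource is with a SAX XMLReader and a ContentHandler. The sketch below only collects element names. The class name InputSourceSketch and the sample XML are made up for the example, since the element names of the real response are not specified here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class InputSourceSketch {
    // Returns the names of all elements in the document, in document order.
    static List<String> elementNames(InputSource source) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            reader.parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse response", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Stand-in for the InputSource returned by scrutinize (made-up XML).
        InputSource source =
                new InputSource(new StringReader("<doc><s><w>han</w></s></doc>"));
        System.out.println(elementNames(source));
    }
}
```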

parse

public org.xml.sax.InputSource parse(java.lang.String text)
Sends the given text to a GTA server and returns an InputSource "pointing" at the parsed text. The returned XML document contains information about lemmas and PoS-tags of words as well as phrase structure information.

Parameters:
text - the text that is to be parsed by the GTA server.
Returns:
an InputSource, for example to be passed to an XMLReader.

tokenize

public java.lang.String tokenize(java.lang.String text)
Tokenizes the given text. The tokens (words, delimiters etc.) in the returned string are separated by spaces. For example, the text "Han springer fortare!" would yield the response "han springer fortare ! ". Note: Only the text up to the first newline is processed; any text after it is skipped!

Parameters:
text - the text that is to be tokenized.
Returns:
the text tokenized, or null if tokenize fails.
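
Because only the first line of the input is processed, multi-line text may have to be submitted one line at a time. The sketch below shows one way to do that. The class name PerLineSketch is hypothetical, and a java.util.function.Function stands in for a method such as tokenize so that the example runs without a server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PerLineSketch {
    // Applies the analyzer to each non-blank line of the text.
    // A null result (a failed call) is skipped rather than stored.
    static List<String> processLines(String text, Function<String, String> analyzer) {
        List<String> results = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String result = analyzer.apply(line);
            if (result != null) {
                results.add(result);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for a real call such as connection::tokenize; it only lowercases.
        Function<String, String> fake = s -> s.toLowerCase();
        System.out.println(processLines("Han springer fortare!\nHon springer.", fake));
    }
}
```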

lemmatize

public java.lang.String lemmatize(java.lang.String text)
Tokenizes and lemmatizes the given text. The tokens (words, delimiters etc.) in the returned string are separated by spaces. For example, the text "Han springer fortare!" would yield the response "han springa fort ! ". Note: Only the text up to the first newline is processed; any text after it is skipped!

Parameters:
text - the text that is to be tokenized and lemmatized.
Returns:
the text tokenized and lemmatized, or null if lemmatize fails.

lemmaTag

public java.lang.String lemmaTag(java.lang.String text)
Tokenizes, lemmatizes and PoS-tags the given text. The tokens (words, delimiters etc.) in the returned string are separated by spaces. For example, the text "Han springer fortare!" would yield the response "han_pn springa_vb fort_ab !_mad ". Note: Only the text up to the first newline is processed; any text after it is skipped!

Parameters:
text - the text that is to be tokenized, lemmatized and PoS-tagged.
Returns:
the text tokenized, lemmatized and PoS-tagged - or null if lemmaTag fails.
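
The lemma_tag format from the example above can be split into pairs as sketched below. The class name LemmaTagSketch is hypothetical, and the sketch assumes, based only on that example, that the last underscore in each token separates the lemma from the tag.

```java
import java.util.ArrayList;
import java.util.List;

final class LemmaTagSketch {
    // Splits "lemma_tag" at the last underscore (an assumption of this sketch).
    // A token without an underscore gets an empty tag.
    static String[] splitToken(String token) {
        int i = token.lastIndexOf('_');
        if (i < 0) {
            return new String[] { token, "" };
        }
        return new String[] { token.substring(0, i), token.substring(i + 1) };
    }

    // Turns a whole response into {lemma, tag} pairs; null yields an empty list.
    static List<String[]> parse(String response) {
        List<String[]> pairs = new ArrayList<>();
        if (response == null) {
            return pairs;
        }
        for (String token : response.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(splitToken(token));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parse("han_pn springa_vb fort_ab !_mad ")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```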

simpleTag

@Deprecated
public java.lang.String simpleTag(java.lang.String text)
Deprecated. Use posTag() instead; it offers the same functionality. This method may be removed completely in future versions.


posTag

public java.lang.String posTag(java.lang.String text)
Tags the given text with morphosyntactic tags. The tokens (words, delimiters etc.) in the returned string are separated by spaces. For example, the text "Han springer fortare!" would yield the response "Han_pn.utr.sin.def.sub springer_vb.prs.akt fortare_ab.kom !_mad ". Note: Only the text up to the first newline is processed; any text after it is skipped!

Parameters:
text - the text that is to be morphosyntactically tagged.
Returns:
the text morphosyntactically tagged - or null if posTag fails.
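
A morphosyntactic tag such as pn.utr.sin.def.sub can be broken into its parts as sketched below. The class name PosTagSketch is hypothetical. Reading the first dot-separated part as the word class and the remaining parts as features is an interpretation of the example above, not something this documentation states.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PosTagSketch {
    // Returns the text after the last underscore, or the whole token if none.
    static String tagOf(String taggedToken) {
        return taggedToken.substring(taggedToken.lastIndexOf('_') + 1);
    }

    // First dot-separated part of the tag (assumed to be the word class).
    static String wordClass(String taggedToken) {
        String tag = tagOf(taggedToken);
        int dot = tag.indexOf('.');
        return dot < 0 ? tag : tag.substring(0, dot);
    }

    // Remaining dot-separated parts of the tag (assumed to be features).
    static List<String> features(String taggedToken) {
        String[] parts = tagOf(taggedToken).split("\\.");
        if (parts.length <= 1) {
            return Collections.emptyList();
        }
        return Arrays.asList(parts).subList(1, parts.length);
    }

    public static void main(String[] args) {
        String token = "Han_pn.utr.sin.def.sub";
        System.out.println(wordClass(token) + " " + features(token));
    }
}
```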

parseIOB

public java.lang.String parseIOB(java.lang.String text)
Partial shallow parsing of the given text. The returned string contains one token (word, delimiter etc.) per row, with each kind of analysis separated by a tab (\t), e.g.
"Händelsen nn.utr.sin.def.nom NPB CLB".

Parameters:
text - the text that is to be parsed
Returns:
the text parsed - or null if parseIOB fails.
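
The row-per-token format can be split into columns as sketched below. The class name IobRowsSketch is hypothetical, and the sketch makes no assumption about what each column means beyond the tab separation described above.

```java
import java.util.ArrayList;
import java.util.List;

final class IobRowsSketch {
    // Splits the response into rows and each row into tab-separated columns.
    // A null response (a failed call) yields an empty list.
    static List<String[]> rows(String response) {
        List<String[]> rows = new ArrayList<>();
        if (response == null) {
            return rows;
        }
        for (String line : response.split("\\R")) {
            if (!line.isEmpty()) {
                rows.add(line.split("\t"));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // The example row from the description, with explicit tabs.
        String response = "H\u00e4ndelsen\tnn.utr.sin.def.nom\tNPB\tCLB\n";
        for (String[] row : rows(response)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```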

inflect

public java.lang.String inflect(java.lang.String word)
Inflects the given word.

Parameters:
word - the word that is to be inflected.
Returns:
all inflections of the given word, regardless of word class. Inflections are separated by spaces.

inflect

public java.lang.String inflect(java.lang.String word,
                                java.lang.String wordclass)
Inflects the given word according to the paradigm of the given wordclass.

Parameters:
word - the word that is to be inflected.
wordclass - the wordclass of the word that is to be inflected.
Returns:
all inflections of the given word according to the paradigm of the given wordclass. Inflections are separated by spaces.
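
The space-separated result can be turned into a collection of distinct forms as sketched below. The class name InflectionsSketch and the sample forms are made up for the example. The null check is purely defensive, since this documentation does not say whether inflect can return null.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class InflectionsSketch {
    // Splits on whitespace and keeps each form once, in first-seen order.
    static Set<String> forms(String response) {
        Set<String> forms = new LinkedHashSet<>();
        if (response == null) {
            return forms;
        }
        for (String form : response.trim().split("\\s+")) {
            if (!form.isEmpty()) {
                forms.add(form);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // Made-up sample response, not actual server output.
        System.out.println(forms("bil bilen bilar bilarna bil "));
    }
}
```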

main

public static void main(java.lang.String[] args)
Usage: moj.lang.se.GranskaConnection <TOKENIZE|LEMMATIZE|LEMMATAG|POSTAG|PARSE|INFLECT|DEMO> <file|word> (<word class>)
<TOKENIZE|LEMMATIZE|LEMMATAG|POSTAG|PARSE|INFLECT|DEMO> : keywords denoting desired function/output, or demo output
<file|word> : file to tag words in, or word to inflect
<word class> : inflection paradigm (if left out, forms are generated for all PoS)