moj.lang.en
Class PorterStemmer

java.lang.Object
  extended by moj.lang.en.PorterStemmer

public class PorterStemmer
extends java.lang.Object

For a definition of the Porter stemmer, see M. F. Porter, "An algorithm for suffix stripping", Program, Vol. 14, No. 3, pp. 130-137, July 1980. The official implementation(s) can be found at: http://www.tartarus.org/~martin/PorterStemmer/
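Porter's rules are conditioned on the measure m of a stem. Every word has the form [C](VC)^m[V], where C is a run of consonants and V is a run of vowels, and y counts as a vowel when it follows a consonant. Below is a minimal standalone sketch of that definition. It is not taken from this class: the names PorterMeasure, isConsonant and measure are hypothetical, and it assumes lowercase ASCII input.

```java
/**
 * Standalone sketch of Porter's consonant test and measure m.
 * Assumes lowercase ASCII input; class and method names are hypothetical.
 */
final class PorterMeasure {

    /** True if w[i] is a consonant: not a, e, i, o, u, and not a y that follows a consonant. */
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // A leading y is a consonant; otherwise y is a vowel after a consonant.
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    /** m in [C](VC)^m[V]: each vowel-to-consonant boundary closes one VC group. */
    static int measure(String w) {
        int m = 0;
        for (int i = 1; i < w.length(); i++) {
            if (!isConsonant(w, i - 1) && isConsonant(w, i)) {
                m++;
            }
        }
        return m;
    }
}
```

With this definition, tree and by have m = 0, trouble and ivy have m = 1, and private and orrery have m = 2. These results match the examples Porter gives in the paper.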

Version:
2005-Dec-06

Constructor Summary
PorterStemmer()
           
 
Method Summary
static void main(java.lang.String[] args)
          Usage: moj.lang.en.PorterStemmer <file>
<file> : the file whose words are to be stemmed
 java.lang.String removeNonWordChars(java.lang.String str)
          Removes all characters except letters and digits from the given string.
 java.lang.String stripAffixes(java.lang.String word)
          Removes all non-word characters from the given word and strips English prefixes and suffixes from it (where possible).
 java.lang.String stripPrefixes(java.lang.String str)
          Strips (a few) English prefixes from the given word, if possible.
 java.lang.String stripSuffixes(java.lang.String word)
          Stems the given word by stripping English suffixes from it, if possible.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PorterStemmer

public PorterStemmer()
Method Detail

removeNonWordChars

public java.lang.String removeNonWordChars(java.lang.String str)
Removes all characters except letters and digits from the given string.

Parameters:
str - the string to be cleansed of non-word characters
Returns:
the string with all characters other than letters and digits removed
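Here is a minimal sketch of this cleansing step. It assumes that "letters and digits" means whatever Character.isLetterOrDigit reports, though the class may instead restrict itself to ASCII a-z, A-Z and 0-9; the documentation does not say. The class name NonWordChars is hypothetical. The loop walks the string by code point, so letters outside the Basic Multilingual Plane are kept intact.

```java
/**
 * Sketch of removeNonWordChars. "Letters and digits" is ASSUMED to mean
 * Character.isLetterOrDigit; the real class may use a narrower (e.g. ASCII) test.
 */
final class NonWordChars {

    static String removeNonWordChars(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);           // walk by code point, not by char
            if (Character.isLetterOrDigit(cp)) {
                sb.appendCodePoint(cp);            // keep letters and digits only
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

For example, "don't!" becomes "dont" and " co-op 42." becomes "coop42".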

stripPrefixes

public java.lang.String stripPrefixes(java.lang.String str)
Strips (a few) English prefixes from the given word, if possible.

Parameters:
str - the (possibly inflected) word from which (a few) English prefixes are to be stripped
Returns:
the word with English prefixes stripped (where possible)
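Porter's 1980 algorithm removes only suffixes, so prefix stripping is an addition on top of it. The documentation does not say which prefixes are handled. The sketch below therefore uses a hypothetical list of measurement-style prefixes. It strips at most one prefix and leaves a word that consists of the prefix alone untouched. The class name PrefixStripper is also hypothetical.

```java
/**
 * Sketch of prefix stripping. The prefix list below is HYPOTHETICAL:
 * the documentation only says "a few" English prefixes are handled.
 */
final class PrefixStripper {

    private static final String[] PREFIXES = {
        "kilo", "micro", "milli", "intra", "ultra", "mega", "nano", "pico", "pseudo"
    };

    /** Strips the first listed prefix that matches, provided a non-empty remainder is left. */
    static String stripPrefixes(String str) {
        for (String prefix : PREFIXES) {
            if (str.startsWith(prefix) && str.length() > prefix.length()) {
                return str.substring(prefix.length());
            }
        }
        return str;
    }
}
```

The check is purely textual, so it can leave short, meaningless remainders such as "be" from "microbe". A real implementation might require a minimum remaining length.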

stripSuffixes

public java.lang.String stripSuffixes(java.lang.String word)
Stems the given word by stripping English suffixes from it, if possible.

Parameters:
word - the (possibly inflected) word that is to be stemmed (i.e., in this case, stripped of English suffixes)
Returns:
the word with English suffixes stripped (where possible)
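Porter's suffix stripping proceeds in five steps. As an illustration, here is a minimal sketch of only the first rule group, step 1a, which handles plural endings: SSES becomes SS, IES becomes I, SS stays SS, and a final S is removed. It is a standalone example that assumes lowercase input. The class name PorterStep1a is hypothetical, and this is not this class's stripSuffixes.

```java
/**
 * Sketch of Porter's step 1a only (plural endings); the full algorithm
 * continues with steps 1b to 5. Assumes lowercase input; the name is hypothetical.
 */
final class PorterStep1a {

    static String step1a(String w) {
        // Rules are tried longest suffix first; the first match wins.
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // sses -> ss
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2); // ies  -> i
        if (w.endsWith("ss"))   return w;                              // ss   -> ss
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1); // s    -> (removed)
        return w;
    }
}
```

Only the longest matching suffix fires. As a result, caresses becomes caress, ponies becomes poni, caress stays caress and cats becomes cat.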

stripAffixes

public java.lang.String stripAffixes(java.lang.String word)
Removes all non-word characters from the given word and strips English prefixes and suffixes from it (where possible).

Parameters:
word - the (possibly inflected) word that is to be stemmed (i.e., in this case, stripped of English prefixes and suffixes)
Returns:
the word with English prefixes and suffixes stripped (where possible)
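The description above suggests a pipeline of three steps. The sketch below rests on two assumptions that the documentation does not confirm: the word is lowercased after cleansing, and the steps run in the order cleanse, strip prefixes, strip suffixes. The two stripping steps are passed in as functions, so any implementation of stripPrefixes and stripSuffixes can be plugged in. The class name AffixPipeline is hypothetical.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Sketch of stripAffixes as a pipeline. ASSUMED (not documented): lowercasing,
 * and the order cleanse -> strip prefixes -> strip suffixes.
 */
final class AffixPipeline {

    private final UnaryOperator<String> prefixStripper;
    private final UnaryOperator<String> suffixStripper;

    AffixPipeline(UnaryOperator<String> prefixStripper, UnaryOperator<String> suffixStripper) {
        this.prefixStripper = prefixStripper;
        this.suffixStripper = suffixStripper;
    }

    String stripAffixes(String word) {
        // Keep only letters (\p{L}) and decimal digits (\p{Nd}), then lowercase.
        String cleaned = word.replaceAll("[^\\p{L}\\p{Nd}]+", "").toLowerCase(Locale.ROOT);
        if (cleaned.isEmpty()) {
            return cleaned;                 // nothing left to stem, e.g. for "--"
        }
        return suffixStripper.apply(prefixStripper.apply(cleaned));
    }
}
```

Passing the strippers in keeps the pipeline's ordering decision separate from the rule sets. That makes it easy to test either step in isolation.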

main

public static void main(java.lang.String[] args)
Usage: moj.lang.en.PorterStemmer <file>
<file> : the file whose words are to be stemmed
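The following sketch shows one possible shape for such a command-line driver. It assumes that the file is UTF-8 text, that words are separated by whitespace, and that each stem is printed on its own line; none of this is stated above. The stemmer is passed in as a function. The lowercasing placeholder in main stands in for real stemming, and both it and the class name StemFileDemo are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Sketch of a command-line driver. ASSUMED: UTF-8 input, whitespace-separated
 * words, one stem printed per line. StemFileDemo is a hypothetical name.
 */
final class StemFileDemo {

    /** Stems every whitespace-separated token of the file, skipping empty results. */
    static List<String> stemFile(Path file, UnaryOperator<String> stemmer) throws IOException {
        List<String> stems = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;               // a blank line splits into one empty token
                }
                String stem = stemmer.apply(token);
                if (!stem.isEmpty()) {
                    stems.add(stem);
                }
            }
        }
        return stems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: StemFileDemo <file>");
            System.exit(1);
        }
        // Placeholder stemmer (hypothetical): lowercasing only. Swap in real stemming here.
        UnaryOperator<String> stemmer = w -> w.toLowerCase(Locale.ROOT);
        for (String stem : stemFile(Paths.get(args[0]), stemmer)) {
            System.out.println(stem);
        }
    }
}
```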