java.lang.Object
  moj.lang.StopList
public class StopList
A StopList holds a set of index terms that are to be stripped
 away from a text (i.e. "stopped") before it is further processed. It also
 supports thresholds in the form of minimum and maximum allowed lengths of
 index terms, as well as a minimum number of tokens ("words") that a document
 must hold in order to be processed.
 
 These thresholds can be set in a properties file. The properties file can
 contain any of the following items:
 stoplist_file = <stopword file>
 shortest_word = <length in characters>
 longest_word = <length in characters>
 minimum_words_per_file = <length in words>
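
As an illustration of the file format above, the following is a minimal sketch of how the three numeric thresholds could be read with `java.util.Properties`. The class `StopListPropertiesSketch`, the method `readThresholds`, the sample values, and the fallback defaults are all assumptions made for this example; they are not part of `moj.lang` and do not describe how `SLprops` actually loads its settings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of moj.lang.
class StopListPropertiesSketch {

    // Returns {shortest_word, longest_word, minimum_words_per_file}.
    // The fallbacks (1, Integer.MAX_VALUE, 0) are assumed, not documented defaults.
    static int[] readThresholds(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int shortest = Integer.parseInt(
                props.getProperty("shortest_word", "1").trim());
        int longest = Integer.parseInt(
                props.getProperty("longest_word", String.valueOf(Integer.MAX_VALUE)).trim());
        int minWords = Integer.parseInt(
                props.getProperty("minimum_words_per_file", "0").trim());
        return new int[] { shortest, longest, minWords };
    }

    public static void main(String[] args) {
        // Sample file content with made-up values.
        String example = String.join("\n",
                "stoplist_file = stopwords.txt",
                "shortest_word = 2",
                "longest_word = 25",
                "minimum_words_per_file = 10");
        int[] t = readThresholds(example);
        System.out.println(t[0] + " " + t[1] + " " + t[2]);
    }
}
```

`Properties.load` accepts whitespace around the `=` sign, so the `key = value` layout shown above parses without extra handling.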
 
 The file containing the list of stop words should have one stop word per
 line. If several columns are used for different data, the stop words should
 be in the first column (a tab, "\t", is used as the column separator).
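
The stop word file layout described above can be sketched as follows. This is only an illustration under stated assumptions: `StopWordFileSketch` and `firstColumn` are hypothetical names, and skipping empty lines is a choice made for the example rather than documented behavior of `StopList`.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of moj.lang.
class StopWordFileSketch {

    // Collects the first tab-separated column of every line.
    // Lines whose first column is empty are skipped (an assumption).
    static Set<String> firstColumn(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            // Limit 2: only the first tab matters, the rest of the line is ignored.
            String word = line.split("\t", 2)[0].trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up file content: some lines carry extra columns after a tab.
        List<String> lines = Arrays.asList("the\t1200", "and", "", "of\tprep\t900");
        System.out.println(firstColumn(lines)); // prints [the, and, of]
    }
}
```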
 
| Constructor Summary | |
|---|---|
| StopList() | Creates a StopList using the default properties file 'StopList.properties' |
| StopList(SLprops properties) | Creates a StopList using the stop list properties properties |
| StopList(java.lang.String propertiesFile) | Creates a StopList using the properties file propertiesFile |
| Method Summary | | |
|---|---|---|
| void | addStopWord(java.lang.String stopword) | Adds a stop word to the StopList's internal list of stop words. |
| void | addStopWords(java.lang.String[] stopwords) | Adds a set of stop words from an array containing exactly one stop word per element. |
| SLprops | getProperties() | Gets the Properties for the StopList. |
| java.util.Set<java.lang.String> | getStopWords() | Returns the current Set of stop words to be removed from a text. |
| static void | main(java.lang.String[] args) | Usage: moj.lang.StopList <file>, where <file> is the file to remove stop words from and (<properties>) is a properties file. The properties file can contain any of the following items: stoplist_file = <stopword file>; shortest_word = <length in characters>; longest_word = <length in characters>; minimum_words_per_file = <length in words>. |
| java.lang.String | removeStopWords(java.lang.String text) | Removes stop words in StopList from the String text. |
| java.lang.String[] | removeStopWords(java.lang.String[] text) | Removes stop words in StopList from the String array text. |
| java.lang.String[] | removeStopWords(java.lang.String[] text, boolean verbose) | Removes stop words in StopList from the String array text and, optionally, outputs the number of removed words to System.out. |
| java.lang.String | removeStopWords(java.lang.String text, boolean verbose) | Removes stop words in StopList from the String text and outputs the number of removed words to System.out. |
| java.util.Set<java.lang.String> | setStopWords(java.util.HashSet<java.lang.String> stopwords) | Sets the set of stop words to the provided HashSet. |
| Methods inherited from class java.lang.Object | 
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Constructor Detail | 
|---|
public StopList()
Creates a StopList using the default properties file 'StopList.properties'.

public StopList(SLprops properties)
Creates a StopList using the stop list properties in properties.

public StopList(java.lang.String propertiesFile)
Creates a StopList using the properties file propertiesFile.
| Method Detail | 
|---|
public SLprops getProperties()
Gets the Properties for the StopList.
Returns:
Properties for the StopList

public java.util.Set<java.lang.String> setStopWords(java.util.HashSet<java.lang.String> stopwords)
Sets the set of stop words to the provided HashSet.
Parameters:
stopwords - HashSet containing the stop words to be removed from a text.
Returns:
Set containing the previous set of stop words.

public java.util.Set<java.lang.String> getStopWords()
Returns the current Set of stop words to be removed from a text.
Returns:
Set containing the current set of stop words.

public void addStopWord(java.lang.String stopword)
Adds a stop word to the StopList's internal list of stop words.
Parameters:
stopword - stop word to add

public void addStopWords(java.lang.String[] stopwords)
Adds a set of stop words from an array containing exactly one stop word per element.
Parameters:
stopwords - stop words to be added to the StopList

public java.lang.String removeStopWords(java.lang.String text)
Removes stop words in StopList from the String text.
Parameters:
text - the text which is to have stop words removed
Returns:
text with stop words removed

public java.lang.String removeStopWords(java.lang.String text,
                                        boolean verbose)
Removes stop words in StopList from the String text
 and outputs the number of removed words to System.out.
Parameters:
text - the text which is to have stop words removed
verbose - output the number of removed words to System.out (true/false)
Returns:
text with stop words removed

public java.lang.String[] removeStopWords(java.lang.String[] text)
Removes stop words in StopList from the String array text.
Parameters:
text - the text which is to have stop words removed
Returns:
text with stop words removed

public java.lang.String[] removeStopWords(java.lang.String[] text,
                                          boolean verbose)
Removes stop words in StopList from the String
 array text and, optionally, outputs the number of removed words
 to System.out. All processed words are transformed to lower case
 and some cleanup is attempted (e.g. removing non-alphanumeric characters)
 before they are checked against the filtering criteria (e.g. inclusion in
 the list of stop words, word length constraints etc.).
Parameters:
text - the text which is to have stop words removed
verbose - output the number of removed words to System.out (true/false)
Returns:
text with stop words removed

public static void main(java.lang.String[] args)
Usage: moj.lang.StopList <file>
<file> : file to remove stop words from
(<properties>) : properties file
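
The processing order described for removeStopWords(java.lang.String[] text, boolean verbose), that is lower-casing, cleanup, then checking against the stop list and the length limits, can be sketched roughly as below. This is not the StopList implementation: `StopWordFilterSketch` and `filter` are hypothetical names, the exact cleanup rule (dropping everything that is not a letter or digit) is an assumption, and so is returning the cleaned lower-case form of each kept word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not part of moj.lang.
class StopWordFilterSketch {

    // Keeps the tokens that survive normalization, the length limits
    // (inclusive, in characters) and the stop word check.
    static String[] filter(String[] tokens, Set<String> stopWords,
                           int shortest, int longest) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Lower-case first, then strip non-alphanumeric characters (assumed cleanup).
            String word = token.toLowerCase(Locale.ROOT)
                               .replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (word.isEmpty()) continue;
            if (word.length() < shortest || word.length() > longest) continue;
            if (stopWords.contains(word)) continue;
            kept.add(word);
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "of"));
        String[] out = filter(new String[] { "The", "quick,", "fox", "of", "X" },
                              stop, 2, 20);
        System.out.println(String.join(" ", out)); // prints: quick fox
    }
}
```

In this example "The" and "of" are dropped as stop words, "quick," loses its comma during cleanup, and "X" falls below the assumed minimum length of 2.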