java.lang.Object
moj.lang.FrequencyList<Entry>
public class FrequencyList<Entry>
FrequencyList holds a list of tokens (index terms) and their
respective term and document frequencies in the texts indexed (or 'added')
so far. Sublists can be extracted based on frequency thresholds.
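The distinction between term frequency (total occurrences of a token) and document frequency (number of added texts containing the token) can be illustrated with a minimal, self-contained sketch. `TfDfSketch` and its `count` method are hypothetical stand-ins and not the `moj.lang.FrequencyList` source; the `int[]` layout `{tf, df}` is an assumption based on the `Map.Entry<String,int[]>` type used throughout this class.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the moj.lang.FrequencyList implementation.
class TfDfSketch {
    // Maps each word to {term frequency, document frequency} (assumed layout).
    static Map<String, int[]> count(List<String[]> documents) {
        Map<String, int[]> freq = new TreeMap<>();
        for (String[] doc : documents) {
            Set<String> seenInThisDoc = new HashSet<>();
            for (String token : doc) {
                int[] f = freq.computeIfAbsent(token, k -> new int[2]);
                f[0]++;                       // every occurrence counts toward TF
                if (seenInThisDoc.add(token)) {
                    f[1]++;                   // first occurrence in this text counts toward DF
                }
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, int[]> freq = count(List.of(
            new String[] {"the", "cat", "saw", "the", "dog"},
            new String[] {"the", "dog", "barked"}));
        freq.forEach((w, f) -> System.out.println(w + " tf=" + f[0] + " df=" + f[1]));
    }
}
```

In this sketch each array passed in plays the role of one tokenized text handed to `addText`.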
| Nested Class Summary | |
|---|---|
class |
FrequencyList.CompareAlphabetically
For ordering words alphabetically. |
class |
FrequencyList.CompareByDF
For ordering words from lowest to highest document frequency. |
class |
FrequencyList.CompareByDFfalling
For ordering words from highest to lowest document frequency. |
class |
FrequencyList.CompareByTF
For ordering words from lowest to highest term frequency. |
class |
FrequencyList.CompareByTFfalling
For ordering words from highest to lowest term frequency. |
| Constructor Summary | |
|---|---|
FrequencyList()
Creates a new empty FrequencyList. |
|
| Method Summary | |
|---|---|
int |
addText(java.lang.String[] text)
Adds tokenized text to the FrequencyList and updates the
term and document frequencies for each encountered token. |
int |
addText(java.lang.String[] text,
int mintokenlen,
int maxtokenlen,
boolean prefixonly)
Adds tokenized text to the FrequencyList and updates the
term and document frequencies for each encountered token/prefix. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> |
filterByDFinside(int lowerBound,
int upperBound)
Extract sublist containing only the index terms with a document frequency between lowerBound and upperBound. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> |
filterByDFoutside(int lowerBound,
int upperBound)
Extract sublist containing only the index terms with a document frequency lower than lowerBound or higher than upperBound. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> |
filterByTFinside(int lowerBound,
int upperBound)
Extract sublist containing only the index terms with a term frequency between lowerBound and upperBound. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> |
filterByTFoutside(int lowerBound,
int upperBound)
Extract sublist containing only the index terms with a term frequency lower than lowerBound or higher than upperBound. |
int |
getDF(java.lang.String word)
Returns the document frequency for the given word. |
int |
getDocumentCount()
|
int |
getHighestDF()
|
int |
getHighestTF()
|
int |
getLowestDF()
|
int |
getLowestTF()
|
int |
getTF(java.lang.String word)
Returns the term frequency for the given word. |
java.util.Set<java.lang.String> |
getUniqueWords()
|
int |
getWordCount()
|
void |
getWordFrequency(java.lang.String[] words,
int[] tf,
int[] df)
Stores index terms and their corresponding term and document frequencies in the parallel array parameters. |
void |
getWordFrequency(java.lang.String[] words,
int[] tf,
int[] df,
java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp)
Stores index terms and their corresponding term and document frequencies in the parallel array parameters. |
static void |
main(java.lang.String[] args)
Usage: moj.lang.FrequencyList <file> (<minimum token length>), where <file> is the file to build frequency counts on and <minimum token length> is the minimum length of a token for it to be counted. |
java.lang.String |
toString()
Returns the frequency statistics sorted alphabetically. |
java.lang.String |
toString(java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp)
Returns the frequency statistics using the supplied Comparator. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public FrequencyList()
Creates a new empty FrequencyList.
| Method Detail |
|---|
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByTFinside(int lowerBound,
int upperBound)
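Extract sublist containing only the index terms with a term frequency between lowerBound and upperBound.
Parameters:
lowerBound - lowest acceptable term frequency
upperBound - highest acceptable term frequency
Returns:
FrequencyList containing the sublist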
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByDFinside(int lowerBound,
int upperBound)
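Extract sublist containing only the index terms with a document frequency between lowerBound and upperBound.
Parameters:
lowerBound - lowest acceptable document frequency
upperBound - highest acceptable document frequency
Returns:
FrequencyList containing the sublist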
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByTFoutside(int lowerBound,
int upperBound)
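Extract sublist containing only the index terms with a term frequency lower than lowerBound or higher than upperBound.
Parameters:
lowerBound - lower term frequency boundary
upperBound - higher term frequency boundary
Returns:
FrequencyList containing the sublist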
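The difference between the "inside" and "outside" filters can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `FrequencyFilterSketch` is a hypothetical class, it works on a plain word-to-frequency map, and it treats the "inside" bounds as inclusive because the parameters are described as the lowest and highest acceptable frequencies.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the moj.lang.FrequencyList implementation.
class FrequencyFilterSketch {
    // Keeps words whose frequency lies in [lowerBound, upperBound] (inclusive bounds are assumed).
    static Map<String, Integer> inside(Map<String, Integer> freq, int lowerBound, int upperBound) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int v = e.getValue();
            if (v >= lowerBound && v <= upperBound) {
                out.put(e.getKey(), v);
            }
        }
        return out;
    }

    // Keeps words whose frequency is lower than lowerBound or higher than upperBound.
    static Map<String, Integer> outside(Map<String, Integer> freq, int lowerBound, int upperBound) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int v = e.getValue();
            if (v < lowerBound || v > upperBound) {
                out.put(e.getKey(), v);
            }
        }
        return out;
    }
}
```

Under these assumptions the two filters partition the list: every word lands in exactly one of the two results for a given pair of bounds.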
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByDFoutside(int lowerBound,
int upperBound)
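Extract sublist containing only the index terms with a document frequency lower than lowerBound or higher than upperBound.
Parameters:
lowerBound - lower document frequency boundary
upperBound - higher document frequency boundary
Returns:
FrequencyList containing the sublist
public int getHighestTF()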
public int getLowestTF()
public int getHighestDF()
public int getLowestDF()
public int getDocumentCount()
public int getWordCount()
public java.util.Set<java.lang.String> getUniqueWords()
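Returns:
Set containing each unique word represented in the frequency list
public int getTF(java.lang.String word)
Returns the term frequency for the given word.
Parameters:
word - the word (token) to get the term frequency for
public int getDF(java.lang.String word)
Returns the document frequency for the given word.
Parameters:
word - the word (token) to get the document frequency for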
public int addText(java.lang.String[] text)
Adds tokenized text to the FrequencyList and updates the
term and document frequencies for each encountered token.
Parameters:
text - tokenized text to be added
FrequencyList
public int addText(java.lang.String[] text,
int mintokenlen,
int maxtokenlen,
boolean prefixonly)
Adds tokenized text to the FrequencyList and updates the
term and document frequencies for each encountered token/prefix.
Parameters:
text - tokenized text to be added
mintokenlen - minimum length of a token for it to be added to the list
maxtokenlen - maximum length of a token for it to be added to the list; if maxtokenlen < mintokenlen then maxtokenlen = unlimited
prefixonly - save only the first maxtokenlen characters of each token
FrequencyList
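One possible reading of the length parameters is sketched below. `TokenLengthSketch.select` is a hypothetical helper, not part of `moj.lang`, and it encodes two assumptions that the description above does not confirm: tokens longer than `maxtokenlen` are skipped when `prefixonly` is false, and are truncated to their first `maxtokenlen` characters when `prefixonly` is true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the moj.lang.FrequencyList implementation.
class TokenLengthSketch {
    // Returns the tokens (or prefixes) that would be counted under the assumed rules.
    static List<String> select(String[] text, int mintokenlen, int maxtokenlen, boolean prefixonly) {
        // Documented rule: maxtokenlen < mintokenlen means no upper limit.
        boolean unlimited = maxtokenlen < mintokenlen;
        List<String> kept = new ArrayList<>();
        for (String token : text) {
            if (token.length() < mintokenlen) {
                continue;                               // too short: never counted
            }
            if (!unlimited && token.length() > maxtokenlen) {
                if (!prefixonly) {
                    continue;                           // assumed: over-long tokens are skipped
                }
                token = token.substring(0, maxtokenlen); // assumed: keep only the prefix
            }
            kept.add(token);
        }
        return kept;
    }
}
```

With prefixes enabled, distinct long tokens that share a prefix collapse into one index term, which is presumably why the method description speaks of "token/prefix".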
public void getWordFrequency(java.lang.String[] words,
int[] tf,
int[] df)
Stores index terms and their corresponding term and document frequencies in the parallel array parameters. Each array is filled with getUniqueWords().size() values.
The index terms are sorted alphabetically.
Parameters:
words - array to be filled with the unique words that were encountered in the source document(s)
tf - array to be filled with the term frequencies of the words at the corresponding index in the words array
df - array to be filled with the document frequencies of the words at the corresponding index in the words array
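The parallel-array contract of getWordFrequency can be sketched as below: position i of each array describes the same word. `ParallelArraySketch.fill` is a hypothetical helper rather than the library method, and the `int[]` value layout `{tf, df}` is an assumption. The caller is expected to allocate arrays with one slot per unique word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the moj.lang.FrequencyList implementation.
class ParallelArraySketch {
    // Fills words, tf and df so that index i in each array refers to the same entry.
    static void fill(Map<String, int[]> freq, String[] words, int[] tf, int[] df,
                     Comparator<Map.Entry<String, int[]>> comp) {
        List<Map.Entry<String, int[]>> entries = new ArrayList<>(freq.entrySet());
        entries.sort(comp);                 // ordering is decided entirely by the comparator
        int i = 0;
        for (Map.Entry<String, int[]> e : entries) {
            words[i] = e.getKey();
            tf[i] = e.getValue()[0];        // assumed layout: index 0 holds the term frequency
            df[i] = e.getValue()[1];        // assumed layout: index 1 holds the document frequency
            i++;
        }
    }
}
```

Passing `Map.Entry.<String, int[]>comparingByKey()` as the comparator gives alphabetical order; a comparator on `getValue()[0]` would order by term frequency instead.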
public void getWordFrequency(java.lang.String[] words,
int[] tf,
int[] df,
java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp)
Stores index terms and their corresponding term and document frequencies in the parallel array parameters. Each array is filled with getUniqueWords().size() values.
The index terms are sorted according to the comparator comp.
Parameters:
words - array to be filled with the unique words that were encountered in the source document(s)
tf - array to be filled with the term frequencies of the words at the corresponding index in the words array
df - array to be filled with the document frequencies of the words at the corresponding index in the words array
comp - Comparator denoting how the parallel arrays should be ordered
public java.lang.String toString()
Returns the frequency statistics sorted alphabetically.
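Overrides:
toString in class java.lang.Object
Returns:
String containing the sorted entries
public java.lang.String toString(java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp)
Returns the frequency statistics using the supplied Comparator.
Parameters:
comp - Comparator to use for sorting entries
Returns:
String containing the sorted entries
public static void main(java.lang.String[] args)
Usage: moj.lang.FrequencyList <file> (<minimum token length>), where <file> is the file to build frequency counts on and <minimum token length> is the minimum length of a token for it to be counted.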