java.lang.Object
  moj.lang.FrequencyList<Entry>
public class FrequencyList<Entry>
FrequencyList holds a list of tokens (index terms) and their respective term and document frequencies in the texts indexed (or 'added') so far. Sublists can be extracted based on frequency thresholds.
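To make the two counts concrete, here is a minimal sketch that uses only java.util. It assumes that each added text counts as one document, that term frequency counts every occurrence of a token, and that document frequency counts a token at most once per document. The class TfDfSketch, its addDocument helper, and the {tf, df} array layout are hypothetical names chosen for illustration; this is not the moj.lang implementation.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative stand-in only; not the moj.lang.FrequencyList implementation. */
class TfDfSketch {
    // token -> {term frequency, document frequency}; this array layout is an assumption
    final Map<String, int[]> counts = new TreeMap<>();
    int documents = 0;

    /** Counts one tokenized text as one document (hypothetical helper). */
    void addDocument(String[] tokens) {
        Set<String> seenInThisDocument = new HashSet<>();
        for (String token : tokens) {
            int[] c = counts.computeIfAbsent(token, k -> new int[2]);
            c[0]++; // every occurrence raises the term frequency
            if (seenInThisDocument.add(token)) {
                c[1]++; // only the first occurrence per document raises the document frequency
            }
        }
        documents++;
    }

    public static void main(String[] args) {
        TfDfSketch list = new TfDfSketch();
        list.addDocument(new String[] {"the", "cat", "saw", "the", "dog"});
        list.addDocument(new String[] {"the", "dog", "barked"});
        for (Map.Entry<String, int[]> e : list.counts.entrySet()) {
            System.out.println(e.getKey() + " tf=" + e.getValue()[0] + " df=" + e.getValue()[1]);
        }
    }
}
```

In this sketch "the" ends up with a term frequency of 3 but a document frequency of 2, because it occurs three times across two documents.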
Nested Class Summary | |
---|---|
class | FrequencyList.CompareAlphabetically - For ordering words alphabetically. |
class | FrequencyList.CompareByDF - For ordering words from lowest to highest document frequency. |
class | FrequencyList.CompareByDFfalling - For ordering words from highest to lowest document frequency. |
class | FrequencyList.CompareByTF - For ordering words from lowest to highest term frequency. |
class | FrequencyList.CompareByTFfalling - For ordering words from highest to lowest term frequency. |
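The orderings these nested classes provide can be approximated with standard java.util.Comparator factories. The sketch below is an illustration under stated assumptions, not the source of the nested classes: OrderingSketch, its constants, and the orderedWords and sample helpers are hypothetical, and the int[] value is assumed to hold {term frequency, document frequency} in that order.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative comparators; not the FrequencyList.CompareBy* classes themselves. */
class OrderingSketch {
    static final int TF = 0; // assumed index of the term frequency in the int[] value

    static final Comparator<Map.Entry<String, int[]>> ALPHABETICAL = Map.Entry.comparingByKey();
    static final Comparator<Map.Entry<String, int[]>> BY_TF_RISING =
            Comparator.comparingInt(e -> e.getValue()[TF]);
    static final Comparator<Map.Entry<String, int[]>> BY_TF_FALLING = BY_TF_RISING.reversed();

    /** Returns the words of the given entries in the requested order, leaving the input untouched. */
    static List<String> orderedWords(List<Map.Entry<String, int[]>> entries,
                                     Comparator<Map.Entry<String, int[]>> order) {
        List<Map.Entry<String, int[]>> copy = new ArrayList<>(entries);
        copy.sort(order);
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, int[]> e : copy) {
            words.add(e.getKey());
        }
        return words;
    }

    /** Small hand-made sample: word -> {tf, df}. */
    static List<Map.Entry<String, int[]>> sample() {
        List<Map.Entry<String, int[]>> entries = new ArrayList<>();
        entries.add(new SimpleEntry<>("dog", new int[] {1, 1}));
        entries.add(new SimpleEntry<>("the", new int[] {3, 2}));
        entries.add(new SimpleEntry<>("cat", new int[] {2, 1}));
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(orderedWords(sample(), ALPHABETICAL));
        System.out.println(orderedWords(sample(), BY_TF_RISING));
        System.out.println(orderedWords(sample(), BY_TF_FALLING));
    }
}
```

The document-frequency orderings would follow the same pattern with the other array index.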
Constructor Summary | |
---|---|
FrequencyList() | Creates a new empty FrequencyList. |
Method Summary | |
---|---|
int | addText(java.lang.String[] text) - Adds tokenized text to the FrequencyList and updates the term and document frequencies for each encountered token. |
int | addText(java.lang.String[] text, int mintokenlen, int maxtokenlen, boolean prefixonly) - Adds tokenized text to the FrequencyList and updates the term and document frequencies for each encountered token/prefix. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> | filterByDFinside(int lowerBound, int upperBound) - Extracts a sublist containing only the index terms with a document frequency between lowerBound and upperBound. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> | filterByDFoutside(int lowerBound, int upperBound) - Extracts a sublist containing only the index terms with a document frequency lower than lowerBound or higher than upperBound. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> | filterByTFinside(int lowerBound, int upperBound) - Extracts a sublist containing only the index terms with a term frequency between lowerBound and upperBound. |
FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> | filterByTFoutside(int lowerBound, int upperBound) - Extracts a sublist containing only the index terms with a term frequency lower than lowerBound or higher than upperBound. |
int | getDF(java.lang.String word) - Returns the document frequency for the given word. |
int | getDocumentCount() |
int | getHighestDF() |
int | getHighestTF() |
int | getLowestDF() |
int | getLowestTF() |
int | getTF(java.lang.String word) - Returns the term frequency for the given word. |
java.util.Set<java.lang.String> | getUniqueWords() |
int | getWordCount() |
void | getWordFrequency(java.lang.String[] words, int[] tf, int[] df) - Stores index terms and their corresponding term and document frequencies in the parallel array parameters. |
void | getWordFrequency(java.lang.String[] words, int[] tf, int[] df, java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp) - Stores index terms and their corresponding term and document frequencies in the parallel array parameters. |
static void | main(java.lang.String[] args) - Usage: moj.lang.FrequencyList <file> (<minimum token length>), where <file> is the file to build frequency counts on and <minimum token length> is the minimum length for a token for it to be counted. |
java.lang.String | toString() - Returns the frequency statistics sorted alphabetically. |
java.lang.String | toString(java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp) - Returns the frequency statistics using the supplied Comparator. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public FrequencyList()
Creates a new empty FrequencyList.
Method Detail |
---|
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByTFinside(int lowerBound, int upperBound)
Extracts a sublist containing only the index terms with a term frequency between lowerBound and upperBound.
Parameters:
lowerBound - lowest acceptable term frequency
upperBound - highest acceptable term frequency
Returns:
a FrequencyList containing the sublist
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByDFinside(int lowerBound, int upperBound)
Extracts a sublist containing only the index terms with a document frequency between lowerBound and upperBound.
Parameters:
lowerBound - lowest acceptable document frequency
upperBound - highest acceptable document frequency
Returns:
a FrequencyList containing the sublist
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByTFoutside(int lowerBound, int upperBound)
Extracts a sublist containing only the index terms with a term frequency lower than lowerBound or higher than upperBound.
Parameters:
lowerBound - lower term frequency boundary
upperBound - higher term frequency boundary
Returns:
a FrequencyList containing the sublist
public FrequencyList<java.util.Map.Entry<java.lang.String,int[]>> filterByDFoutside(int lowerBound, int upperBound)
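Extracts a sublist containing only the index terms with a document frequency lower than lowerBound or higher than upperBound.
Parameters:
lowerBound - lower document frequency boundary
upperBound - higher document frequency boundary
Returns:
a FrequencyList containing the sublist
public int getHighestTF()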
public int getLowestTF()
public int getHighestDF()
public int getLowestDF()
public int getDocumentCount()
public int getWordCount()
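The threshold logic of the four filter methods above can be sketched on a plain map. This is an illustration, not the library code: FilterSketch and its inside, outside, and sample helpers are hypothetical, the {tf, df} array layout is assumed, and treating the "inside" bounds as inclusive is an assumption drawn from the "lowest acceptable" and "highest acceptable" wording.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative threshold filters over word -> {tf, df}; not the FrequencyList implementation. */
class FilterSketch {
    static final int TF = 0; // assumed position of the term frequency
    static final int DF = 1; // assumed position of the document frequency

    /** Keeps entries whose chosen frequency lies between the bounds (assumed inclusive). */
    static Map<String, int[]> inside(Map<String, int[]> counts, int index, int lowerBound, int upperBound) {
        Map<String, int[]> sublist = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int f = e.getValue()[index];
            if (f >= lowerBound && f <= upperBound) {
                sublist.put(e.getKey(), e.getValue());
            }
        }
        return sublist;
    }

    /** Keeps entries whose chosen frequency is lower than lowerBound or higher than upperBound. */
    static Map<String, int[]> outside(Map<String, int[]> counts, int index, int lowerBound, int upperBound) {
        Map<String, int[]> sublist = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int f = e.getValue()[index];
            if (f < lowerBound || f > upperBound) {
                sublist.put(e.getKey(), e.getValue());
            }
        }
        return sublist;
    }

    /** Small hand-made sample: word -> {tf, df}. */
    static Map<String, int[]> sample() {
        Map<String, int[]> counts = new TreeMap<>();
        counts.put("the", new int[] {3, 2});
        counts.put("dog", new int[] {2, 2});
        counts.put("cat", new int[] {1, 1});
        counts.put("saw", new int[] {1, 1});
        counts.put("barked", new int[] {1, 1});
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(inside(sample(), TF, 2, 3).keySet());
        System.out.println(outside(sample(), TF, 2, 3).keySet());
    }
}
```

Under these assumptions, calling inside and outside with the same bounds splits the list into two disjoint sublists that together contain every term.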
public java.util.Set<java.lang.String> getUniqueWords()
Returns:
a Set containing each unique word represented in the frequency list
public int getTF(java.lang.String word)
Returns the term frequency for the given word.
Parameters:
word - the word (token) to get the term frequency for
public int getDF(java.lang.String word)
Returns the document frequency for the given word.
Parameters:
word - the word (token) to get the document frequency for
public int addText(java.lang.String[] text)
Adds tokenized text to the FrequencyList and updates the term and document frequencies for each encountered token.
Parameters:
text - tokenized text to be added
FrequencyList
public int addText(java.lang.String[] text, int mintokenlen, int maxtokenlen, boolean prefixonly)
Adds tokenized text to the FrequencyList and updates the term and document frequencies for each encountered token/prefix.
Parameters:
text - tokenized text to be added
mintokenlen - minimum length of a token for it to be added to the list
maxtokenlen - maximum length of a token for it to be added to the list; if maxtokenlen < mintokenlen, the maximum length is unlimited
prefixonly - save only the first maxtokenlen characters of a token
FrequencyList
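One plausible reading of the length parameters is sketched below. The rule that maxtokenlen < mintokenlen means "no upper limit" comes from the parameter description; everything else is an assumption made for illustration, in particular that over-long tokens are skipped when prefixonly is false and cut to their first maxtokenlen characters when it is true. TokenLengthSketch and selectTokens are hypothetical names, and the sketch only selects tokens rather than counting them.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative token selection; the real addText may treat edge cases differently. */
class TokenLengthSketch {
    static List<String> selectTokens(String[] text, int mintokenlen, int maxtokenlen, boolean prefixonly) {
        boolean unlimited = maxtokenlen < mintokenlen; // documented rule: no upper limit in this case
        List<String> kept = new ArrayList<>();
        for (String token : text) {
            if (token.length() < mintokenlen) {
                continue; // too short to be counted
            }
            if (unlimited || token.length() <= maxtokenlen) {
                kept.add(token);
            } else if (prefixonly) {
                kept.add(token.substring(0, maxtokenlen)); // assumption: keep only the prefix
            }
            // assumption: otherwise the over-long token is skipped
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] text = {"a", "index", "indexing", "term"};
        System.out.println(selectTokens(text, 2, 5, false));
        System.out.println(selectTokens(text, 2, 5, true));
        System.out.println(selectTokens(text, 2, 0, false));
    }
}
```

With prefixonly set, "index" and "indexing" collapse into the same index term in this sketch, which is one way prefix counting can merge inflected forms.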
public void getWordFrequency(java.lang.String[] words, int[] tf, int[] df)
Stores index terms and their corresponding term and document frequencies in the parallel array parameters. Each array must be able to hold getUniqueWords().size() values. The index terms are sorted alphabetically.
Parameters:
words - array to be filled with the unique words that were encountered in the source document(s)
tf - array to be filled with the term frequencies of the words at the corresponding index in the words array
df - array to be filled with the document frequencies of the words at the corresponding index in the words array
public void getWordFrequency(java.lang.String[] words, int[] tf, int[] df, java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp)
Stores index terms and their corresponding term and document frequencies in the parallel array parameters. Each array must be able to hold getUniqueWords().size() values. The index terms are sorted according to the comparator comp.
Parameters:
words - array to be filled with the unique words that were encountered in the source document(s)
tf - array to be filled with the term frequencies of the words at the corresponding index in the words array
df - array to be filled with the document frequencies of the words at the corresponding index in the words array
comp - Comparator denoting how the parallel arrays should be ordered
public java.lang.String toString()
Returns the frequency statistics sorted alphabetically.
Overrides:
toString in class java.lang.Object
Returns:
a String containing the sorted entries
public java.lang.String toString(java.util.Comparator<java.util.Map.Entry<java.lang.String,int[]>> comp)
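Returns the frequency statistics using the supplied Comparator.
Parameters:
comp - Comparator to use for sorting entries
Returns:
a String containing the sorted entries
public static void main(java.lang.String[] args)
Usage: moj.lang.FrequencyList <file> (<minimum token length>)
<file> - file to build frequency counts on
<minimum token length> - minimum length for a token for it to be counted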
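The parallel-array contract of the getWordFrequency methods documented above can be illustrated as follows. This is a sketch under assumptions rather than the library code: ParallelArraysSketch, fill, and sample are hypothetical, the int[] value is assumed to hold {tf, df}, and the caller is assumed to size all three arrays to the number of unique words before the call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative parallel-array export; not the FrequencyList implementation. */
class ParallelArraysSketch {
    /** Writes word i, its term frequency, and its document frequency to index i of each array. */
    static void fill(Map<String, int[]> counts, String[] words, int[] tf, int[] df,
                     Comparator<Map.Entry<String, int[]>> comp) {
        List<Map.Entry<String, int[]>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(comp); // the comparator decides the order of all three arrays
        for (int i = 0; i < entries.size(); i++) {
            words[i] = entries.get(i).getKey();
            tf[i] = entries.get(i).getValue()[0]; // assumed layout: {tf, df}
            df[i] = entries.get(i).getValue()[1];
        }
    }

    /** Small hand-made sample: word -> {tf, df}. */
    static Map<String, int[]> sample() {
        Map<String, int[]> counts = new TreeMap<>();
        counts.put("the", new int[] {3, 2});
        counts.put("dog", new int[] {2, 2});
        counts.put("cat", new int[] {1, 1});
        return counts;
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = sample();
        int n = counts.size(); // plays the role of getUniqueWords().size()
        String[] words = new String[n];
        int[] tf = new int[n];
        int[] df = new int[n];
        // Highest document frequency first, in the spirit of CompareByDFfalling
        fill(counts, words, tf, df, (a, b) -> Integer.compare(b.getValue()[1], a.getValue()[1]));
        for (int i = 0; i < n; i++) {
            System.out.println(words[i] + " tf=" + tf[i] + " df=" + df[i]);
        }
    }
}
```

Keeping the three arrays index-aligned means a caller can read the word, its term frequency, and its document frequency with a single loop counter.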