moj.lang
Class StandardDeviationList

java.lang.Object
  extended by moj.lang.StandardDeviationList

public class StandardDeviationList
extends java.lang.Object

StandardDeviationList holds a list of unique tokens (index terms) and, for each token, the standard deviation of the distances between the positions of its instances in the text processed by the list. This can be used, for example, for keyword extraction: a high standard deviation indicates high "burstiness" in the token's distribution across the text, which makes it a good keyword indicator for longer texts. For more information, see "Keyword detection in natural languages and DNA" (Ortuño et al., 2002).
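The measure described above can be illustrated with a minimal sketch. It is not the source of this class: the class and method names are hypothetical, and it assumes that distances are taken between successive occurrences of a token, that the population standard deviation is used, and that a token with fewer than two occurrences gets 0.0. The actual implementation may differ on any of these points.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the implementation of moj.lang.StandardDeviationList.
class TokenGapDeviationSketch {

    // Maps each unique token to the standard deviation of the gaps between
    // its successive positions in the token array.
    static Map<String, Double> gapDeviations(String[] tokens) {
        // First pass: record every position at which each token occurs.
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        // Second pass: reduce each position list to a single deviation value.
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
            result.put(e.getKey(), populationStdDevOfGaps(e.getValue()));
        }
        return result;
    }

    // Population standard deviation of the gaps between consecutive positions.
    // Assumption: a token with fewer than two occurrences has no gaps and gets 0.0.
    static double populationStdDevOfGaps(List<Integer> pos) {
        int gaps = pos.size() - 1;
        if (gaps < 1) {
            return 0.0;
        }
        // The gaps sum to (last - first), so the mean needs no extra loop.
        double mean = (pos.get(gaps) - pos.get(0)) / (double) gaps;
        double sumSq = 0.0;
        for (int i = 1; i <= gaps; i++) {
            double d = (pos.get(i) - pos.get(i - 1)) - mean;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / gaps);
    }
}
```

In this sketch an evenly spaced token (positions 2, 4, 6, 8) scores 0, while a clustered token (positions 0, 1, 9) scores 3.5, which is the "burstiness" contrast the description refers to.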

Version:
2006-Oct-23
Author:
Martin Hassel

Constructor Summary
StandardDeviationList()
          Creates a new empty StandardDeviationList.
 
Method Summary
 int addText(java.lang.String[] text)
          Adds tokenized text to the StandardDeviationList and records every token's position(s) in the text.
 int addText(java.lang.String[] text, int mintokenlen)
          Adds tokenized text to the StandardDeviationList and records every token's position(s) in the text. Tokens shorter than mintokenlen are not added to the list.
 java.util.HashMap<java.lang.String,java.lang.Double> getStandardDeviations()
          Gets a HashMap containing, for each unique token in the added text(s), the standard deviation of the distances between its instances.
 static void main(java.lang.String[] args)
          Usage: moj.lang.StandardDeviationList <file> (<minimum token length>)
<file> : file to build standard deviations on
<minimum token length> : minimum length for a token for it to be counted
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StandardDeviationList

public StandardDeviationList()
Creates a new empty StandardDeviationList.

Method Detail

getStandardDeviations

public java.util.HashMap<java.lang.String,java.lang.Double> getStandardDeviations()
Gets a HashMap containing, for each unique token in the added text(s), the standard deviation of the distances between its instances.

Returns:
HashMap containing standard deviations of token distances
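
A map of this shape can be turned into a ranked list of keyword candidates. The following is a minimal sketch under the assumption that higher values are better candidates, as the class description suggests; the class and method names are hypothetical and not part of moj.lang.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative helper; not part of the moj.lang API.
class KeywordRankingSketch {

    // Returns up to n tokens ordered by descending standard deviation.
    // Ties are broken alphabetically so that the result is deterministic.
    static List<String> topTokens(Map<String, Double> deviations, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(deviations.entrySet());
        entries.sort((a, b) -> {
            int byValue = Double.compare(b.getValue(), a.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

Since the method accepts any Map<String, Double>, the HashMap returned by getStandardDeviations() could be passed to it directly.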

addText

public int addText(java.lang.String[] text)
Adds tokenized text to the StandardDeviationList and records every token's position(s) in the text.

Parameters:
text - tokenized text to be added

addText

public int addText(java.lang.String[] text,
                   int mintokenlen)
Adds tokenized text to the StandardDeviationList and records every token's position(s) in the text. Tokens shorter than mintokenlen are not added to the list.

Parameters:
text - tokenized text to be added
mintokenlen - minimum length of a token for it to be added to the list
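
The effect of a minimum token length can be sketched as follows. This is an illustration with hypothetical names, not the class's source. It assumes that a skipped token still occupies its position in the text, so the positions of the remaining tokens are unchanged; the documentation above does not say whether the real method behaves this way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the implementation of addText.
class MinTokenLengthSketch {

    // Records the positions of every token whose length is at least minTokenLen.
    static Map<String, List<Integer>> recordPositions(String[] text, int minTokenLen) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        for (int i = 0; i < text.length; i++) {
            if (text[i].length() < minTokenLen) {
                continue; // too short: not added, but index i is still consumed (assumption)
            }
            positions.computeIfAbsent(text[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }
}
```

With the tokens "to", "be", "or", "not", "to", "be" and a minimum length of 3, only "not" is recorded, at position 3.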

main

public static void main(java.lang.String[] args)
Usage: moj.lang.StandardDeviationList <file> (<minimum token length>)
<file> : file to build standard deviations on
<minimum token length> : minimum length for a token for it to be counted