Package infomat.vectorspace

"infomat.vectorspace" is the central data structure package of infomat.

Class Summary
HierarchicalIObjectGrouping A HierarchicalIObjectGrouping is an IObjectGrouping with special properties.
ILabel A label for any IObject.
IMatrix IMatrix represents the relations between two IObjectSets.
IMatrixCell An element in an IMatrix.
IMatrixCellFilter Removes IMatrixCells, rows, and columns depending on their frequency.
IMatrixProjector  
IObject An IObject is either an object or a feature.
IObjectGroup An IObjectGroup is a set of IObjects.
IObjectGrouping An IObjectGrouping is a set of IObjectGroups.
IObjectGroupNode An IObjectGroupNode is a node in a HierarchicalIObjectGrouping.
IObjectLexicalComparator A private class that is used to order the IObjects in lexical order.
IObjectSet An IObjectSet is a complete set of IObjects.
MeasureSorter A class for sorting IObjectGroupings by their associated Measures.
Normalizer For normalizing the rows or columns of an IMatrix.
Sorter A class for sorting IObjects.
Stoplist A stoplist.
 

Package infomat.vectorspace Description

"infomat.vectorspace" is the central data structure package of infomat.

Infomat deals with objects called IObjects. Each IObject has a string and an id number that uniquely identifies it. When applicable, it also has a reference to the location where the actual object is stored (such as a text file). In this manual they are often called objects for short.

Several IObjects can be stored in an IObjectGroup, and several IObjectGroups constitute an IObjectGrouping. Throughout this manual these are also called groups and groupings for short. Currently, each IObject can belong to only one IObjectGroup in any given IObjectGrouping.
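The rule that an object belongs to at most one group per grouping can be sketched as follows. This is only an illustration: GroupingSketch and its methods are hypothetical names, not the infomat API, and the choice to move an object out of its previous group on reassignment is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a grouping; not the infomat API. */
class GroupingSketch {
    // group name -> ids of the objects in that group
    private final Map<String, List<Integer>> groups = new HashMap<>();
    // object id -> name of the single group that currently holds it
    private final Map<Integer, String> groupByObject = new HashMap<>();

    /** Puts an object into a group, removing it from any previous group first. */
    void assign(int objectId, String groupName) {
        String previous = groupByObject.put(objectId, groupName);
        if (previous != null) {
            // remove(Object) is intended here, not remove(int index)
            groups.get(previous).remove(Integer.valueOf(objectId));
        }
        groups.computeIfAbsent(groupName, k -> new ArrayList<>()).add(objectId);
    }

    /** Returns the name of the group holding the object, or null if unassigned. */
    String groupOf(int objectId) {
        return groupByObject.get(objectId);
    }

    /** Returns the ids in a group; an empty list if the group does not exist. */
    List<Integer> members(String groupName) {
        return groups.getOrDefault(groupName, new ArrayList<>());
    }

    public static void main(String[] args) {
        GroupingSketch clustering = new GroupingSketch();
        clustering.assign(1, "sports");
        clustering.assign(2, "sports");
        clustering.assign(1, "politics"); // object 1 leaves "sports"
        System.out.println(clustering.members("sports"));
        System.out.println(clustering.groupOf(1));
    }
}
```

Keeping a reverse map from object to group makes the single-membership rule cheap to enforce on every assignment.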

The main data structure in Infomat is a matrix called an IMatrix, which is implemented as a sparse matrix. The entries along the axes of the matrix, its rows and columns, correspond to IObjects. Each axis has a special IObjectGroup called an IObjectSet. The IObjectSet also keeps track of all IObjectGroupings defined over it. An IObjectGrouping can only contain IObjects from one IObjectSet.

The IMatrix stores IMatrixCells, each of which holds information about the relation between two IObjects, one from each IObjectSet. The basic information is a count, and the derived information is called a weight.
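A minimal sketch of sparse cell storage with a count and a weight, assuming a nested-map layout. SparseMatrixSketch, Cell, and the method names are hypothetical and do not reflect how IMatrix is actually implemented. The weight is simply set to the raw count here as a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse matrix sketch; names are not the infomat API. */
class SparseMatrixSketch {
    /** One stored cell: a raw count plus a weight derived from it. */
    static class Cell {
        int count;
        double weight;
    }

    // row id -> (column id -> cell); absent entries are implicit zeros
    private final Map<Integer, Map<Integer, Cell>> rows = new HashMap<>();

    /** Adds delta to the count of (row, col), creating the cell on first use. */
    void addCount(int row, int col, int delta) {
        Cell cell = rows.computeIfAbsent(row, r -> new HashMap<>())
                        .computeIfAbsent(col, c -> new Cell());
        cell.count += delta;
        cell.weight = cell.count; // placeholder weight: the raw count itself
    }

    /** Returns the stored cell, or null if the relation was never recorded. */
    Cell get(int row, int col) {
        Map<Integer, Cell> r = rows.get(row);
        return r == null ? null : r.get(col);
    }

    /** Number of cells actually stored. */
    int storedCells() {
        int n = 0;
        for (Map<Integer, Cell> r : rows.values()) {
            n += r.size();
        }
        return n;
    }
}
```

Only relations that have been observed occupy memory, which is the point of a sparse representation when most object pairs are unrelated.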

In a typical Information Retrieval scenario, the row IObjects may represent texts, with titles and locations in the file system, and the column IObjects the words that appear in the texts. For each word that appears in a particular text, an IMatrixCell is stored with the number of appearances as its count. The weight of the IMatrixCell can be calculated through a weighting scheme. An IObjectGrouping of the texts (rows) could be a clustering or a categorization of the texts.
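The scenario above can be illustrated with a small text-by-word count table. The description does not say which weighting scheme Infomat uses, so tf-idf is assumed here purely as an example. TermCountSketch and its methods are hypothetical names, and the tokenization (lowercasing, splitting on non-word characters) is likewise an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical text-by-word counting sketch; not the infomat API. */
class TermCountSketch {
    /** Counts word occurrences per text: title -> (word -> count). */
    static Map<String, Map<String, Integer>> count(Map<String, String> texts) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> text : texts.entrySet()) {
            Map<String, Integer> row = new HashMap<>();
            for (String word : text.getValue().toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    row.merge(word, 1, Integer::sum);
                }
            }
            counts.put(text.getKey(), row);
        }
        return counts;
    }

    /** tf-idf weight of a word in a known text; an assumed example scheme. */
    static double tfIdf(Map<String, Map<String, Integer>> counts,
                        String title, String word) {
        int tf = counts.get(title).getOrDefault(word, 0);
        long df = counts.values().stream()
                        .filter(row -> row.containsKey(word))
                        .count();
        if (tf == 0 || df == 0) {
            return 0.0;
        }
        // term frequency times the log of (number of texts / texts containing the word)
        return tf * Math.log((double) counts.size() / df);
    }

    public static void main(String[] args) {
        Map<String, String> texts = new LinkedHashMap<>();
        texts.put("a.txt", "the cat sat on the mat");
        texts.put("b.txt", "the dog barked");
        Map<String, Map<String, Integer>> counts = count(texts);
        System.out.println(counts.get("a.txt").get("the"));  // count of "the" in a.txt
        System.out.println(tfIdf(counts, "a.txt", "the"));   // word in every text
        System.out.println(tfIdf(counts, "a.txt", "cat"));   // word in one text only
    }
}
```

Under this scheme a word that occurs in every text gets weight zero, while a word confined to a single text keeps a positive weight.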