-- 20newsgroupsExample.zip --

This file describes how to use the Infomat example in the file
"20newsgroupsExample.zip". It has two parts: the first (I) explains how
to extract and import the material into Infomat, the second (II) how to
make a clustering.

You have to download the program and extract it before you go on! This
example is larger than the ones that come with the program download.

To learn how to start the Infomat GUI, consult the file "readme.txt" in
the Infomat folder. For more information, read the manual in the
Infomat folder.

****************************************************************
*** I. Extract and import the material into Infomat.

1) Place the file "20newsgroupsExample.zip" in the "exampels/" folder
   in the Infomat path.
2) Unzip it. A folder "20newsgroupsExample/" is created.
3) This file ("readme.txt") is located in it.
4) The folder "20newsgroupsExample/texts/" contains a part of the
   20 Newsgroups text set, see
   http://people.csail.mit.edu/jrennie/20Newsgroups/.
5) The file "tokenFileRawStemming.xml" is a stemmed version of these
   texts in a format Infomat can read (use "File->Open Token File").
   Little other preprocessing has been applied. Start with this one if
   you want to do your own preprocessing using Infomat.
6) The file "matrix20ngFilteredWeighted.xml" is a matrix file with
   preprocessing applied: stopwords have been removed using the
   stoplist in "Infomat/files/" and the filtering function
   ("Algorithms->Filter Matrix"). A tf*idf weighting has also been
   applied. Start with this one if you want to try the clustering
   algorithms straight away.

****************************************************************
*** II. How to make a clustering.

1) Import the material as described above. Use the file
   "matrix20ngFilteredWeighted.xml" to get good results easily
   (without having to apply preprocessing yourself).
2) Open the clustering window ("Algorithms->Clustering Algorithms").
   Press "Apply" to make a K-Means clustering into five clusters.
   (Alter the properties if you want to change them, and apply again.)
3) To make a relative word clustering: press the topmost button so that
   it changes to "Columns". Choose "RelativeClusterer" in the drop-down
   menu. Press "Apply".

For more information, read the manual in the Infomat folder.

/Magnus Rosell, 20100304
----------------------
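Step 6 of part I says that "matrix20ngFilteredWeighted.xml" has a tf*idf weighting applied. This file does not state which tf*idf variant Infomat uses. The sketch below is only an illustration and assumes the common form: raw term frequency times log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tfidf_matrix` is hypothetical and not part of Infomat. The input is assumed to be already tokenized (for example stemmed and stopword-filtered) documents.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a documents-by-terms tf*idf matrix from tokenized documents.

    Hypothetical helper, not Infomat code. Assumed weighting:
    weight(term, doc) = tf(term, doc) * log(N / df(term)).
    Returns (vocabulary, rows), where rows[d][t] is the weight of
    vocabulary[t] in document d.
    """
    counts = [Counter(doc) for doc in docs]  # term frequencies per document
    n_docs = len(docs)
    vocabulary = sorted({term for c in counts for term in c})
    # Document frequency: in how many documents each term occurs.
    df = {term: sum(1 for c in counts if term in c) for term in vocabulary}
    rows = []
    for c in counts:
        # Counter returns 0 for absent terms, so they get weight 0.
        rows.append([c[term] * math.log(n_docs / df[term]) for term in vocabulary])
    return vocabulary, rows


if __name__ == "__main__":
    docs = [["space", "orbit", "nasa"], ["hockey", "game", "goal"], ["space", "game"]]
    vocab, rows = tfidf_matrix(docs)
    for doc_index, row in enumerate(rows):
        print(doc_index, [round(w, 3) for w in row])
```

Terms that occur in many documents (such as "space" above) get a lower weight than terms specific to one document, which is why a weighted matrix tends to give better clusterings than raw counts.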