-------------- beginning of "readme.txt" --------------

*** Infomat 100305 ***

Copyright (C) 2010 Magnus Rosell
rosell@csc.kth.se

------------------------

Content:

1 Introduction
2 Versions
3 Documentation
4 Short "How to use"
  4.1 Paths
  4.2 Compile and run
  4.3 Example
    4.3.1 Swedish
    4.3.2 English
5 Contents
6 Known issues and planned features
7 Thanks
8 Dedication
9 Disclaimer and terms of use

------------------------

1 Introduction

This is a short introduction to the Infomat system, a vector space
processing, visualization and exploration tool constructed by
Magnus Rosell (http://www.csc.kth.se/~rosell/) at KTH CSC
(http://www.csc.kth.se/), the School of Computer Science and
Communication, KTH (The Royal Institute of Technology), Stockholm,
Sweden.
2006 - ...

Comments etc. to: rosell@csc.kth.se
I will try to answer, but don't expect it.

------------------------

2 Versions

As the work with Infomat progresses, newer versions will be made
available for download. The first version, "Infomat 1.0", was released
070529. All following versions have the date as their version number,
"Infomat YYMMDD".

I give no guarantee that later versions are compatible with earlier
ones, and I won't keep a detailed version history.

Short version history, newest first:

* Infomat 100305
  Several improvements. Extended search tool. A context menu in the
  main view makes opening clusters more convenient.

* Infomat 090727
  Several improvements, especially an added search tool.

* Infomat 090316
  Several improvements of the GUI. In particular, the list object is
  much faster, as not all objects are displayed.

* Infomat 081001
  There are now a few processing tools in the infomat package. They
  allow for clustering from the command prompt and are described in the
  manual. Several improvements of the GUI.

* Infomat 080829
  A very much improved version. Several improvements of the GUI, which
  is now run through the class "InfomatGUI". Besides the GUI, Infomat is
  now also a text processing package.
  There is one example class included.

* Infomat 1.0
  Provided the graphical user interface, which was the original reason
  for the development. However, the implementation proved to be very
  flexible for other kinds of use, which are apparent in later versions.

------------------------

3 Documentation

Javadoc in .../Infomat/doc/

The javadoc is unfortunately not complete. More work has been put into
documenting the more central parts of the code.

The javadoc is generated with:

  .../Infomat/>javadoc -d doc/ -private -sourcepath src/ -subpackages infomat:mro mro

A manual (work in progress) in the Infomat directory:
  infomatmanYYMMDD.pdf

The Infomat website:
  http://www.csc.kth.se/tcs/projects/infomat/infomat/

------------------------

4 Short "How to use"

To get started, follow these simple steps. By studying the classes used
you will probably be able to go on to use the many other features of
the package.

Infomat is implemented in Java SE 6 on Unix. I have also tried it on
Windows, where it seems to work.

4.1 Paths

You may want to set the paths in the file

  .../Infomat/settings/InfomatProperties.xml

It is also possible to change these from the "File" menu option
"Infomat Properties" in the GUI.

The "Icon Path" is the most important: if it is wrong, the icons in the
toolbar won't be visible. The "Result Path" sets a start directory for
loading and saving results, which is convenient.

4.2 Compile and run

* To run the visualization tool:

  .../Infomat>java -cp classes/ -Xms1024m -Xmx1024m infomat.InfomatGUI

* To run the simplest processing example, ExampleClusterer:

  .../Infomat>java -cp classes/ infomat.ExampleClusterer

  The results are XML files that can be found in the result/ directory.

* To compile:

  .../Infomat>javac -classpath ./classes/ -sourcepath ./src ./src/infomat/*.java -d classes
  .../Infomat>javac -classpath ./classes/ -sourcepath ./src ./src/mro/util/experimentation/*.java -d classes

* To run the other classes: look in the javadoc.
  I try to record the commands for compiling and running each class in
  its javadoc.

4.3 Example

In the directory .../Infomat/example/ you find a few files to start
with. There are two sets: one English and one Swedish. There is also a
larger example available on the website.

4.3.1 Swedish

* The "swe_files" directory contains a few Swedish newspaper articles
  from the KTHNC - the KTH News Corpus. They are divided into sections.

* The "swe_tokenFile.xml" is a preprocessed (lemmatized and
  decompounded) file containing the files in "swe_files". Load it with
  the "Open Token File" option in the File menu.

* "swe_matrix.xml" is the same set of texts processed a bit more. Load
  it with the "Open IMatrix" option in the File menu.

* "swe_text_grouping.xml" is a categorization of the articles following
  the newspaper sections. Load it from the Grouping Edit Window.

  You could make a clustering and compare it to this categorization:
  make the clustering the shown grouping and the categorization the
  color grouping along the same dimension.

4.3.2 English

* The "eng_files" directory contains a few English posts from the
  20 Newsgroups corpus. They are taken from five groups.

* The "eng_tokenFile.xml" is a preprocessed (stemmed) file containing
  the files in "eng_files". Load it with the "Open Token File" option
  in the File menu.

* "eng_matrix.xml" is the same set of texts processed a bit more. Load
  it with the "Open IMatrix" option in the File menu.

* "eng_text_grouping.xml" is a categorization of the posts following
  the newsgroups. Load it from the Grouping Edit Window.

  You could make a clustering and compare it to this categorization:
  make the clustering the shown grouping and the categorization the
  color grouping along the same dimension.

------------------------

5 Contents

The zip file you downloaded contains a lot of code. Besides the infomat
package it also includes the mro package, which contains various tools.
Some of these are used in the infomat package.
All of the content of the zip file is distributed under the same
license, see section 9.

------------------------

6 Known issues and planned features

The following concerns the GUI. Please do not expect these to be
implemented in the near future.

Known issues:

* If a group is so small that it is represented by less than a pixel,
  the program does not display it. The group is still listed in the
  grouping information, so care has to be taken when comparing that
  with the picture. Any deletion or zooming of the area where the group
  should have been visible will affect it.

* The Guide does not handle partly hidden windows very well.

* There also seems to be some kind of problem with the Guide in
  Windows.

* There are probably several other issues.

Unofficial, unordered and incomplete list of planned features:

* A search tool.

* Working with several matrixes.

* Moving objects from one group to another within a grouping.

* Merging groups in groupings. (Can be done by removing everything
  else.)

* Progress information while loading and working.

* Choosing a compressed format for loading and saving of matrixes and
  groupings.

------------------------

7 Thanks

Several people have been helpful in different ways in the construction
of this system.

* Joel Jonsson, Fredrika Wahlström, Urban Koistinen and Janna Iljina,
  who constructed a first prototype of the GUI.

* Martin Hassel, whose code I sometimes use as an inspiration.

* My co-worker Sumithra Velupillai.

* My PhD supervisor Viggo Kann.

------------------------

8 Dedication

Infomat is dedicated to my father Örjan Rosell, who, when I was nine,
sat down with me at our newly bought computer, a VIC 20, and taught me
the basics of programming.

------------------------

9 Disclaimer and terms of use

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

---

If you use this for research, please include a reference to the author
in your papers and reports.

I do not guarantee that the code follows any "good" coding practice or
Java standards of any sort (although I do my best). There are probably
several bugs.

The author will not provide any support. Comments and bug reports may
be sent to the e-mail address in the "Introduction". I will try to
answer and fix issues, but don't expect it.

If you have made changes or extensions that you would like to share
with me, I will be glad to receive them. To make it easier for me,
please write javadoc entries for the changes/extensions, zip the code
and e-mail it to me. Also attach a description of the
changes/extensions and of which classes they have affected. If you can,
please also provide a simple usage example.

/Magnus Rosell

-------------- end of "readme.txt" --------------