Topic Models C++
This is a C++ implementation of topic models with variational inference.
It includes LDA, supervised LDA (SLDA), HDP, supervised HDP (SHDP), online HDP, and online SHDP.
Mac users of Homebrew click here.
For Linux users:
Download the code here
Please cite [Bibtex]
Install:
1. This code requires GCC 4.8.
If you use Ubuntu 12.04 and do not have GCC 4.8, the following commands may be helpful:
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-4.8 g++-4.8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50
You may also want to put
export CC=gcc-4.8
export CXX=g++-4.8
in your .bashrc file :)
2. This code depends on the matrix manipulation library Buola.
Download Buola here
Buola is a very nice matrix manipulation library implemented by Xavi Gratal.
Before installing Buola, install its own dependencies:
a) Buola depends on NLopt. (NLopt is not actually used in this code, but it provides an alternative to GSL.)
cd {INSTALL_ROOT}
wget http://ab-initio.mit.edu/nlopt/nlopt-2.3.tar.gz
tar -xzf nlopt-2.3.tar.gz && rm nlopt-2.3.tar.gz
cd nlopt-2.3
./configure
make
sudo make install
b) Buola depends on Eigen 3 and D-Bus
sudo apt-get install libeigen3-dev libxml2-dev libdbus-1-dev libncurses5-dev
c) Buola depends on Boost
sudo apt-get install libboost-dev
To install Buola:
cd minibuola
mkdir build
cd build
cmake ..
make -j5
sudo make install
3. This code depends on GSL.
For information about GSL, click here.
4. Woohoo! Now you can compile Topic Models C++
in the standard way:
mkdir build && cd build
cmake ..
make
Play with Topic Models
3-class KTH action data, for fun
This data is preprocessed into a bag-of-STIP (space-time interest point) representation.
To check the options:
./TopicModel --help
Example 1: SLDA
./TopicModel --slda --alpha 0.1 --corpus_name KTH --data YOURPATH/KTH/Train.dat --label YOURPATH/KTH/ImgLabel.txt --test YOURPATH/KTH/Test.dat --shuffle --num_classes 3 -k 30 --truth YOURPATH/KTH/GroundTruth.txt --seed 2
The result will be:
0 0 2 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 0 2 2 2
accuracy:0.84745762711864403016
Example 2: SHDP
./TopicModel --corpus_name KTH --data YOURPATH/KTH/Train.dat --label YOURPATH/KTH/ImgLabel.txt --test YOURPATH/KTH/Test.dat --shuffle --num_classes 3 -k 80 -t 20 --truth YOURPATH/KTH/GroundTruth.txt --seed 2
The result will be:
0 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 0 0 2 2
accuracy:0.864407
To use LDA, pass --lda.
To use HDP, pass --hdp.
In these cases the label file is no longer needed.
onlineSHDP and onlineHDP need a large amount of data to converge, so they do not work on the small KTH data used as the example here.
References
LDA: D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
SLDA: C. Wang, D. M. Blei, and L. Fei-Fei. Simultaneous image classification and annotation. In CVPR, 2009.
HDP: Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
SHDP & onlineSHDP: C. Zhang, C.H. Ek, X. Gratal, F.T. Pokorny, and H. Kjellström. Supervised Hierarchical Dirichlet Processes with Variational Inference. In ICCV inferPGM, 2013.
PS: The supplement of this paper gives the computation of the bound and the update equations in detail. Recommended for beginners.
OnlineHDP: C. Wang, J. Paisley, and D. Blei. Online variational inference for the Hierarchical Dirichlet Process. In AISTATS, 2011.
Log
Nov 24th, 2013: Thanks to Ben Blackburne from the Papers team, Topic Models C++ is packaged up for Mac users of Homebrew. Mac users click here.
Oct 16th, 2013: Instructions updated. Thanks for the feedback from Renaud Richardet and Wary Buntine.