Topic Models C++

This is a C++ implementation of topic models with variational inference.
It includes LDA, supervised LDA (sLDA), HDP, supervised HDP (SHDP), online HDP, and online SHDP.

Mac users: Topic Models C++ is available through Homebrew.

For Linux users:
Download the code and follow the installation steps below.

If you use this code, please cite it [BibTeX].


Install:

1. This code requires gcc 4.8.
    If you use Ubuntu 12.04 and do not have gcc 4.8, the following commands may help:
    sudo add-apt-repository ppa:ubuntu-toolchain-r/test
    sudo apt-get update
    sudo apt-get install gcc-4.8 g++-4.8
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
    sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50


    You may also want to put
          export CC=gcc-4.8
          export CXX=g++-4.8
    in your .bashrc file :)
2. This code depends on the matrix manipulation library Buola.
    Download Buola (the minibuola package).
    Buola is a very nice matrix manipulation library implemented by Xavi Gratal.
    Before installing Buola, install the following dependencies:

    a) NLopt (not actually used in the code, but it provides the option of using NLopt instead of GSL):
    cd {INSTALL_ROOT}
    wget http://ab-initio.mit.edu/nlopt/nlopt-2.3.tar.gz
    tar -xzf nlopt-2.3.tar.gz && rm nlopt-2.3.tar.gz
    cd nlopt-2.3
    ./configure
    make
    sudo make install

    b) Eigen 3 and D-Bus (the command below also installs libxml2 and ncurses):
    sudo apt-get install libeigen3-dev libxml2-dev libdbus-1-dev libncurses5-dev

    c) Boost:
    sudo apt-get install libboost-dev

   To install Buola:
            cd minibuola
            mkdir build
            cd build
            cmake ..
            make -j5
            sudo make install

3. This code depends on GSL (the GNU Scientific Library).
   See the GSL website for installation instructions.

4. Woohoo! Now you can compile Topic Models C++
   in the standard CMake way:
        mkdir build && cd build
        cmake ..
        make


Play with Topic Models

Three-class KTH action data for fun
This data is preprocessed into bag-of-STIP (space-time interest point) features.

To check the options:
./TopicModel --help


Example 1: SLDA
./TopicModel --slda --alpha 0.1 --corpus_name KTH --data YOURPATH/KTH/Train.dat --label YOURPATH/KTH/ImgLabel.txt --test YOURPATH/KTH/Test.dat --shuffle --num_classes 3 -k 30 --truth YOURPATH/KTH/GroundTruth.txt --seed 2
The result will be:
0 0 2 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 0 2 2 2
accuracy:0.84745762711864403016


Example 2: SHDP
./TopicModel --corpus_name KTH --data YOURPATH/KTH/Train.dat --label YOURPATH/KTH/ImgLabel.txt --test YOURPATH/KTH/Test.dat --shuffle --num_classes 3 -k 80 -t 20 --truth YOURPATH/KTH/GroundTruth.txt --seed 2

The result will be:
0 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 0 0 2 2
accuracy:0.864407

To use LDA, use --lda.
To use HDP, use --hdp.
Both models are unsupervised, so the label file (--label) is not needed in these cases.

Online HDP and online SHDP need large datasets to converge, so they do not work well on the small KTH data used in these examples.


References
LDA:  D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

SLDA: C. Wang, D. M. Blei, and L. Fei-Fei. Simultaneous image classification and annotation. In CVPR, 2009.

HDP: Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

SHDP & online SHDP: C. Zhang, C. H. Ek, X. Gratal, F. T. Pokorny, and H. Kjellström. Supervised Hierarchical Dirichlet Processes with Variational Inference. In ICCV inferPGM, 2013.
PS: The supplementary material of this paper derives the variational bound and the update equations in detail. Recommended for beginners.

OnlineHDP: C. Wang, J. Paisley, and D. Blei. Online variational inference for the Hierarchical Dirichlet Process. In AISTATS, 2011.



Log
Nov 24th, 2013: Thanks to Ben Blackburne from the Papers team, Topic Models C++ is packaged for Mac users of Homebrew.
Oct 16th, 2013: Instructions updated. Thanks for the feedback from Renaud Richardet and Wray Buntine.