LISSH project: Speech Recognition using Hidden Markov Models

The title of my bachelor's thesis was "Implementation of adaptive continuous Hidden Markov Models integrated with language grammars towards a practical text-free speaker independent Persian speech recognition system".

The project was carried out in the Laboratory of Intelligent Speech Synthesis (LISS) at Amirkabir University of Technology under the supervision of Prof. M. M. Homayounpour. Hidden Markov Models (HMMs) were used extensively throughout the project, which is why it was called LISSH.

The goal of the project was to recognize continuous speech from telephony data in real time, independently of both speaker and text. The speaker population was therefore extremely diverse (in age, sex, accent, etc.), and the system was meant for interactive conversations in which no strict grammatical constraints were imposed. Starting from an input sound signal, recognition proceeds in three sequential steps:
  1. Extract feature vectors from the sound frames.
  2. Extract the most likely phoneme sequence out of the sound features.
  3. Compile grammatically sound sentences by grouping the phonemes into isolated words and making the least costly adjustments possible to the phoneme sequence extracted in step 2.
The first step is carried out entirely by the HTK feature extraction toolbox, with some modifications to the input/output format.
In the second step, a Hidden Markov Model (HMM) is used to extract the sequence of uttered phonemes. For this purpose, I implemented a general-purpose toolbox for learning and manipulating multiple-mixture, continuous-observation HMMs.
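To make step 2 more concrete, here is a minimal sketch of Viterbi decoding over an HMM whose states emit from Gaussian mixtures. It is not the LISSH toolbox code: the function names, the one-dimensional features and the two-state toy model are all assumptions chosen for illustration, whereas a real recognizer would work on multi-dimensional feature vectors with one model per phoneme.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (1-D only to keep the sketch short)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_gmm(x, mixture):
    """Log density of a mixture given as (weight, mean, variance) triples."""
    terms = [safe_log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    top = max(terms)
    # log-sum-exp keeps the sum stable when the terms are very negative
    return top + math.log(sum(math.exp(t - top) for t in terms))

def viterbi(obs, log_init, log_trans, emissions):
    """Return (most likely state path, its log probability)."""
    n = len(log_init)
    score = [log_init[s] + log_gmm(obs[0], emissions[s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in obs[1:]:
        prev, pointers, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_gmm(x, emissions[s]))
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):  # follow the back-pointers to frame 0
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score

# Hypothetical toy model: state 0 emits near 0..1, state 1 near 5..6.
emissions = [
    [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)],
    [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)],
]
log_init = [safe_log(0.5), safe_log(0.5)]
log_trans = [[safe_log(0.9), safe_log(0.1)],
             [safe_log(0.1), safe_log(0.9)]]
obs = [0.1, -0.2, 4.9, 5.3, 5.1]

path, score = viterbi(obs, log_init, log_trans, emissions)
print(path, score)
```

On this toy input the decoded path stays in state 0 for the first two frames and moves to state 1 from the third frame on. Working in log space avoids the numerical underflow that plain probability products would cause on long utterances.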
Up to this point (step 2), no assumption is made about the structure of the spoken sentence. In step 3, however, we assume that the sentence conforms to a context-free grammar, and we defined a restricted set of rules to specify the structure of valid sentences.
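The word-grouping part of step 3 can be sketched as a dynamic program that splits the phoneme sequence into dictionary words while minimizing the total number of phoneme edits. This is only an illustration under simplifying assumptions: unit costs for insertions, deletions and substitutions, a hypothetical two-word lexicon, and no grammar constraints at all, whereas the actual sentence compiler also takes the language grammar into account.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop x from a
                           cur[j - 1] + 1,           # insert y into a
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]

def segment(phonemes, lexicon, max_len=6):
    """Split phonemes into lexicon words with the lowest total edit cost.

    Returns (total_cost, list_of_words). max_len bounds how many
    recognized phonemes a single word is allowed to cover.
    """
    n = len(phonemes)
    # best[i] = cheapest (cost, words) explaining the first i phonemes
    best = [(0, [])] + [(float("inf"), None)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            base_cost, base_words = best[start]
            if base_words is None:
                continue
            chunk = phonemes[start:end]
            for word, pron in lexicon.items():
                cost = base_cost + edit_distance(chunk, pron)
                if cost < best[end][0]:
                    best[end] = (cost, base_words + [word])
    return best[n]

# Hypothetical toy lexicon; the phoneme symbols are illustrative only.
lexicon = {
    "salam": ["s", "a", "l", "a", "m"],
    "man": ["m", "a", "n"],
}
# Recognizer output with one wrong phoneme ("n" instead of "m").
recognized = ["s", "a", "l", "a", "n", "m", "a", "n"]

cost, words = segment(recognized, lexicon)
print(cost, words)
```

Here the cheapest explanation is "salam" followed by "man" at a total cost of one substitution. A grammar-aware version would additionally restrict which word sequences are allowed.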


Downloads:


Here you can find some software packages (for Windows users) and documents related to this work.
  • Sound Feature Extraction tools (Download archive file 300KB)
    Contains: Binary files to extract features in a format compatible with the other parts of this project.
    *Note: This package should only be used to generate some initial feature files. Once you are sure you know what the feature files should look like, you are expected to use your own feature extractor. Please do not use the "Feature Extractor" package for any other purpose, out of respect for the HTK toolbox providers.

  • General purpose HMM tools (Download archive file 1.5MB)
    Contains: Executable files, sample scripts, test files and user manual for some essential HMM tools. The following tools are included:
    • Forward test (problem #1)
    • Viterbi test (problem #2)
    • Baum-Welch training procedure (problem #3)
    • Viterbi training procedure (problem #3)
    • Model initialization
    • Similarity estimate between two models

  • Speech Recognition tools (Download archive file ??MB)
    Contains: Phoneme search and audio-frame labeling tools, language models and sentence matching functionalities.
    • Viterbi Beam Search with bigram/trigram weights (for audio sequence labeling and phoneme extraction)
    • Online HMM adaptation tool (in order to improve recognition performance for a known speaker)
    • Sentence compiler tool (takes the extracted phoneme sequence, a dictionary, and the language grammar, and finds the well-formed sentence that best matches the input stream)

  • Full text of the project report (in Persian) (PDF 1.5MB)

  • Presentation slides (in Persian) (PPT 1.1MB)
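
The "problem #1" to "problem #3" labels in the HMM tools list follow the usual three-problem framing of HMMs: evaluation, decoding and training. As a rough illustration of problem #1, the sketch below computes the likelihood of an observation sequence with the forward algorithm. It is not taken from the package: it assumes discrete observations and a made-up two-state model for brevity, while the toolbox itself handles continuous, multiple-mixture observations.

```python
import itertools

def forward_likelihood(obs, init, trans, emit):
    """P(obs | model) for a discrete-observation HMM (forward algorithm).

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

# Hypothetical two-state model over the symbols {0, 1}.
init = [0.6, 0.4]
trans = [[0.7, 0.3],
         [0.4, 0.6]]
emit = [[0.7, 0.3],
        [0.2, 0.8]]

print(forward_likelihood([0, 1, 1], init, trans, emit))

# Sanity check: the likelihoods of all length-3 sequences should add up to 1.
total = sum(forward_likelihood(list(seq), init, trans, emit)
            for seq in itertools.product(range(2), repeat=3))
print(total)
```

This unscaled version underflows on long sequences, so a practical implementation would use scaling factors or log probabilities instead.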