Title: HMM Toolkit HTK
1HMM Toolkit (HTK)
- Presentation by
- Daniel Whiteley
- AME department
2What is HTK?
- The Hidden Markov Model Toolkit (HTK) is a
portable toolkit for building and manipulating
hidden Markov models. HTK is primarily used for
speech recognition research although it has been
used for numerous other applications including
research into speech synthesis, character
recognition and DNA sequencing. HTK is in use at
hundreds of sites worldwide.
3What is HTK?
- HTK consists of a set of library modules and
tools available in C source form. The tools
provide sophisticated facilities for speech
analysis, HMM training, testing and results
analysis. The software supports HMMs using both
continuous density mixture Gaussians and discrete
distributions and can be used to build complex
HMM systems.
4Basic HTK command format
- The commands in HTK follow a basic command-line format:
- HCommand options files
- Options are indicated by a dash followed by the option letter. Universal options are capital letters.
- In HTK, file extensions are not needed; HTK reads each file's header to determine its format.
5Configuration files
- You can also configure HTK modules using config files. A config file is applied to a single command with the -C option, or globally with the command setenv HCONFIG myconfig, where myconfig is your own config file.
- All possible configuration variables can be found in chapter 18 of the HTK manual. However, for most of our purposes, we only need to create a config file with these lines:
- SOURCEKIND = USER (the user-defined file format, not sound)
- TARGETKIND = ANON_D (keep the file in the same format)
6Using HTK
- Parts of HMM modeling
- Data Preparation
- Model Training
- Pattern Recognition
- Model Analysis
7Data Preparation
- One small problem:
- HTK was tailored for speech recognition. Therefore, most of the data preparation tools are for audio.
- Due to this, we need to jerry-rig our data into the HTK parameterized data file format.
- HTK parameter files consist of a sequence of samples preceded by a header. The samples are simply data vectors, whose components are 2-byte integers or 4-byte floating point numbers.
- For us, these vectors will be a sequence of joint angles received from a motion capture session.
8HTK file format
- The file begins with a 12-byte header containing the following information:
- nSamples (4-byte int): number of samples
- samplePeriod (4-byte int): sample period in units of 100 ns
- sampleSize (2-byte int): number of bytes per vector
- parameterKind (2-byte int): code defining the type of data
- For our purposes, this parameter will be either 9 (USER), the user-defined parameter kind, or 10 (DISCRETE), the discrete case.
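The header layout above can be sketched in Python with the struct module. This is a minimal sketch, not part of HTK: the function name write_htk_params is hypothetical, big-endian byte order is assumed, and the parameter kind is left to the caller. The usage passes 9, the basic kind code the HTK Book lists for USER; treat that value as an assumption to check against your HTK version.

```python
import io
import struct


def write_htk_params(stream, vectors, sample_period_100ns, parm_kind):
    """Write equal-length float vectors as an HTK-style parameter file (sketch)."""
    n_samples = len(vectors)
    dim = len(vectors[0])
    sample_size = 4 * dim  # bytes per vector, using 4-byte floats
    # 12-byte header: nSamples and samplePeriod (4-byte ints),
    # then sampleSize and parameterKind (2-byte ints). Big-endian is assumed.
    stream.write(struct.pack(">iihh", n_samples, sample_period_100ns,
                             sample_size, parm_kind))
    for vec in vectors:
        if len(vec) != dim:
            raise ValueError("all vectors must have the same length")
        stream.write(struct.pack(">%df" % dim, *vec))


# Usage: three frames of four joint angles, 10 ms apart (100000 x 100 ns).
frames = [[0.0, 0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]]
buf = io.BytesIO()
write_htk_params(buf, frames, 100000, 9)  # 9 = assumed code for USER
data = buf.getvalue()
```

Writing to an in-memory buffer keeps the sketch testable; passing a file opened with mode "wb" works the same way.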
9HMM model creation
- In order to model the motion capture sequence, we need to create a prototype of the HMM. In this prototype, the values of B and π are arbitrary. The same is true for the transition matrix A, except that any transition probability you set to zero will remain zero.
- Models are created using a scripting language similar to HTML.
- Models in HTK also have a beginning and an ending state, which are non-emitting. These states are not defined in the script.
10HMM Model Example
- ~h "prototype"
- <BeginHMM>
- <VectorSize> 4 <USER>
- <NumStates> 5
- <State> 2 <NumMixes> 3
- <Mixture> 1 0.3
- <Mean> 4
- 0.0 0.0 0.0 0.0
- <Variance> 4
- 1.0 1.0 1.0 1.0
- <Mixture> 2 0.4 ...
- <State> 3 ...
- ...
- <TransP>
- 0.0 0.4 0.3 0.3 0.0
- 0.0 0.2 0.5 0.3 0.0
- 0.0 0.2 0.2 0.4 0.2
- 0.0 0.1 0.2 0.3 0.4
- 0.0 0.0 0.0 0.0 0.0
Callouts:
- ~h "prototype": name of the file
- <VectorSize> 4: sample size
- <NumStates> 5: number of states
- <NumMixes> 3: number of Gaussian distributions
- <Mixture> 1 0.3: the distribution's ID and weight
- <Mean>: mean observation vector
- <Variance>: covariance matrix diagonal
- <TransP>: transition matrix A
- All the transition probabilities for the ending state (the last row) are always zero.
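Before handing a prototype to HTK, the transition matrix can be sanity-checked. The sketch below assumes the usual HMM constraint that every row except the ending state's sums to one, and that the ending-state row is all zeros as noted above. The function name check_transition_matrix is hypothetical and is not an HTK tool.

```python
def check_transition_matrix(trans_p, tol=1e-6):
    """Return a list of problems found in a square transition matrix."""
    n = len(trans_p)
    problems = []
    for i, row in enumerate(trans_p):
        if len(row) != n:
            problems.append("row %d has %d entries, expected %d"
                            % (i + 1, len(row), n))
            continue
        if i == n - 1:
            # The non-emitting ending state has no outgoing transitions.
            if any(p != 0.0 for p in row):
                problems.append("ending-state row must be all zeros")
        elif abs(sum(row) - 1.0) > tol:
            problems.append("row %d sums to %.4f, expected 1.0"
                            % (i + 1, sum(row)))
    return problems


# The matrix from the example prototype.
trans_p = [
    [0.0, 0.4, 0.3, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.3, 0.0],
    [0.0, 0.2, 0.2, 0.4, 0.2],
    [0.0, 0.1, 0.2, 0.3, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
problems = check_transition_matrix(trans_p)
```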
11Vector Quantization
- In order to reduce computation, we can make the HMM discrete.
- In order to use a discrete HMM, we must first quantize the data into a set of standard vectors.
- Warning: quantizing the data inherently introduces error.
- Before quantizing the data, we must first have a standard set of vectors, or a vector codebook. This is made with HQuant.
12HQuant
- HQuant takes the training data and uses a K-means algorithm to evenly partition the data and find the centroids of these partitions to create our quantization vectors (QVs).
- A sample command:
- HQuant -C config -n 1 64 -S train.scp vqcook
- -C config: use the configuration variables found in config
- -n 1 64: number of QVs for a certain data stream (here, 64 QVs for stream 1)
- -S train.scp: you can use a script to list all of your training files
- vqcook: our codebook will be written to this file
- To reduce quantization time, a codebook using a binary tree search algorithm can be made using the -t option.
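To make the K-means idea concrete, here is a minimal sketch in plain Python. It is a textbook K-means loop, not HQuant's actual implementation: the initialization (first k vectors), the fixed iteration count, and the function names nearest and kmeans are all assumptions made for illustration.

```python
def nearest(codebook, vec):
    """Index of the codebook vector closest to vec (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for idx, centre in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, centre))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def kmeans(data, k, iterations=20):
    """Partition data into k clusters and return their centroids."""
    codebook = [list(v) for v in data[:k]]  # naive start: the first k vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vec in data:
            clusters[nearest(codebook, vec)].append(vec)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                codebook[idx] = [sum(m[d] for m in members) / len(members)
                                 for d in range(dim)]
    return codebook


# Usage: two obvious clusters, then quantize each vector to a symbol index.
data = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
codebook = kmeans(data, 2)
symbols = [nearest(codebook, v) for v in data]
```

The symbols list shows what quantizing means in practice: each data vector is replaced by the index of its closest quantization vector.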
13Converting to Discrete
- The conversion of data files is done using the HCopy command. In order to quantize our data, we do this:
- HCopy -C quantize rawdata qvdata
- Here rawdata is our original data, qvdata is our quantized data, and quantize is a config file having these lines:
- SOURCEKIND = USER (we start with our original data)
- TARGETKIND = DISCRETE (convert it into discrete data)
- SAVEASVQ = T (we throw away the continuous data)
- VQTABLE = vqcook (we use our previously made codebook to quantize the data)
14Discrete HMM
- Discrete HMMs are very similar to their continuous counterparts, save for a few changes.
- Discrete probabilities are stored in logarithmic form, where P(v) = exp(-d(v)/2371.8)
- ~o <Discrete> <StreamInfo> 1 1
- ~h "dhmm"
- <BeginHMM>
- <NumStates> 5
- <State> 2 <NumMixes> 10
- <DProb> 5461*10
- ....
- <EndHMM>
- <NumMixes> 10: number of discrete symbols
- 5461*10: duplicate function (the value 5461 repeated 10 times)
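The logarithmic form can be checked with a few lines of Python. This sketch simply inverts the formula P(v) = exp(-d(v)/2371.8); the helper names are hypothetical, and rounding to the nearest integer is an assumption about how the stored value is obtained.

```python
import math

DPROB_SCALE = 2371.8  # scale constant from the formula above


def prob_to_dprob(p):
    """Convert a probability to its stored logarithmic value d(v)."""
    return int(round(-DPROB_SCALE * math.log(p)))


def dprob_to_prob(d):
    """Convert a stored logarithmic value d(v) back to a probability."""
    return math.exp(-d / DPROB_SCALE)


# Ten equally likely symbols each have probability 0.1.
d = prob_to_dprob(0.1)
p = dprob_to_prob(5461)
```

With ten equally likely symbols, each probability of 0.1 maps to a stored value of 5461.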
15Model Training (token HMM)
- The initialization of our prototype can be done using HInit:
- HInit options hmm data1 data2 data3 ...
- HInit is used mainly for left-right HMMs. More ergodic HMMs can be initialized by doing a flat start. This is done by setting all means and variances to their global counterparts using HCompV:
- HCompV -m -S trainlist hmm
- hmm: the HMM being trained
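The idea behind a flat start can be sketched as follows: compute one mean and one variance per vector component over all the training data, then give every state those same values. This is an illustration only, not HCompV's code. The function name is hypothetical, and dividing by N (rather than N - 1) is an assumption.

```python
def global_mean_variance(vectors):
    """Per-component mean and variance over all training vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    return mean, var


# Usage: two 2-dimensional frames pooled from the training set.
frames = [[1.0, 2.0], [3.0, 6.0]]
mean, var = global_mean_variance(frames)
```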
16Retraining
- The model is then retrained using the Baum-Welch algorithm found in HRest:
- HRest -w 1.0 -v 0.0001 -S trainlist hmm
- The -w and -v options set floors for the mixture probabilities and the variances, respectively. The float given to -w is a multiplier of 10^-5.
- This can be iterated as many times as needed to achieve the desired results.
17Dictionary Creation
- In order to create a recognition program or script, we must first create a dictionary.
- A dictionary in HTK gives the word and its pronunciation. For our purposes, it will just consist of the token HMMs that we trained.
- RUNNING run
- WALKING walk
- JUMPING [SKIPPING] jump
- First field (JUMPING): word
- Bracketed field ([SKIPPING]): displayed output (if not specified, the word is displayed)
- Remaining fields (jump): tokens used to form the word
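The three parts of a dictionary entry can be pulled apart with a short parser. This is a sketch under the assumption that the optional displayed output is written in square brackets between the word and its tokens, as in the HTK dictionary format; the function name parse_dict_line is hypothetical.

```python
def parse_dict_line(line):
    """Split a dictionary entry into (word, displayed output, tokens)."""
    fields = line.split()
    word = fields[0]
    rest = fields[1:]
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        output = rest[0][1:-1]  # explicit output symbol, brackets removed
        rest = rest[1:]
    else:
        output = word  # no output symbol given: the word itself is displayed
    return word, output, rest


lines = ["RUNNING run", "WALKING walk", "JUMPING [SKIPPING] jump"]
entries = [parse_dict_line(line) for line in lines]
```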
18Label Files
- Label files contain a transcription of what is going on in the data sequence.
- 000000 100000 walk
- 100001 200000 run
- 200001 300000 jump
- First column: start of the segment (HTK label times are in units of 100 ns)
- Second column: end of the segment
- Third column: token found in that time frame
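Reading a label file back into a program is straightforward. The sketch below assumes whitespace-separated lines of the form start, end, token, as in the example, and skips anything else; the function name parse_labels is hypothetical.

```python
def parse_labels(text):
    """Parse 'start end token' lines into (start, end, token) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # this sketch ignores blank or differently shaped lines
        segments.append((int(fields[0]), int(fields[1]), fields[2]))
    return segments


label_text = "000000 100000 walk\n100001 200000 run\n200001 300000 jump\n"
segments = parse_labels(label_text)
tokens = [token for _, _, token in segments]
```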
19Master Label Files (MLFs)
- During training and recognition, we may have many test files and their accompanying label files. The label files can be condensed into one file called a master label file, or MLF.
- #!MLF!#
- "*/a.lab"
- 000000 100000 walk
- 100001 200000 run
- 200001 300000 jump
- .
- "*/b.lab"
- run
- .
- "*/jump.lab"
- jump
- .
- The "*/a.lab" entry is the same as an original label file.
- If the entire file is one token, as in "*/b.lab", it can be labeled with just the token.
- The wildcard operator (*) can be used to label multiple files at once.
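An MLF can be generated from a script once there are many label files. The sketch below assumes the standard HTK layout: a #!MLF!# header line, a quoted file pattern before each entry, and a single period closing each entry. The function name format_mlf and the file patterns in the usage are hypothetical.

```python
def format_mlf(entries):
    """Build MLF text from (pattern, label_lines) pairs."""
    lines = ["#!MLF!#"]
    for pattern, label_lines in entries:
        lines.append('"%s"' % pattern)  # quoted file pattern
        lines.extend(label_lines)
        lines.append(".")  # a lone period ends each entry
    return "\n".join(lines) + "\n"


mlf_text = format_mlf([
    ("*/a.lab", ["000000 100000 walk", "100001 200000 run",
                 "200001 300000 jump"]),
    ("*/b.lab", ["run"]),  # whole file is one token
])
```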
20Pattern Recognition
- The recognition of a motion sequence is done using HVite.
- To receive a transcription of the recognition data in MLF format, we use:
- HVite -a -i results -o SWT -H hmmlist \
- -I transcripts.mlf -S testfiles
- -a: create word network from the given transcriptions
- -i results: output transcription file in MLF format
- -o SWT: throws away unnecessary data in the label files
- -H hmmlist: text file containing a list of the HMMs used
- -I transcripts.mlf: MLF file that has the test files' transcriptions
- -S testfiles: motion capture data to be recognized
21Model Analysis
- The analysis of the recognition results is done by HResults:
- HResults -I transcripts.mlf -H hmmlist results
- -I transcripts.mlf: MLF containing the reference labels
- -H hmmlist: list of HMMs used
- results: MLF containing the result labels
- Note: the reference labels and the result labels must have different file extensions.
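For the simple case where every test file holds a single token, the core of the comparison can be sketched in a few lines. This is a deliberately simplified illustration, not HResults itself, which aligns whole label sequences. The function name, file names, and labels below are made up for the example.

```python
def percent_correct(reference, recognized):
    """Percentage of files whose recognized token matches the reference."""
    hits = sum(1 for name, token in reference.items()
               if recognized.get(name) == token)
    return 100.0 * hits / len(reference)


# Hypothetical single-token files: reference labels vs. recognizer output.
reference = {"a": "walk", "b": "run", "c": "jump", "d": "run"}
recognized = {"a": "walk", "b": "run", "c": "walk", "d": "run"}
score = percent_correct(reference, recognized)
```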