HMM Toolkit HTK

1
HMM Toolkit (HTK)

Presentation by
Daniel Whiteley
AME department

2
What is HTK?

The Hidden Markov Model Toolkit (HTK) is a
portable toolkit for building and manipulating
hidden Markov models. HTK is primarily used for
speech recognition research although it has been
used for numerous other applications including
research into speech synthesis, character
recognition and DNA sequencing. HTK is in use at
hundreds of sites worldwide.

3
What is HTK?

HTK consists of a set of library modules and
tools available in C source form. The tools
provide sophisticated facilities for speech
analysis, HMM training, testing and results
analysis. The software supports HMMs using both
continuous density mixture Gaussians and discrete
distributions and can be used to build complex
HMM systems.

4
Basic HTK command format

HTK commands follow a basic command-line format:
HCommand [options] files
Options are indicated by a dash followed by the
option letter. Options common to all tools use
capital letters.
HTK does not rely on file extensions; it reads
each file's header to determine its format.

5
Configuration files

You can also configure HTK modules using config
files. A config file is passed to a tool with the
-C option, or applied globally with the command
setenv HCONFIG myconfig
where myconfig is your own config file.
All configuration variables are listed in
chapter 18 of the HTK manual. However, for most
of our purposes, we only need a config file with
these lines:
SOURCEKIND = USER      # the user-defined file format (not sound)
TARGETKIND = ANON_D    # keep the file in the same format

6
Using HTK

Parts of HMM modeling
Data Preparation
Model Training
Pattern Recognition
Model Analysis

7
Data Preparation

One small problem:
HTK was tailored for speech recognition, so most
of its data preparation tools are for audio.
We therefore need to jerry-rig our data into the
HTK parameterized data file format.
HTK parameter files consist of a sequence of
samples preceded by a header. The samples are
simply data vectors whose components are 2-byte
integers or 4-byte floating point numbers.
For us, these vectors will be a sequence of joint
angles recorded in a motion capture session.

8
HTK file format

The file begins with a 12-byte header containing
the following fields:
nSamples (4-byte int): number of samples
samplePeriod (4-byte int): sample period in units
of 100 ns
sampleSize (2-byte int): number of bytes per vector
parameterKind (2-byte int): code defining the type
of data
For our purposes, this code will be either 9
(USER), the user-defined parameter kind, or 10
(DISCRETE), the discrete case.
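As a rough sketch of the header layout described on this slide, the Python below packs a list of float vectors into the bytes of an HTK parameter file. The function name and the sample data are hypothetical. Two details are assumptions to check against your HTK build: the big-endian byte order, and the kind code 9 for USER, taken from the HTK Book's table of basic parameter kinds.

```python
import struct

USER = 9                  # assumed basic parameter kind code for user-defined data
HEADER_FORMAT = ">iihh"   # assumed big-endian: nSamples, samplePeriod, sampleSize, parameterKind


def pack_htk_user(frames, sample_period_100ns):
    """Return the bytes of an HTK parameter file holding 4-byte float vectors."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same length")
    # 12-byte header: sample count, period in 100 ns units, bytes per vector, kind.
    header = struct.pack(HEADER_FORMAT, len(frames), sample_period_100ns, 4 * dim, USER)
    body = b"".join(struct.pack(">%df" % dim, *frame) for frame in frames)
    return header + body


# Hypothetical data: three frames of four joint angles captured at 100 Hz.
frames = [[0.0, 0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]]
blob = pack_htk_user(frames, sample_period_100ns=100000)  # 10 ms = 100000 x 100 ns
```

Writing blob to disk in binary mode would give a file that can be listed in a training script, provided the assumptions above hold.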

9
HMM model creation

In order to model the motion capture sequence, we
need to create a prototype of the HMM. In this
prototype, the values of B and π are arbitrary.
The same is true for the transition matrix A,
except that any transition probability you set to
zero will remain zero.
Models are defined using a tag-based definition
language that looks similar to HTML.
Models in HTK also have a beginning and an ending
state, both non-emitting. These states are not
defined in the script.

10
HMM Model Example

~h "prototype"
<BeginHMM>
<VecSize> 4 <USER>
<NumStates> 5
<State> 2 <NumMixes> 3
<Mixture> 1 0.3
<Mean> 4
0.0 0.0 0.0 0.0
<Variance> 4
1.0 1.0 1.0 1.0
<Mixture> 2 0.4 ...
<State> 3 ...
...
<TransP> 5
0.0 0.4 0.3 0.3 0.0
0.0 0.2 0.5 0.3 0.0
0.0 0.2 0.2 0.4 0.2
0.0 0.1 0.2 0.3 0.4
0.0 0.0 0.0 0.0 0.0

Annotations:
~h "prototype": name of the file
<VecSize> 4: sample size
<NumStates> 5: number of states
<NumMixes> 3: number of Gaussian distributions
<Mixture> 1 0.3: the distribution's ID and weight
<Mean>: mean observation vector
<Variance>: covariance matrix diagonal
<TransP>: transition matrix A
Last row of <TransP>: all the transition probabilities for the ending state are always zero
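Writing a prototype by hand gets tedious as the number of states grows. The sketch below generates one as text, under simplifying assumptions that differ from the slide's example: each emitting state has a single Gaussian (so no mixture entries), and the model is left-to-right with a hypothetical self-loop probability of 0.6. The function names are illustrative, and the tag spellings follow the HTK Book's HMM definition language as far as I know it.

```python
def transition_rows(num_states, stay):
    """Rows of a left-to-right transition matrix with non-emitting end states."""
    rows = []
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0             # entry state moves straight to state 2
        elif i < num_states - 1:
            row[i] = stay            # self-loop
            row[i + 1] = 1.0 - stay  # advance to the next state
        # the exit state keeps an all-zero row
        rows.append(row)
    return rows


def build_prototype(name, vec_size, num_states, stay=0.6):
    """Return text for a single-Gaussian prototype with zero means and unit variances."""
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = ['~h "%s"' % name, "<BeginHMM>",
             "<VecSize> %d <USER>" % vec_size,
             "<NumStates> %d" % num_states]
    # Only the emitting states 2 .. num_states-1 get output distributions.
    for state in range(2, num_states):
        lines += ["<State> %d" % state,
                  "<Mean> %d" % vec_size, zeros,
                  "<Variance> %d" % vec_size, ones]
    lines.append("<TransP> %d" % num_states)
    for row in transition_rows(num_states, stay):
        lines.append(" ".join("%.1f" % p for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


prototype_text = build_prototype("prototype", vec_size=4, num_states=5)
```

Because zero transition probabilities stay zero during training, the left-to-right shape chosen here is permanent; a more ergodic model would need nonzero entries in the other positions, as in the example above.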
11
Vector Quantization

To reduce computation, we can make the HMM
discrete.
To use a discrete HMM, we must first quantize the
data into a set of standard vectors.
Warning: quantizing the data inherently
introduces error.
Before quantizing the data, we need a standard
set of vectors, or a vector codebook. This is
made with HQuant.

12
HQuant

HQuant takes the training data and uses a K-means
algorithm to partition the data and find the
centroids of these partitions, which become our
quantization vectors (QVs).
A sample command:
HQuant -C config -n 1 64 -S train.scp vqcook
-C config: use the configuration variables found in config
-n 1 64: number of QVs (64) for a given data stream (stream 1)
-S train.scp: a script file listing all of your training files
vqcook: our codebook will be written to this file
To reduce quantization time, a codebook that uses a
binary tree search algorithm can be made with the
-t option.
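To make the idea of K-means quantization vectors concrete, here is a small pure-Python sketch of the textbook algorithm (Lloyd's iteration). It is not HQuant's actual implementation: the function names, the naive choice of the first k vectors as starting centroids, and the toy data are all assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    """Index of the centroid closest to vector (this is the quantization step)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vector, centroids[i]))


def kmeans(vectors, k, max_iterations=100):
    centroids = [tuple(v) for v in vectors[:k]]  # naive start: the first k vectors
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                # New centroid is the component-wise mean of the cluster.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        if updated == centroids:
            break                    # assignments have stopped changing
        centroids = updated
    return centroids


# Hypothetical 2-D training vectors forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
qvs = kmeans(points, 2)
symbol = nearest((9.0, 9.0), qvs)  # index of the QV that replaces this vector
```

Replacing each vector by the index of its nearest QV is what turns continuous data into the discrete symbols a discrete HMM consumes, and the distance between the vector and its QV is the quantization error.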
13
Converting to Discrete

Data files are converted using the HCopy command.
To quantize our data, we run
HCopy -C quantize rawdata qvdata
where rawdata is our original data, qvdata is our
quantized data, and quantize is a config file
containing these settings:
SOURCEKIND = USER        # we start with our original data
TARGETKIND = DISCRETE    # convert it into discrete data
SAVEASVQ = T             # we throw away the continuous data
VQTABLE = vqcook         # use our previously made codebook to quantize the data

14
Discrete HMM

Discrete HMMs are very similar to their
continuous counterparts, save for a few changes.
Discrete probabilities are stored in logarithmic
form, where
P(v) = exp(-d(v)/2371.8)

~o <DISCRETE> <StreamInfo> 1 1
~h "dhmm"
<BeginHMM>
<NumStates> 5
<State> 2 <NumMixes> 10
<DProb> 5461*10
....
<EndHMM>

<NumMixes> 10: number of discrete symbols
5461*10: duplicate function (the value 5461 repeated 10 times)
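The logarithmic form is easier to follow with numbers. The short sketch below converts between a probability and its stored value using the formula on this slide; the function names are hypothetical, and rounding to the nearest integer is an assumption about how the stored value is obtained.

```python
import math

DPROB_SCALE = 2371.8  # scaling constant from the formula on this slide


def prob_to_dprob(p):
    """Stored value d for a probability p, from P = exp(-d / 2371.8)."""
    return round(-DPROB_SCALE * math.log(p))


def dprob_to_prob(d):
    """Probability recovered from a stored value d."""
    return math.exp(-d / DPROB_SCALE)


uniform_over_ten = prob_to_dprob(0.1)
```

With ten equally likely symbols, each probability is 0.1, which maps to 5461. That is why the example definition stores the same value ten times.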
15
Model Training (token HMM)

Our prototype can be initialized using HInit:
HInit [options] hmm data1 data2 data3 ...
HInit is used mainly for left-right HMMs. More
ergodic HMMs can instead be initialized with a
flat start, which sets all means and variances to
their global counterparts using HCompV:
HCompV -m -S trainlist hmm
(hmm is the HMM being trained)
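A flat start amounts to computing one mean and one variance per vector component over all the training data, then copying them into every state. The sketch below shows that computation only. The function name and data are hypothetical, and dividing by n rather than n - 1 is an assumption, not a statement about what HCompV does internally.

```python
def global_mean_and_variance(frames):
    """Per-component mean and variance over every training frame."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[i] for frame in frames) / n for i in range(dim)]
    variance = [sum((frame[i] - mean[i]) ** 2 for frame in frames) / n
                for i in range(dim)]
    return mean, variance


# Hypothetical training data: two frames of two components each.
mean, variance = global_mean_and_variance([[1.0, 2.0], [3.0, 6.0]])
```

Every state then starts from the same Gaussian, and re-estimation is left to pull the states apart.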
16
Retraining

The model is then re-estimated using the
Baum-Welch algorithm implemented in HRest:
HRest -w 1.0 -v 0.0001 -S trainlist hmm
The -w and -v options set floors for the mixture
weights and the variances, respectively. The
float given to -w is a multiplier of 10^-5.
This step can be repeated as many times as needed
to achieve the desired results.
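The arithmetic behind the two floors can be sketched as follows. The names are hypothetical and the code only illustrates what a floor means (a lower bound applied to each value); it does not reproduce how HRest applies or renormalizes them.

```python
MINMIX = 1.0e-5  # base unit that the -w value multiplies, as stated on this slide


def mixture_weight_floor(w_option):
    """Effective mixture weight floor for a given -w value."""
    return w_option * MINMIX


def floor_variances(variances, v_option):
    """Raise any variance below the -v floor up to the floor."""
    return [max(v, v_option) for v in variances]


weight_floor = mixture_weight_floor(1.0)
floored = floor_variances([0.5, 0.00001, 0.0], 0.0001)
```

Flooring matters most with little training data, where a state that sees only a few near-identical vectors would otherwise end up with a variance close to zero.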

17
Dictionary Creation

In order to create a recognition program or
script, we must first create a dictionary.
A dictionary in HTK gives each word and its
pronunciation. For our purposes, each entry will
just consist of a token HMM that we trained.
RUNNING run
WALKING walk
JUMPING [SKIPPING] jump

First column: the word
Bracketed column: displayed output (if not specified, the word is displayed)
Last column: tokens used to form the word
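The three parts of a dictionary entry can be pulled apart with a few lines of Python. This is a sketch with a hypothetical function name, and it assumes HTK's convention of writing the optional displayed output in square brackets directly after the word.

```python
def parse_dictionary_line(line):
    """Return (word, displayed_output, tokens) for one dictionary entry."""
    fields = line.split()
    word = fields[0]
    output = word              # default: the word itself is displayed
    tokens = fields[1:]
    if tokens and tokens[0].startswith("[") and tokens[0].endswith("]"):
        output = tokens[0][1:-1]
        tokens = tokens[1:]
    return word, output, tokens


entry = parse_dictionary_line("JUMPING [SKIPPING] jump")
```

For an entry without a bracketed field, such as the RUNNING line, the displayed output simply falls back to the word.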
18
Label Files

Label files contain a transcription of what is
going on in the data sequence.
000000 100000 walk
100001 200000 run
200001 300000 jump

First column: start of the segment
Second column: end of the segment
Third column: token found in that segment
HTK expresses label times in units of 100 ns.
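Reading a label file back is a matter of splitting each line into its fields. The sketch below uses a hypothetical function name and assumes the start and end fields are optional, so a line may hold just a token.

```python
def parse_label_line(line):
    """Split one label line into (start, end, token); missing times become None."""
    fields = line.split()
    if len(fields) >= 3:
        return int(fields[0]), int(fields[1]), fields[2]
    if len(fields) == 2:
        return int(fields[0]), None, fields[1]
    return None, None, fields[0]


segment = parse_label_line("100001 200000 run")
```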
19
Master Label Files (MLFs)

During training and recognition, we may have many
test files and their accompanying label files.
The label files can be condensed into one file
called a master label file, or MLF.

#!MLF!#
"*/a.lab"
000000 100000 walk
100001 200000 run
200001 300000 jump
.
"*/b.lab"
run
.
"*/jump.lab"
jump
.

The entry for a.lab is the same as an original label file.
If the entire file is one token, it can be
labeled with just the token.
The wildcard operator (*) can be used to label
multiple files at once.
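Condensing many label files into one MLF is easy to script. The sketch below builds the MLF text from a list of entries; the function name and entry structure are hypothetical, and it assumes the standard `#!MLF!#` header line, quoted file patterns, and a lone period closing each entry.

```python
def build_mlf(entries):
    """Build MLF text from (pattern, labels) pairs.

    Each label is either a bare token string or a (start, end, token) tuple.
    """
    lines = ["#!MLF!#"]
    for pattern, labels in entries:
        lines.append('"%s"' % pattern)
        for label in labels:
            if isinstance(label, str):
                lines.append(label)
            else:
                start, end, token = label
                lines.append("%d %d %s" % (start, end, token))
        lines.append(".")  # a lone period ends each entry
    return "\n".join(lines) + "\n"


mlf_text = build_mlf([
    ("*/a.lab", [(0, 100000, "walk"), (100001, 200000, "run")]),
    ("*/b.lab", ["run"]),
])
```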
20
Pattern Recognition

The recognition of a motion sequence is done
using HVite.
To receive a transcription of the recognition
data in MLF format, we use
HVite -a -i results -o SWT -H hmmlist \
-I transcripts.mlf -S testfiles

-a: create the word network from the given transcriptions
-i results: output transcription file in MLF format
-o SWT: throws away unnecessary data in the label files
-H hmmlist: text file containing a list of the HMMs used
-I transcripts.mlf: MLF file that has the test files' transcriptions
-S testfiles: motion capture data to be recognized
21
Model Analysis

The recognition results are analyzed with
HResults.
HResults -I transcripts.mlf -H hmmlist results
Note: the reference labels and the result labels
must have different file extensions.

-I transcripts.mlf: MLF containing the reference labels
-H hmmlist: list of HMMs used
results: MLF containing the result labels
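For the simple case where each test file holds exactly one token, the core of the analysis is a comparison of reference and recognized labels. The sketch below computes a percent-correct figure under that assumption. The function name and data are hypothetical, and HResults itself reports more detailed statistics than this.

```python
def percent_correct(reference, recognized):
    """Percentage of files whose recognized token matches the reference token."""
    if not reference:
        raise ValueError("no reference labels")
    hits = sum(1 for name, token in reference.items()
               if recognized.get(name) == token)
    return 100.0 * hits / len(reference)


# Hypothetical labels keyed by file name, one token per file.
reference = {"a": "walk", "b": "run", "c": "jump", "d": "run"}
recognized = {"a": "walk", "b": "run", "c": "run", "d": "run"}
score = percent_correct(reference, recognized)
```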
