1 Time-line Hidden Markov Experts for Time Series Prediction
Xin Wang (xinw_at_infoscience.otago.ac.nz)
PhD Candidate, Department of Information Science, University of Otago, Dunedin, New Zealand
2 Outline of Talk
- Background on Chaotic Time Series Prediction
- Mixture of Experts (ME) Models for Prediction
- Time-line Hidden Markov Experts (THME) for Prediction
- Experiments on One-step-ahead and Multi-step-ahead Prediction
3 Chaotic Time Series
- A chaotic time series is a chronological sequence of observations from a non-linear (deterministic) dynamical system.
- A simple time series: 0.08, 0.14, 0.19, 0.22, 0.23, 0.23, 0.22, 0.20, ...
- State space: an m-dimensional space of state vectors, e.g. Xt = (xt, xt-1, ..., xt-m+1).
- Velocity of the trajectory: the derivative of the state vector with respect to time t.
[Figure: a sample trajectory in the reconstructed state space.]
4 Prediction of Chaotic Time Series
- For a chaotic time series, by Takens' embedding theorem, there exists a mapping f from the state vector Xt to a future value of the time series, e.g. xt+1 = f(Xt).
- The tasks for prediction (a minimal embedding sketch follows):
  - Reconstruct the state space.
  - Learn the mapping f, i.e. approximate f with training samples.
  - Generate future values.
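As an illustration (not from the slides), a minimal Python sketch of delay-coordinate embedding for state-space reconstruction; the embedding dimension m and delay tau are assumed parameters.

```python
# Minimal sketch: delay-coordinate embedding of a scalar series into
# m-dimensional state vectors, with one-step-ahead targets.
import numpy as np

def embed(series, m, tau=1):
    """Return state vectors X_t = (x_t, x_{t-tau}, ..., x_{t-(m-1)tau})
    and the one-step-ahead targets x_{t+1}."""
    series = np.asarray(series, dtype=float)
    start = (m - 1) * tau
    X, y = [], []
    for t in range(start, len(series) - 1):
        X.append(series[t - np.arange(m) * tau])  # most recent value first
        y.append(series[t + 1])
    return np.array(X), np.array(y)

# Example with the short series shown earlier
X, y = embed([0.08, 0.14, 0.19, 0.22, 0.23, 0.23, 0.22, 0.20], m=3)
```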
5 Techniques for Prediction (1)
- Global Model
  - One regression model covering the entire range of the underlying trajectory, such as:
    - a polynomial, or
    - a neural network (MLP, RBF, etc.), or
    - another regression model, e.g. a Support Vector Machine (SVM).
  - These models learn from the observed samples (training set) and make predictions (for the test set) afterwards.
6 Techniques for Prediction (2)
- Local Model
- 1. Models based on Nearest Neighbours
  - Local averaging
  - Local regression
  - Locally weighted averaging
  - Locally weighted regression
- Neighbours of the query are identified from the observed samples; the prediction for the query is then obtained by averaging the target outputs of the neighbours, or by estimation from a (linear or non-linear) regression function built over the neighbours (a minimal sketch follows).
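A minimal sketch of one of these local models, k-nearest-neighbour local averaging; the function name and the Euclidean distance choice are assumptions for illustration.

```python
# Minimal sketch: k-nearest-neighbour local averaging for a query state vector.
import numpy as np

def knn_local_average(X_train, y_train, x_query, k=5):
    """Predict by averaging the targets of the k nearest neighbours of x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distances to observed states
    idx = np.argsort(dists)[:k]                        # indices of the k closest samples
    return float(y_train[idx].mean())
```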
7 Techniques for Prediction (3)
- 2. Models based on the Divide-and-Conquer principle
  - Piece-wise regression
  - Threshold Autoregressive Model (TAR)
  - Switching Regression
  - Mixture of Experts (ME) (in the connectionist community)
- Divide the state space into sub-spaces.
- Learn the mapping from the divided trajectory in each sub-space to the target output with a regression model (expert).
- Combine (linearly average) the outputs from the experts as the output of the model (see the sketch below).
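A minimal sketch of the combination step: the model output as a weighted (linear) average of the expert outputs. How the weights are obtained (a gating network, HMM state probabilities, etc.) is left abstract here.

```python
# Minimal sketch: linear (weighted-average) combination of expert outputs.
import numpy as np

def combine_experts(expert_outputs, weights):
    """expert_outputs: (M,) outputs of the M experts for one query;
    weights: (M,) non-negative combination weights (normalised to sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return float(np.dot(weights, expert_outputs))
```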
8 Three ME models (1)
- Gated Experts (GE)
  - The state space is divided into a set of sub-spaces.
  - A connectionist model (MLP) learns the mapping on the phases of the divided trajectory in each sub-space.
  - The experts are combined by the probabilities of the point on the trajectory being in each sub-space.
  - Problem: the combination relies only on the input position.
- Hidden Markov Experts (HME)
  - Similar to ME, but the experts are combined by an HMM.
  - The probabilities for the combination rely on the previous state and the state transition probabilities of the HMM.
9 Three ME models (2)
- Problem
  - Transition probabilities are constant, taking no account of influence from outside.
  - Unable to indicate a state transition at a distinct time point precisely.
- Input/Output HMM (IOHMM)
  - Local experts are combined by an inhomogeneous HMM, where the transition probabilities are time-varying.
  - More information is used for expert combination.
  - Problem: the experts must be linear perceptrons or MLPs.
10"Time-line" Hidden Markov Experts ------THME
- The trajectory is divided into phases belonging
to some categories according to the velocity. - A regression model is applied to learn the
mapping from the phases in each category to the
target outputs. - HMM is applied for expert combination.
- Each category defines a state of the trajectory
and associates with a state of the HMM. - The transition probabilities of the HMM are
designed as time-varying, the HMM thus is called
time-line HMM and the model is called THME. - The time-varying state transition probabilities
are conditional on the "velocity" of the
trajectory and modelled by a connectionist model.
11 Architecture of THME
- THME with M local experts moderated by an HMM.
- The input is the vector of embedded series values Xt.
- The experts are regression models trained with the samples in their categories; expert i produces the output yt(i).
- The experts are combined by the probabilities of the underlying process being in each state of the HMM.
[Figure: architecture of THME. Experts 1..M produce outputs from Xt, which are combined according to the state probabilities of the time-line HMM.]
12 Dividing the Trajectory & Local Learning
- The dividing of the trajectory in the state space is based on the information contained in the state vector and the corresponding output.
- To enable velocity-based dividing, the dividing feature includes the velocity of the state vector.
- The fuzzy C-means clustering algorithm is applied to divide the trajectory (a minimal sketch follows).
- The samples on the phases in each cluster (category) are then used to train a regression model, making it a local expert over those phases (state).
- The local experts can be MLPs, RBF networks, or SVMs.
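A minimal, self-contained sketch of fuzzy C-means clustering on the dividing features (assumed here to be state vectors augmented with their velocity); this is a generic implementation, not the authors' code.

```python
# Minimal sketch: fuzzy C-means clustering returning fuzzy membership degrees.
import numpy as np

def fuzzy_cmeans(F, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """F: (N, d) feature matrix.  Returns (memberships U of shape (N, C), centres)."""
    rng = np.random.default_rng(seed)
    N = F.shape[0]
    U = rng.random((N, n_clusters))
    U /= U.sum(axis=1, keepdims=True)                 # rows are fuzzy memberships
    for _ in range(n_iter):
        W = U ** m                                    # fuzzified memberships
        centres = (W.T @ F) / W.sum(axis=0)[:, None]  # weighted cluster centres
        d = np.linalg.norm(F[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))        # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, centres

# Samples with the highest membership in cluster j would then be used to
# train expert j (an MLP, RBF network, or SVM).
```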
13 HMM for Expert Combination (1) ------ An Example of a Three-state HMM
- A process with observations generated from a hidden state series belonging to three kinds of states.
[Figure: a three-state HMM with states S1, S2, S3; transition probabilities a11, a12, ..., a33 between the states; emission probability distributions f(. | Si), giving P(yt | st = Si) at times t-1, t, t+1; and state probabilities Pt-1(Si), Pt(Si), Pt+1(Si).]
14 HMM for Expert Combination (2)
- Suppose an HMM with Gaussian emission distributions.
- Apply the experts to all the training samples.
- Expert j, trained with the samples on some phases of the trajectory, has a smaller error over those phases than over the rest of the trajectory.
- The samples on those phases therefore have higher emission probabilities, so the phases are associated with state j in the HMM (see the sketch below).
- The evolution of the underlying system from one phase to another on the trajectory is the state transition in the HMM.
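A small sketch of the Gaussian emission probabilities described above, centred on each expert's output with a state-specific variance (notation assumed).

```python
# Minimal sketch: Gaussian emission probability of the observed value y_t under
# state j, centred on expert j's prediction with a per-state variance.
import numpy as np

def emission_probs(y, expert_preds, sigmas):
    """y: (T,) targets; expert_preds: (T, M) outputs of the M experts;
    sigmas: (M,) per-state standard deviations.
    Returns B of shape (T, M) with B[t, j] = N(y_t; yhat_t^(j), sigma_j^2)."""
    resid = y[:, None] - expert_preds
    return np.exp(-0.5 * (resid / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
```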
15 Time-line HMM for Expert Combination
- In a traditional HMM, constant transition probabilities are held for all time points, so the expert combination is not always good.
- A time-line HMM can be applied instead: the transition probabilities are time-varying, i.e. for different time points there are different transition probabilities.
- Learning the time-line HMM means searching for the best transition probabilities at every time point so as to observe the samples with maximum probability.
- A modified Baum-Welch algorithm within the EM (Expectation-Maximisation) process is developed for time-line HMM learning.
16 Diagram of Expert Combination
- yt: the output of the model.
- P(st = Si): the probability of being in state Si at time t.
- A differencing operation produces the velocity from the input Xt.
- One node multiplies two real values; another multiplies two matrices.
- The "State Transition Network" generates the transition probabilities aij(t) for the time-line HMM.
[Figure: expert combination diagram. Experts 1..M take Xt as input; their outputs are weighted by P(st = Si), which is updated from P(st-1 = Si) through the time-varying transition probabilities aij(t) produced by the State Transition Network from the velocity of Xt.]
17"Time-line" HMM learning ------Modified
Baum-Welch Algorithm
- For the time-line HMM,
- Â Â Â Â Gaussian emission distribution is assumed
-
- State transition probability
- log Likelihood Function (auxiliary function)
about the current parameter ? and to be
estimated parameter ?
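A hedged sketch of the auxiliary function in LaTeX, taking the standard Baum-Welch form with the transition probabilities made time-dependent; the symbols λ, λ̄, π, a_ij(t), and σ_j are assumed notation, not necessarily those of the original slides.

```latex
% Standard Baum-Welch auxiliary (Q) function, with time-dependent transitions:
Q(\lambda, \bar{\lambda})
  = \sum_{\mathbf{s}} P(\mathbf{s} \mid \mathbf{y}, \lambda)\,
    \log P(\mathbf{y}, \mathbf{s} \mid \bar{\lambda})
  = \sum_{\mathbf{s}} P(\mathbf{s} \mid \mathbf{y}, \lambda)
    \Big[ \log \bar{\pi}_{s_1}
        + \sum_{t=2}^{T} \log \bar{a}_{s_{t-1} s_t}(t)
        + \sum_{t=1}^{T} \log \mathcal{N}\!\big(y_t;\, \hat{y}_t^{(s_t)}, \bar{\sigma}_{s_t}^2\big)
    \Big]
```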
18"Time-line" HMM learning ------EM Step in the
Algorithm
- Expectation (E-step)
- Forward and Backward to estimate Q function.
- Maximisation (M-step)
- Maximising the Q function to update the
parameter. - Initial state probability
- Time-varying transition probability
- Variance of Gaussian
Expectation Step Forward/backward steps to
estimate Q function.
Maximisation Step Maximising the Q function to
reach a critical point of ?.
EM step for time-line HMM Learning
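A minimal sketch (not the authors' implementation) of scaled forward-backward recursions for an HMM whose transition matrix varies with time, which is the core of the E-step.

```python
# Minimal sketch: scaled forward-backward for a time-varying transition matrix,
# returning the state posteriors used in the E-step.
import numpy as np

def forward_backward(pi, A, B):
    """pi: (M,) initial state probabilities;
    A: (T, M, M) time-varying transitions, A[t, i, j] = P(s_t = j | s_{t-1} = i);
    B: (T, M) emission probabilities P(y_t | s_t = j).
    Returns gamma with gamma[t, j] = P(s_t = j | y_1..y_T)."""
    T, M = B.shape
    alpha = np.zeros((T, M)); beta = np.zeros((T, M)); c = np.zeros(T)
    alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                       # forward pass with scaling
        alpha[t] = (alpha[t - 1] @ A[t]) * B[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass with the same scaling
        beta[t] = (A[t + 1] @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```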
19 State Transition Probability Modelling
- The state transition probabilities form a series of matrix entries, one set per time point.
- The state of the time series is defined by the velocity of the state vector, so this velocity can be used to learn the state transition probabilities.
- An RBF-structured state transition network performs the modelling (a sketch follows).
- In training, the network learns the mapping from the velocity to the transition probabilities at each time point.
- In prediction, a vector of the previous values of the time series is used by the network to estimate the transition probabilities.
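A minimal sketch of an RBF-structured state transition network that maps the velocity feature to a row-stochastic transition matrix through a softmax; the class name, centre/width handling, and the softmax output layer are assumptions, and training of the output weights is omitted.

```python
# Minimal sketch: an RBF network mapping the velocity at time t to an M x M
# transition matrix whose rows sum to 1.
import numpy as np

class RBFTransitionNet:
    def __init__(self, centres, widths, M, seed=0):
        rng = np.random.default_rng(seed)
        self.centres, self.widths, self.M = centres, widths, M
        self.W = rng.normal(scale=0.1, size=(len(centres), M * M))  # output weights

    def _hidden(self, v):
        d = np.linalg.norm(self.centres - v, axis=1)
        return np.exp(-(d ** 2) / (2 * self.widths ** 2))           # RBF activations

    def transition_matrix(self, v):
        """v: velocity feature (e.g. difference of successive state vectors)."""
        logits = (self._hidden(v) @ self.W).reshape(self.M, self.M)
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)                     # rows sum to 1
```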
20 Review of THME training

Training Step                       Technique Applied                             Outcome
Trajectory Dividing                 Fuzzy C-means clustering                      Fuzzy membership degrees
Expert Training                     Non-linear regression by MLP, RBF, or SVM     M regression models as experts
HMM Learning                        Modified Baum-Welch algorithm in EM steps     HMM Gaussian distribution parameters; transition probabilities for the time points
Transition Probability Modelling    RBF state transition network                  Time-varying transition probabilities for prediction
21 Prediction
- Prior probability: P(st = Si) = sum over j of aji(t) * P(st-1 = Sj).
- Combine the experts: yt = sum over i of P(st = Si) * yt(i), where yt(i) is the output of expert i.
- Single-step-ahead prediction: the posterior probability (by Bayes' law), P(st = Si | yt) proportional to P(yt | st = Si) * P(st = Si), is used for state re-estimation.
- Multi-step-ahead prediction: feed the output back as input for the next-step prediction and repeat the steps (a sketch of the loop follows).
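A minimal sketch of the prediction loop following the steps above; the expert interface (scikit-learn-style .predict), the transition_matrix method, and the embed_update helper are hypothetical names introduced for illustration only.

```python
# Minimal sketch: THME-style prediction, iterated for multi-step-ahead forecasting.
import numpy as np

def predict(x0, steps, experts, trans_net, sigmas, p0, embed_update):
    """x0: initial state vector; experts: fitted regressors with .predict;
    trans_net: maps velocity -> transition matrix; sigmas: (M,) per-state std devs;
    p0: (M,) initial state probabilities; embed_update(x, y_new) -> next state vector."""
    x, prev_x, p = x0, x0, p0
    preds = []
    for _ in range(steps):
        A = trans_net.transition_matrix(x - prev_x)        # time-varying transitions
        prior = p @ A                                      # prior state probabilities
        y_hat_i = np.array([e.predict(x[None, :])[0] for e in experts])
        y_hat = float(prior @ y_hat_i)                     # combine the experts
        preds.append(y_hat)
        # Posterior re-estimation by Bayes' law (constant factor cancels in the
        # normalisation).  In one-step-ahead prediction the actual observation
        # would be used here; in multi-step mode the prediction is fed back.
        lik = np.exp(-0.5 * ((y_hat - y_hat_i) / sigmas) ** 2) / sigmas
        p = prior * lik
        p /= p.sum()
        prev_x, x = x, embed_update(x, y_hat)              # feed the output back
    return np.array(preds)
```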
22 Experiments ------ Data Sets
- One-step-ahead prediction (compared with global models and HME)
  - Laser data and Leuven data
  - 1000 points for training, the next 500 for testing
  - 5-fold cross-validation
- Multi-step-ahead prediction (compared with benchmark results)
  - Laser data: first 1000 points for training, next 100 for testing
  - Leuven data: first 2000 points for training, next 200 for testing
  - Mackey-Glass data (delay 17): 1000 points for training, next 500 for testing; predict 85 steps ahead
    - 1. Direct prediction
    - 2. Iterated one-step-ahead prediction (85 iterations)
23 Prediction Result ------ One-step-ahead prediction with THME-MLP, THME-RBF, and THME-SVM, in NMSE (Normalized Mean Squared Error). Number of experts = 2.
24 THME-SVM for Laser Time Series ------ One-step-ahead Prediction
25 Prediction Error from (1) THME-RBF and (2) HMM-RBF for Laser Data
26 Expert Combination
- HMM transition probabilities.
- THME transition probabilities for the two experts at points 50-53.
27 Prior probabilities on Leuven data ------ One-step-ahead Prediction with THME-RBF
28 Prediction of Laser and Leuven Data ------ Multi-step-ahead Prediction
- Prediction of Laser data with THME-RBF and THME-SVM.
- Prediction of Leuven data with THME-RBF and THME-SVM.
29 Prediction of Mackey-Glass Data ------ Multi-step-ahead Prediction
- Prediction of Mackey-Glass data with THME-RBF and THME-SVM in direct and iterated modes.
30 Prediction of Laser Data ------ Multi-step-ahead with THME-RBF
31 Prediction of Leuven Data ------ Multi-step-ahead
32 Prediction Error for Mackey-Glass Data ------ Multi-step-ahead with THME-SVM
33 Review ------ Features of THME
- Dynamics introduced by the HMM.
- Time-varying transition probabilities detect state transitions.
- The experts are combined relying on both exterior information and the interior state status.
- Similar to IOHMM, but:
  - The experts can be MLPs, RBF networks, or SVMs.
  - The variance of the Gaussian emission distribution is adjustable to fit the noise level of each state, instead of being pre-set as in IOHMM.
  - This makes the state estimation more precise and gives a high-quality distribution evaluation for a series value.
34 Summary
- Velocity-based trajectory dividing in ME is applied to chaotic time series prediction.
- The "time-line" HMM introduces dynamics for the expert combination, and more information is utilised.
- A modified Baum-Welch algorithm in EM steps has been developed for learning the time-line HMM.
- The "time-line" hidden Markov expert model shows better performance on some time series in one-step-ahead and multi-step-ahead prediction.
- A connectionist network is used to model the time-varying state transition probabilities along a time series.
35 Discussion
- The feature scheme for dividing the trajectory may have other choices.
- How to choose the number of local experts.
- How to choose the parameters of the RBF transition probability network.
36 References
- L. E. Baum, T. Petrie, G. Soules and N. Weiss, "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains", Annals of Mathematical Statistics, Vol. 41, pp. 164-171, 1970.
- Y. Bengio and P. Frasconi, "An Input Output HMM Architecture", in G. Tesauro, D. S. Touretzky and T. K. Leen (Eds.), Advances in Neural Information Processing Systems, Vol. 7, MIT Press, Cambridge, MA, 1995, pp. 427-434.
- J. Bezdek and S. Pal, Fuzzy Models for Pattern Recognition, IEEE Press, 1992.
- A. Dempster, N. Laird and D. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Series B, No. 39, pp. 1-38, 1977.
- J. D. Farmer and J. J. Sidorowich, "Predicting Chaotic Time Series", Physical Review Letters, Vol. 59, No. 8, pp. 845-848, 1987.
- R. A. Jacobs, M. I. Jordan, S. J. Nowlan and G. E. Hinton, "Adaptive Mixtures of Local Experts", Neural Computation, Vol. 3, pp. 79-87, 1991.
- F. Takens, "Detecting Strange Attractors in Turbulence", Proceedings of the Symposium on Dynamical Systems and Turbulence, Lecture Notes in Mathematics, 1980, pp. 366-381.
- A. S. Weigend, M. Mangeas and A. N. Srivastava, "Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting", International Journal of Neural Systems, Vol. 6, No. 4, pp. 373-399, 1995.
- A. S. Weigend and S. Shi, "Predicting Daily Probability Distributions of S&P500 Returns", Journal of Forecasting, Vol. 19, pp. 375-392, 2000.