1
Time-line Hidden Markov Experts for Time Series Prediction
Xin Wang (xinw_at_infoscience.otago.ac.nz)
PhD Candidate, Department of Information Science, University of Otago, Dunedin, New Zealand
2
Outline of Talk
  • Background on Chaotic Time Series Prediction
  • Mixture of Experts (ME) Models for Prediction
  • Time-line Hidden Markov Experts (THME) for
    Prediction
  • Experiments on One-step-ahead and
    Multi-step-ahead Prediction.

3
Chaotic Time Series
  • A chaotic time series is a chronological sequence of observations from a
    non-linear (deterministic) dynamical system.
  • A simple example: 0.08, 0.14, 0.19, 0.22, 0.23, 0.23, 0.22, 0.20, ...
  • State space: an m-dimensional space of state vectors built from the
    series values; the state vectors trace out a trajectory.
  • Velocity of the trajectory: the derivative of the state vector with
    respect to time t.
4
Prediction of Chaotic Time Series
  • For a chaotic time series, by Takens' embedding theorem, there exists a
    mapping from the state vector to a future value of the time series.
  • The task of prediction:
  • Reconstruct the state space,
  • Learn the mapping, i.e. approximate it with training samples,
  • Generate future values (a sketch of the reconstruction step follows
    below).
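To make the reconstruction step concrete, here is a minimal sketch (not from the slides) of building delay-embedded state vectors and next-value targets from a scalar series. The embedding dimension m, the delay tau, and the helper name embed are illustrative choices, not the thesis code.

```python
import numpy as np

def embed(series, m=3, tau=1):
    """Build delay-embedded state vectors X and next-value targets y.

    Each row of X is an m-dimensional state vector of lagged values, and the
    matching entry of y is the series value one step after that window, so a
    regression model can learn the mapping from state vector to future value.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - (m - 1) * tau - 1           # number of usable samples
    X = np.column_stack([series[i * tau : i * tau + n] for i in range(m)])
    y = series[(m - 1) * tau + 1 : (m - 1) * tau + 1 + n]
    return X, y

# Example with the short series from the earlier slide
X, y = embed([0.08, 0.14, 0.19, 0.22, 0.23, 0.23, 0.22, 0.20], m=3, tau=1)
```

Any of the regression models discussed next could then be trained on (X, y) to approximate the mapping.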

5
Techniques for Prediction (1)
  • Global model
  • One regression model covering the entire range of the underlying
    trajectory, such as
  • a polynomial, or
  • a neural network (MLP, RBF, etc.), or
  • another regression model, e.g. a Support Vector Machine (SVM).
  • These models learn from the observed samples (training set) and make
    predictions (for the test set) afterwards.

6
Techniques for Prediction (2)
  • Local model
  • 1. Models based on nearest neighbours
  • Local averaging
  • Local regression
  • Locally weighted averaging
  • Locally weighted regression.
  • Neighbours of the query are identified from the observed samples; the
    prediction for the query is then obtained by averaging the neighbours'
    target outputs, or by estimation from a (linear or non-linear)
    regression function built over the neighbours (see the sketch after
    this list).
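As one illustration, a minimal locally weighted averaging predictor might look like the following. This is a sketch assuming Euclidean distance in the reconstructed state space and numpy arrays for the training data; it is not code from the thesis.

```python
import numpy as np

def local_weighted_average(X_train, y_train, x_query, k=5):
    """Predict the query's output as a distance-weighted average of the
    target outputs of its k nearest neighbours in the state space."""
    d = np.linalg.norm(X_train - x_query, axis=1)   # distances to all samples
    idx = np.argsort(d)[:k]                         # the k nearest neighbours
    w = 1.0 / (d[idx] + 1e-12)                      # closer neighbours weigh more
    return float(np.sum(w * y_train[idx]) / np.sum(w))
```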

7
Techniques for Prediction (3)
  • 2. Models based on the divide-and-conquer principle
  • Piece-wise regression
  • Threshold Autoregressive model (TAR)
  • Switching regression
  • Mixture of Experts (ME) (in the connectionist community).
  • Divide the state space into sub-spaces,
  • Learn the mapping from the divided trajectory in each sub-space to the
    target output with a regression model (an expert),
  • Combine (linearly average) the outputs of the experts as the output of
    the model (a small piece-wise regression sketch follows below).

8
Three ME models (1)
  • Gated Experts (GE)
  • The state space is divided into a set of sub-spaces,
  • A connectionist model (MLP) learns the mapping on the phases of the
    divided trajectory in each sub-space,
  • The experts are combined by the probabilities of the point on the
    trajectory being in each sub-space.
  • Problem
  • The combination relies only on the input position.
  • Hidden Markov Experts (HME)
  • Similar to ME, but
  • The experts are combined by an HMM,
  • The probabilities for the combination rely on the previous state and the
    state transition probabilities of the HMM.

9
Three ME models (2)
  • Problem
  • The transition probabilities are constant, ignoring any influence from
    outside the model,
  • Unable to indicate a state transition at a distinct time point precisely.
  • Input/Output HMM (IOHMM)
  • Local experts are combined by an inhomogeneous HMM, where the transition
    probabilities are time-varying,
  • More information is available for expert combination.
  • Problem
  • The experts must be linear perceptrons or MLPs.

10
"Time-line" Hidden Markov Experts ------THME
  • The trajectory is divided into phases belonging
    to some categories according to the velocity.
  • A regression model is applied to learn the
    mapping from the phases in each category to the
    target outputs.
  • HMM is applied for expert combination.
  • Each category defines a state of the trajectory
    and associates with a state of the HMM.
  • The transition probabilities of the HMM are
    designed as time-varying, the HMM thus is called
    time-line HMM and the model is called THME.
  • The time-varying state transition probabilities
    are conditional on the "velocity" of the
    trajectory and modelled by a connectionist model.

11
Architecture of THME
  • THME with M local experts moderated by an HMM.
  • The input is the vector of embedded series values X_t.
  • The experts are regression models trained with the samples in the
    corresponding categories.
  • Each expert i produces an output for the current input.
  • The experts are combined by the probabilities of the underlying process
    being in each state of the HMM.

[Diagram: Expert 1, Expert 2, ..., Expert M, moderated by the time-line HMM.]
12
Dividing the Trajectory & Local Learning
  • The dividing of the trajectory in the state space uses the information
    contained in the state vector and the corresponding output.
  • To enable velocity-based dividing, the feature used for the dividing is
    built from the velocity of the state vector.
  • The Fuzzy C-means clustering algorithm is applied to divide the
    trajectory.
  • The samples on the phases in each cluster (category) are then used to
    train a regression model, making it a local expert over those phases
    (that state).
  • The local experts can be MLPs, RBF networks, or SVMs (a clustering
    sketch follows below).
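For readers unfamiliar with fuzzy C-means, here is a minimal numpy sketch of the algorithm. The feature matrix F stands for whatever velocity-based dividing feature is used on the slide above; in practice an existing FCM implementation would likely be used instead.

```python
import numpy as np

def fuzzy_c_means(F, c=3, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy C-means: returns membership degrees U (N x c) and
    cluster centres (c x d) for a feature matrix F (N x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(F), c))
    U /= U.sum(axis=1, keepdims=True)               # memberships sum to 1 per sample
    for _ in range(n_iter):
        W = U ** m                                  # fuzzified memberships
        centres = (W.T @ F) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(F[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1))                   # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centres
```

The samples with high membership in cluster j would then be used to train expert j (an MLP, RBF network, or SVM).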

13
HMM for Expert Combination (1) ------ An Example of a Three-state HMM

[Diagram: a process whose observations are generated from a hidden state
sequence over three states S1, S2, S3. Transition probabilities a_ij (a11,
a12, ..., a33) link the states, and each state S_i has an emission
probability distribution f(y | S_i), so that at each time step the
observation y_t is emitted with probability P(y_t | s_t = S_i) and the
state probabilities P_t(S_i) evolve from t-1 to t to t+1.]
14
HMM for Expert Combination (2)
  • Suppose an HMM with Gaussian emission distributions.
  • Apply all the experts to all the training samples.
  • Expert j, trained with the samples on certain phases of the trajectory,
    has a lower error over those phases than over the rest of the
    trajectory.
  • The samples on those phases therefore have higher emission probabilities
    under expert j, so the phases are associated with state j of the HMM
    (see the sketch after this list).
  • The evolution of the underlying system from one phase to another along
    the trajectory corresponds to a state transition in the HMM.
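A small sketch of the Gaussian emission probabilities described above, assuming each state's emission is centred on the corresponding expert's prediction with a per-state standard deviation (array shapes and the function name are illustrative assumptions):

```python
import numpy as np

def gaussian_emissions(y, expert_preds, sigmas):
    """Emission probabilities P(y_t | s_t = j) = N(y_t; yhat_{j,t}, sigma_j^2).

    y:            (T,)   observed series values
    expert_preds: (M, T) predictions of every expert on every sample
    sigmas:       (M,)   per-state standard deviations
    Returns B with shape (M, T); expert j's own phases get the largest values.
    """
    err = np.asarray(y)[None, :] - np.asarray(expert_preds)
    var = np.asarray(sigmas, dtype=float)[:, None] ** 2
    return np.exp(-0.5 * err ** 2 / var) / np.sqrt(2 * np.pi * var)
```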

15
Time-line HMM for Expert Combination
  • In a traditional HMM, constant transition probabilities hold for all
    time points, so the expert combination is not always good.
  • A time-line HMM can be applied instead: the transition probabilities are
    time-varying, i.e. each time point has its own transition probabilities.
  • Learning the time-line HMM means searching for the transition
    probabilities at every time point that give the observed samples
    maximum probability.
  • A modified Baum-Welch algorithm, run as an EM (Expectation-Maximisation)
    process, is developed for time-line HMM learning.

16
Diagram of Expert Combination
  • P(s_t = S_i): the probability of being in state S_i at time t.
  • Δ: the differencing operation (applied to X_t).
  • One node type multiplies two real values; another multiplies two
    matrices.
  • The "State Transition Network" generates the transition probabilities
    a_ij(t) for the time-line HMM (a combination sketch follows the
    diagram).

[Diagram: experts 1..M map X_t to their outputs; the State Transition
Network takes ΔX_t and produces the time-varying transition probabilities
a_ij(t), which carry the state probabilities P(s_{t-1} = S_i) forward to
P(s_t = S_i); the expert outputs are combined with these state
probabilities to give y_t.]
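One time step of this combination could be sketched as follows. The array shapes and function name are assumptions made for illustration; this is not the thesis code.

```python
import numpy as np

def combine_step(p_prev, A_t, b_t, expert_outputs):
    """One time step of the combination shown in the diagram.

    p_prev:         (M,)   state probabilities P(s_{t-1} = S_i)
    A_t:            (M, M) time-varying transition matrix a_ij(t)
    b_t:            (M,)   emission probabilities P(y_t | s_t = S_i)
    expert_outputs: (M,)   outputs of the M experts at time t
    """
    prior = A_t.T @ p_prev                        # prior state probabilities P(s_t = S_i)
    y_hat = float(np.dot(prior, expert_outputs))  # probability-weighted expert average
    post = prior * b_t                            # Bayes re-estimation once y_t is seen
    return y_hat, post / post.sum()
```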
17
"Time-line" HMM learning ------Modified
Baum-Welch Algorithm
  • For the time-line HMM,
  •     Gaussian emission distribution is assumed
  • State transition probability
  • log Likelihood Function (auxiliary function)
    about the current parameter ? and to be
    estimated parameter ?

18
"Time-line" HMM learning ------EM Step in the
Algorithm
  • Expectation (E-step)
  • Forward and Backward to estimate Q function.
  • Maximisation (M-step)
  • Maximising the Q function to update the
    parameter.
  • Initial state probability
  • Time-varying transition probability
  • Variance of Gaussian

[Diagram: one EM step of time-line HMM learning. The E-step runs the
forward/backward recursions to estimate the Q function; the M-step
maximises the Q function to reach a critical point of θ.]
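A hedged sketch of the E-step's forward/backward recursions, generalised to a different transition matrix at every time step as the time-line HMM requires. The scaling and the interfaces are my assumptions; the thesis' modified Baum-Welch algorithm may differ in detail.

```python
import numpy as np

def forward_backward(B, A_seq, pi):
    """Forward/backward recursions with time-varying transitions.

    B:     (M, T)      emission probabilities P(y_t | s_t = S_i)
    A_seq: (T-1, M, M) time-varying transition matrices a_ij(t)
    pi:    (M,)        initial state probabilities
    Returns gamma (M, T), the posterior state probabilities P(s_t = S_i | y).
    """
    M, T = B.shape
    alpha = np.zeros((M, T))
    beta = np.zeros((M, T))
    alpha[:, 0] = pi * B[:, 0]
    alpha[:, 0] /= alpha[:, 0].sum()                # scale to avoid underflow
    for t in range(1, T):
        alpha[:, t] = (A_seq[t - 1].T @ alpha[:, t - 1]) * B[:, t]
        alpha[:, t] /= alpha[:, t].sum()
    beta[:, T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[:, t] = A_seq[t] @ (B[:, t + 1] * beta[:, t + 1])
        beta[:, t] /= beta[:, t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=0, keepdims=True)
```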
19
State Transition Probability Modelling
  • The state transition probabilities form a series of matrices, one per
    time point.
  • Since the state of the time series is defined by the velocity of the
    state vector, the velocity can be used to learn the state transition
    probabilities.
  • An RBF-structured state transition network performs the modelling.
  • In training, the network learns the mapping from the velocity at each
    time point to the transition probabilities at that time.
  • In prediction, the vector of differences of the previous values of the
    time series is fed to the network to estimate the transition
    probabilities (a sketch follows below).
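A minimal sketch of what such an RBF transition network could look like. The centres, width, and output weights would be fitted during training; the row-wise softmax used to guarantee valid probabilities is my assumption, not necessarily the thesis' construction.

```python
import numpy as np

class RBFTransitionNet:
    """RBF-structured state transition network: maps the velocity feature
    v_t to a row-stochastic M x M transition matrix a_ij(t)."""

    def __init__(self, centres, width, out_weights):
        self.centres = np.asarray(centres)          # (K, d) RBF centres
        self.width = float(width)                   # shared RBF width
        self.W = np.asarray(out_weights)            # (K, M*M) output weights

    def __call__(self, v_t):
        d2 = np.sum((self.centres - v_t) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.width ** 2)) # RBF activations
        logits = phi @ self.W
        M = int(round(np.sqrt(logits.size)))
        A = logits.reshape(M, M)
        A = np.exp(A - A.max(axis=1, keepdims=True))
        return A / A.sum(axis=1, keepdims=True)     # each row sums to 1
```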

20
Review of THME training
  • Trajectory dividing: Fuzzy C-Means clustering, giving fuzzy membership
    degrees.
  • Expert training: non-linear regression by MLP, RBF, or SVM, giving M
    regression models as experts.
  • HMM learning: modified Baum-Welch algorithm in EM steps, giving the HMM
    Gaussian distribution parameters and the transition probabilities for
    the time points.
  • Transition probability modelling: neural network modelling, giving the
    state transition network.

21
Prediction
  • Prior probability: the state probabilities are propagated forward by the
    time-varying transition probabilities.
  • Combine the experts using these state probabilities.
  • Single-step-ahead prediction: the posterior probability (by Bayes' rule)
    is used for state re-estimation once the new value is observed.
  • Multi-step-ahead prediction: feed the output back as input for the
    next-step prediction and repeat the steps (a sketch follows below).
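A sketch of the multi-step-ahead (iterated) prediction loop, assuming the experts are callables on a state vector and the transition network maps recent differences to a transition matrix (e.g. the RBF sketch earlier). The interfaces and the embedding convention are assumptions for illustration.

```python
import numpy as np

def iterated_prediction(history, p_state, experts, transition_net, horizon, m=3):
    """Iterated multi-step-ahead prediction: each predicted value is fed back
    as input for the next step.

    experts:        list of M callables mapping a state vector to a scalar
    transition_net: callable mapping recent differences ("velocity") to an
                    M x M transition matrix
    """
    history = list(history)
    preds = []
    for _ in range(horizon):
        x_t = np.array(history[-m:])                    # embedded state vector
        A_t = transition_net(np.diff(history[-m - 1:])) # time-varying a_ij(t)
        p_state = A_t.T @ p_state                       # prior state probabilities
        y_hat = float(sum(p * f(x_t) for p, f in zip(p_state, experts)))
        preds.append(y_hat)
        history.append(y_hat)                           # feed the output back
    return np.array(preds)
```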

22
Experiments ------ Data Sets
  • One-step-ahead prediction (compared with global models and HME)
  • Laser data and Leuven data
  • 1000 points for training, the next 500 for test,
  • 5-fold cross-validation.
  • Multi-step-ahead prediction (compared with benchmark results)
  • Laser data: first 1000 points for training, next 100 for test.
  • Leuven data: first 2000 points for training, next 200 for test.
  • Mackey-Glass data (delay 17): 1000 points for training, the next 500 for
    test; predict 85 steps ahead by
  • 1. direct prediction, or
  • 2. iterated one-step-ahead prediction (85 iterations).

23
Prediction Result ------ One-step-ahead
Prediction with THME-MLP, THME-RBF, THME-SVM, in
NMSE (Normalized Mean Squared Error). Number of
Expert2.
24
THME-SVM for Laser Time Series ------ One-step-ahead Prediction
25
Prediction Error from (1) THME-RBF and (2) HMM-RBF for Laser Data
26
Expert Combination
  • HMM transition probabilities.
  • THME transition probabilities for the two experts at points 50 to 53.

27
Prior Probabilities on Leuven Data ------ One-step-ahead Prediction with THME-RBF
28
Prediction of Laser and Leuven Data ------ Multi-step-ahead Prediction
  • Prediction of Laser data with THME-RBF and THME-SVM.
  • Prediction of Leuven data with THME-RBF and THME-SVM.

29
Prediction of Mackey-Glass Data ------ Multi-step-ahead Prediction
  • Prediction of Mackey-Glass data with THME-RBF and THME-SVM in direct and
    iterated modes.

30
Prediction of Laser Data ------ Multi-step-ahead with THME-RBF
31
Prediction of Leuven Data ------ Multi-step-ahead
32
Prediction Error for Mackey-Glass Data ------ Multi-step-ahead with THME-SVM
33
Review ------ Features of THME
  • Dynamics are introduced by the HMM.
  • The time-varying transition probabilities detect state transitions.
  • The experts are combined by relying on both exterior information (the
    input) and the interior state status.
  • Similar to IOHMM, but
  • the experts can be MLPs, RBF networks, or SVMs,
  • the variance of the Gaussian emission distribution is adjustable to fit
    the noise level of each state, instead of being pre-set as in IOHMM.
  • This makes the state estimation more precise and gives a high-quality
    distribution estimate for a series value.

34
Summary
  • Velocity-based trajectory dividing in ME is applied to chaotic time
    series prediction.
  • The "time-line" HMM introduces dynamics into the expert combination, and
    more information is utilised.
  • A modified Baum-Welch algorithm in EM steps has been developed for
    learning the time-line HMM.
  • The "time-line" hidden Markov expert model has better performance on
    some time series in one-step-ahead and multi-step-ahead prediction.
  • A connectionist network is used to model the time-varying state
    transition probabilities along a time series.

35
Discussion
  • The feature scheme for dividing the trajectory may have other choices.
  • How to choose the number of local experts?
  • How to choose the parameters of the RBF transition probability network?

36
Reference
  • L. E. Baum, T. Petrie, G. Soules and N. Weiss, A
    Maximization Technique Occurring in the
    Statistical Analysis of Probabilistic Functions
    of Markov Chains, Annals of Mathematical
    Statistics, Vol. 41, pp. 164-171, 1970.
  • Y. Bengio and P. Frasconi, An Input Output HMM
    Architecture, in G. Tesauro, D. S. Touretzky and
    T. K. Leen (Eds.), Advances in Neural Information
    Processing Systems, Vol. 7, MIT Press, Cambridge,
    MA, 1995, pp. 427-434.
  • J. Bezdek and S. Pal, Fuzzy Models for Pattern
    Recognition, IEEE Press, 1992.
  • A. Dempster, N. Laird and D. Rubin, Maximum
    Likelihood from Incomplete Data via the EM
    Algorithm, Journal of the Royal Statistical
    Society, Series B, No. 39, pp. 1-38, 1977.
  • J. D. Farmer and J. J. Sidorowich, Predicting
    Chaotic Time Series, Physical Review Letters,
    Vol. 59, No. 8, pp. 845-848, 1987.
  • R. A. Jacobs, M. I. Jordan, S. J. Nowlan and G.
    E. Hinton, Adaptive Mixtures of Local Experts,
    Neural Computation, Vol. 3, pp. 79-87, 1991.
  • F. Takens, Detecting Strange Attractors in
    Turbulence, Proceedings of Symposium on
    Dynamical Systems and Turbulence, Lecture Notes
    in Mathematics, 1980, pp. 366-381.
  • A. S. Weigend, M. Mangeas and A. N. Srivastava,
    Nonlinear Gated Experts for Time Series:
    Discovering Regimes and Avoiding Overfitting,
    International Journal of Neural Systems, Vol. 6,
    No. 4, pp. 373-399, 1995.
  • A. S. Weigend and S. Shi, Predicting Daily
    Probability Distributions of S&P 500 Returns,
    Journal of Forecasting, Vol. 19, pp. 375-392,
    2000.