Title: The Automatic Explanation of Multivariate Time Series (MTS)
1The Automatic Explanation of Multivariate Time
Series (MTS)
2The Problem - Data
- Datasets which are Characteristically
- High Dimensional MTS
- Large Time Lags
- Changing Dependencies
- Little or No Available Expert Knowledge
3The Problem - Requirement
- Lack of Algorithms to Assist Users in Explaining Events where
- Model Complex MTS Data
- Learnable from Data with Little or No User Intervention
- Transparency Throughout the Learning and Explaining Process is Vital
4Contribution to Knowledge
- Using a Combination of Evolutionary Programming (EP) and Bayesian Networks (BNs) to Overcome Issues Outlined
- Extending Learning Algorithms for BNs to Dynamic Bayesian Networks (DBNs) with Comparison of Efficiency
- Introduction of an Algorithm for Decomposing High Dimensional MTS into Several Lower Dimensional MTS
5Contribution to Knowledge (Continued)
- Introduction of New EP-Seeded GA Algorithm
- Incorporating Changing Dependencies
- Application to Synthetic and Real-World Chemical Process Data
- Transparency Retained Throughout Each Stage
6Framework
Pre-processing
Data Preparation
Variable Groupings
Model Building
Search Methods
Synthetic Data
Evaluation
Real Data
Changing Dependencies
Explanation
7Key Technical Points 1: Comparing Adapted Algorithms
- New Representation
- K2/K3 (Cooper and Herskovits)
- Genetic Algorithm (Larrañaga)
- Evolutionary Algorithm (Wong)
- Branch and Bound (Bouckaert)
- Log Likelihood / Description Length
- Publications
- International Journal of Intelligent Systems, 2001
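The scoring named above (log likelihood and description length) can be illustrated with a minimal sketch. It assumes discrete data, links written as (a, b, lag) triples meaning that a(t-lag) is a parent of b(t), and a common textbook penalty of 0.5 log2(cases) bits per free parameter. The function names and the exact penalty form are assumptions for illustration and are not taken from the slides.

```python
import math
from collections import Counter

def node_log_likelihood(data, child, parents, max_lag):
    """Log2 likelihood of one child variable given its lagged parents.

    data    : list of rows; data[t][i] is the discrete value of variable i at time t
    parents : list of (variable, lag) pairs, each lag <= max_lag
    """
    joint = Counter()   # counts of (parent configuration, child value)
    config = Counter()  # counts of each parent configuration
    for t in range(max_lag, len(data)):
        pa = tuple(data[t - lag][var] for var, lag in parents)
        joint[(pa, data[t][child])] += 1
        config[pa] += 1
    # Maximum-likelihood estimate: P(child | pa) = joint / config
    return sum(n * math.log2(n / config[pa]) for (pa, _), n in joint.items())

def description_length(data, links, n_states, max_lag):
    """Penalised score (lower is better): parameter cost minus log likelihood."""
    n_vars = len(data[0])
    n_cases = len(data) - max_lag
    total_ll = 0.0
    n_params = 0
    for child in range(n_vars):
        parents = [(a, lag) for a, b, lag in links if b == child]
        total_ll += node_log_likelihood(data, child, parents, max_lag)
        # One free distribution over n_states values per parent configuration
        n_params += (n_states - 1) * n_states ** len(parents)
    return 0.5 * math.log2(n_cases) * n_params - total_ll
```

With a score of this kind, a link that genuinely reduces uncertainty about its child lowers the description length, while a spurious link only adds parameter cost.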
8Key Technical Points 2: Grouping
- A Number of Correlation Searches
- A Number of Grouping Algorithms
- Designed Metrics
- Comparison of All Combinations
- Synthetic and Real Data
- Publications
- IDA99
- IEEE Transactions on Systems, Man, and Cybernetics, 2001
- Expert Systems, 2000
9Key Technical Points 3: EP-Seeded GA
- Approximate Correlation Search Based on the One Used in Grouping Strategy
- Results Used to Seed Initial Population of GA
- Uniform Crossover
- Specific Lag Mutation
- Publications
- Genetic and Evolutionary Computation Conference 1999 (GECCO99)
- International Journal of Intelligent Systems, 2001
- IDA2001
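The two operators named above, uniform crossover and a lag-specific mutation, might look like the following sketch. It assumes a chromosome is a list of (a, b, lag) triples, that both parents have the same length, and that lags are redrawn uniformly from 1 to max_lag. These details and the function names are illustrative assumptions, not the published operators.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    """Swap each gene position between two parents with probability 0.5."""
    child_a, child_b = [], []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:
            gene_a, gene_b = gene_b, gene_a
        child_a.append(gene_a)
        child_b.append(gene_b)
    return child_a, child_b

def lag_mutation(chromosome, max_lag, rate, rng):
    """With probability `rate` per triple, redraw only the lag; a and b are kept."""
    mutated = []
    for a, b, lag in chromosome:
        if rng.random() < rate:
            lag = rng.randint(1, max_lag)  # assumed lag range: 1..max_lag
        mutated.append((a, b, lag))
    return mutated
```

Mutating only the lag keeps a promising variable pair in the population while still exploring when the dependency acts.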
10Key Technical Points 4: Changing Dependencies
- Dynamic Cross Correlation Function for Analysing MTS
- Extend Representation
- Introduce a Heuristic Search - Hidden Controller Hill Climb (HCHC)
- Hidden Variables to Model State of the System
- Search for Structure and Hidden States Iteratively
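One plausible reading of a dynamic cross correlation function is a cross-correlation recomputed over a sliding window, so that a change in the dominant lag becomes visible over time. The sketch below assumes that reading. The names winlen and winjump echo the Winlen and Winjump entries on the Parameters slide, but their use here as window length and window step is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant window: correlation undefined, reported as 0 here
    return sxy / math.sqrt(sxx * syy)

def dynamic_ccf(x, y, max_lag, winlen, winjump):
    """Return {window_start: [correlation at lag 0..max_lag]}.

    Lag k pairs x[t-k] with y[t] inside each window; winlen must exceed max_lag + 1.
    """
    result = {}
    for start in range(0, len(x) - winlen + 1, winjump):
        end = start + winlen
        row = []
        for lag in range(max_lag + 1):
            xs = x[start:end - lag]
            ys = y[start + lag:end]
            row.append(pearson(xs, ys))
        result[start] = row
    return result
```

Reading the rows in window order shows where the strongest lag shifts, which is the kind of evidence that motivates modelling changing dependencies.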
11Future Work
- Parameter Estimation
- Discretisation
- Changing Dependencies
- Efficiency
- New Datasets
- Gene Expression Data
- Visual Field Data
12DBN Representation
(3,1,4) (4,2,3) (2,3,2) (3,0,2) (3,4,2)
[Figure: the corresponding DBN, with nodes a0(t), a1(t), a2(t), a2(t-2), a3(t), a3(t-2), a3(t-4), a4(t) and a4(t-3) laid out along a time axis t-4, t-3, t-2, t-1, t]
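The list of triples on this slide can be decoded with a short sketch. It assumes each triple (a, b, lag) denotes a link from variable a at time t-lag to variable b at time t. That reading is inferred from the lagged nodes shown in the figure (a2(t-2), a3(t-2), a3(t-4), a4(t-3)) and is not stated explicitly. The function names are hypothetical.

```python
from collections import defaultdict

def parents_from_triples(triples):
    """Map each child variable b to its list of (parent variable, lag) pairs."""
    parents = defaultdict(list)
    for a, b, lag in triples:
        parents[b].append((a, lag))
    return dict(parents)

def describe(triples):
    """Render each triple as a readable lagged link."""
    return [f"a{a}(t-{lag}) -> a{b}(t)" for a, b, lag in triples]

# The five triples shown on the slide
links = [(3, 1, 4), (4, 2, 3), (2, 3, 2), (3, 0, 2), (3, 4, 2)]
for line in describe(links):
    print(line)
```

A flat list of triples like this is convenient for evolutionary search, since adding, deleting or altering one link touches a single list element.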
13Sample DBN Search Results
N = 5, MaxT = 10
N = 10, MaxT = 60
14Grouping
One High Dimensional MTS (A)
List: 1 (a, b, lag), 2 (a, b, lag), ..., R (a, b, lag)
G: {0,3} {1,4,5} {2}
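To make the step from a correlation list to a set of groups concrete, here is a minimal sketch that places two variables in the same group whenever a listed correlation connects them (connected components via union-find). This is a simple stand-in chosen for illustration: the slides mention several grouping algorithms and metrics without detailing them, and this is not claimed to be any of them. The correlation list in the usage note is hypothetical.

```python
def group_variables(n_vars, correlations):
    """Group variables that are linked, directly or indirectly, by a correlation.

    correlations : list of (a, b, lag) triples; the lag is ignored for grouping
    """
    parent = list(range(n_vars))

    def find(i):
        # Follow parent pointers to the group representative, shortening the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, _lag in correlations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for v in range(n_vars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

For example, the hypothetical list [(0, 3, 2), (1, 4, 1), (4, 5, 3)] over six variables gives the groups [0, 3], [1, 4, 5] and [2], the same shape as the grouping G shown on the slide.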
15Sample Grouping Results
16Parameter Estimation
- Simulate Random Bag (Vary R, s and c, e)
- Calculate Mean and SD for Each Distribution (the Probability of Selecting e from s)
- Test for Normality (Lilliefors Test)
- Symbolic Regression (GP) to Determine the Function for Mean and SD from R, s and c (e will be Unknown)
- Place Confidence Limits on the P(Number of Correlations Found ? e)
17EP-Seeded GA
Final EPList: 0 (a,b,l), 1 (a,b,l), 2 (a,b,l), ..., EPListSize (a,b,l)
Initial GAPopulation: 0 ((a,b,l),(a,b,l),(a,b,l)), 1 ((a,b,l),(a,b,l),(a,b,l)), 2 ((a,b,l),(a,b,l),(a,b,l)), ..., GAPopsize ((a,b,l),(a,b,l))
[Diagram labels: EP, GA, DBN]
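The seeding step on this slide, building the initial GA population from the final EP list, could be sketched as follows. It assumes each chromosome is a set of distinct triples drawn at random from the EP list, with a length chosen between min_links and max_links. The sampling scheme, the length bounds and the function name are assumptions for illustration.

```python
import random

def seed_population(ep_list, pop_size, min_links, max_links, rng):
    """Build pop_size chromosomes, each a list of distinct triples from ep_list."""
    population = []
    for _ in range(pop_size):
        n_links = rng.randint(min_links, max_links)  # assumed variable length
        population.append(rng.sample(ep_list, n_links))  # sampling without replacement
    return population

# Hypothetical EP list of (a, b, lag) triples
ep_list = [(0, 1, 2), (2, 3, 4), (4, 0, 1), (1, 2, 3), (3, 4, 5)]
population = seed_population(ep_list, pop_size=10, min_links=2, max_links=3,
                             rng=random.Random(0))
```

Starting the GA from links that the correlation search has already ranked highly narrows the search compared with a purely random initial population.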
18EP-Seeded GA Results
N = 10, MaxT = 60
N = 20, MaxT = 60
19Varying the value of c
20Time Explanation
t t-1 t-11 t-13 t-16 t-20 t-60
P(TGF in state_0) = 1.0
P(TT in state_0) = 1.0
P(BPF in state_3) = 1.0
P(TGF in state_3) = 1.0
P(TT in state_1) = 0.446
P(SOT in state_0) = 0.314
P(C2 in state_0) = 0.279
P(T6T in state_0) = 0.347
P(RinT in state_0) = 0.565
21Changing Dependencies
22Dynamic Cross- Correlation Function
23Hidden Variable - OpState
a0(t-4)
a2(t)
OpState2
a2(t-1)
a3(t-2)
t-4 t-3 t-2 t-1 t
24Hidden Controller Hill Climb
25HCHC Results - Oil Refinery Data
26HCHC Results - Synthetic Data
- Generate Data from Several DBNs
- Append each Section of Data Together to Form One MTS with Changing Dependencies
- Run HCHC
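The data generation described above can be sketched in a simplified form. Instead of sampling from full DBN conditional probability tables, this stand-in uses binary variables where each linked child copies its lagged parent and is flipped with probability noise. It assumes lags of at least 1 and at most one parent per child. All names are hypothetical.

```python
import random

def generate_section(links, n_vars, length, noise, rng):
    """One section of binary MTS data driven by (a, b, lag) links, each lag >= 1."""
    max_lag = max(lag for _, _, lag in links)
    # Start from random values, with max_lag warm-up rows that are discarded later
    series = [[rng.randint(0, 1) for _ in range(n_vars)]
              for _ in range(length + max_lag)]
    for t in range(max_lag, length + max_lag):
        for a, b, lag in links:
            value = series[t - lag][a]
            if rng.random() < noise:
                value = 1 - value  # flip the copied value
            series[t][b] = value
    return series[max_lag:]

def generate_changing_mts(structures, n_vars, section_length, noise, rng):
    """Append one section per structure; also return each section's start index."""
    mts, boundaries = [], []
    for links in structures:
        boundaries.append(len(mts))
        mts.extend(generate_section(links, n_vars, section_length, noise, rng))
    return mts, boundaries
```

Because the section boundaries are known, the segmentation recovered by a search such as HCHC can be compared directly against them.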
27Time Explanation
t t-1 t-3 t-5 t-6 t-9
P(OpState1 is 0) = 1.0
P(a1 is 0) = 1.0
P(a0 is 0) = 1.0
P(a2 is 1) = 1.0
P(OpState1 is 0) = 1.0
P(a1 is 1) = 1.0
P(a0 is 0) = 1.0
P(a2 is 1) = 1.0
P(a2 is 0) = 0.758
P(a0 is 0) = 0.968
P(OpState0 is 0) = 0.519
P(a0 is 1) = 0.778
P(OpState0 is 0) = 0.720
P(a2 is 0) = 0.545
P(a0 is 1) = 0.517
28Time Explanation
t t-1 t-3 t-5 t-6 t-7 t-9
P(OpState1 is 4) = 1.0
P(a1 is 0) = 1.0
P(a0 is 0) = 1.0
P(a2 is 1) = 1.0
P(OpState1 is 4) = 1.0
P(a1 is 1) = 1.0
P(a0 is 0) = 1.0
P(a2 is 1) = 1.0
P(a2 is 1) = 0.570
P(a0 is 0) = 0.506
P(OpState2 is 3) = 0.210
P(a2 is 1) = 0.974
P(OpState2 is 4) = 0.222
P(a2 is 0) = 0.882
P(a0 is 1) = 0.549
29Process Diagram
TGF
C3
TT T6T
PGM
PGB
SOTT11
SOFT13
RINT
C11/3T
T36T
AFT
FF
RBT
BPF
C2
30Typical Discovered Relationships
TGF
C3
PGM
TT T6T
PGB
SOTT11
SOFT13
RINT
C11/3T
T36T
AFT
FF
RBT
BPF
C2
31Parameters
DBN Search    GA            EP
PopSize       100           10
MR            0.1           0.8
CR            0.8           ---
Gen           Based on FC   Based on FC

Correlation Search
c - Approx. 20% of s
R - Approx. 2.5% of s

Grouping GA   Synth. 1   Synth. 2-6           Oil
PopSize       150        100                  150
CR            0.8        0.8                  0.8
MR            0.1        0.1                  0.1
Gen           150        100 (1000 for GPV)   150
32Parameters
EP-Seeded GA
c - Approx. 20% of s
EPListSize - Approx. 2.5% of s
GAPopSize - 10
MR - 0.1
CR - 0.8
LMR - 0.1
Gen - Based on FC

HCHC             Oil    Synthetic
DBN_Iterations   1106   5000
Winlen           1000   200
Winjump          500    50