Nonlinear multivariate dynamics modelled by PLS Regression

1
Non-linear multivariate dynamics modelled by PLS
Regression
  • Harald Martens
  • Norwegian Food Research Institute
  • Centre for Integrative Genetics (CIGENE),
    Norwegian U. of Life Sciences
  • Centre for Biospectroscopy and Modelling
    (www.SpecMod.org)

Page 139, proceedings
2
Herman Wold, the father of PLS
Herman Wold (1983). Quantitative systems
analysis: The pedigree and broad scope of PLS
soft modelling. In: Food Research and Data
Analysis (H. Martens and H. Russwurm, eds.), Elsevier
3
From bio-chemometrics via biology and medicine to
physiometrics and mathematics
Herman Wold (1983). Quantitative systems
analysis: The pedigree and broad scope of PLS
soft modelling. In: Food Research and Data
Analysis (H. Martens and H. Russwurm, eds.), Elsevier
4
Three PLS cultures
  • 1. PLS-PM: path modelling
  • Multi-block. Marketing research,
    economics, sociology
  • 2. Bookstein's PLS-based SVD
  • Two-block symmetrical. Archaeology, anthropology,
    biometrics
  • 3. PLSR: Multivariate regression
  • Two-block asymmetrical, with extensions.
    Chemometrics, sensometrics

5
A dynamic model
6
PLS Regression for exploring complex nonlinear
dynamic models
  • 1) PLSR for predicting model behaviour from model
    parameters
  • Explorative analysis of effects
  • 2) PLSR for forecasting the future from past
    observations
  • Elucidating control structures

8-10
Observed time series
How can we
[Figure: observed time series of state variables; levels of state variables x1, x2, x3 vs. time. Labels: Gene variations, PLSR, ?]
11
The Physiome project: Modelling the dynamics of
the heart
  • Prof. Peter J. Hunter, Auckland Bioengineering
    Institute, U. Auckland, New Zealand (2008)
  • Heart Modeling, Computational Physiology and the
    IUPS Physiome Project
  • Lecture Notes in Computer Science 4801, 1161-3349,
    Springer
  • CIGENE part:
  • Model the effect of genetic diversity on the
    heart model
  • First: Model a single heart cell

12
Single cardiocyte cGP model
Bondarenko model modified by Liren Li and Nic
Smith
3 pages full of non-linear differential
equations, with lots of feedback. Can we
represent the complex model by a simpler model,
based on PLSR?
  • Simulation experiment
  • 8 bi-allelic loci. Genotypes aa, Aa and AA
    coding for 50%, 100% and 150% of normal
    parameter value
  • 3^8 = 6561 parameter combinations, > 70 time
    series trajectories for each
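
A minimal sketch of how such a full factorial design can be enumerated. The mapping of genotypes to scaling factors (0.5, 1.0, 1.5) follows the percentages on the slide; the function and variable names are illustrative assumptions, not taken from the original simulation code.

```python
import itertools
import numpy as np

# Genotype -> fraction of the normal parameter value (50 %, 100 %, 150 %)
LEVELS = {"aa": 0.5, "Aa": 1.0, "AA": 1.5}

def genotype_design(n_loci=8):
    """Enumerate every genotype combination and its parameter scaling."""
    genotypes = list(itertools.product(LEVELS, repeat=n_loci))
    scaling = np.array([[LEVELS[g] for g in row] for row in genotypes])
    return genotypes, scaling

genotypes, design = genotype_design()
print(design.shape)   # one row per parameter combination, one column per locus
```

Each row of `design` would then multiply the normal parameter values before one simulation run.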

13
Cell-level phenotypic variability in shape and
amplitude: calcium transient in cytosol
One out of > 70 phenotype time series in one
heart beat
14
36 of the > 70 phenotype time series in a heart
beat
15
PLSR: Find genotype-phenotype map
[Diagram: blocks X and Y, each with 6561 rows (different conditions); Y holds 36 phenotypes x 200 time steps in one heart beat]
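
The genotype-phenotype map can be sketched with a small NIPALS-style PLS2 regression written in NumPy. This is an illustrative stand-in, not the software used in the talk: the toy data (4 loci, one transient of 50 time steps) and all function names are assumptions.

```python
import itertools
import numpy as np

def pls2_fit(X, Y, n_comp):
    """PLS2 regression; returns B with Y ~ y_mean + (X - x_mean) @ B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        # weight vector: dominant left singular vector of the X'Y cross-product
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                   # X-scores
        p = E.T @ t / (t @ t)       # X-loadings
        q = F.T @ t / (t @ t)       # Y-loadings
        E = E - np.outer(t, p)      # deflate X
        F = F - np.outer(t, q)      # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def explained_variance(X, Y, n_comp):
    """Fraction of the centred Y-variance reproduced by the fitted model."""
    B, x_mean, y_mean = pls2_fit(X, Y, n_comp)
    resid = Y - (y_mean + (X - x_mean) @ B)
    return 1.0 - (resid ** 2).sum() / ((Y - y_mean) ** 2).sum()

# Toy stand-in: 4 loci at 3 levels (81 conditions), one transient of 50 time steps
X = np.array(list(itertools.product([0.5, 1.0, 1.5], repeat=4)))
t = np.linspace(0.0, 1.0, 50)
Y = X[:, [0]] * np.exp(-4.0 * X[:, [1]] * t) + 0.2 * X[:, [2]] * t
ev = [explained_variance(X, Y, a) for a in range(1, 4)]
print(ev)
```

In the real case X would hold the 6561 design rows and Y the unfolded phenotype time series, with one column per phenotype and time step.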

16
Explained variance in time series Y from design
X by PLSR
[Plot; horizontal axis: # of PLS PCs]
17
Explained Y-variance as function of # of PLS
components from X
18
Predicted Y vs true Y, 5 PLS PCs
19
5 of 6561 predicted parameter combinations. Blue:
input data, red: predicted from PLSR gene model
20
Loadings
21
Loadings
22
PLS Regression for exploring complex nonlinear
dynamic models
  • 1) PLSR for predicting model behaviour from model
    parameters
  • Explorative analysis of effects
  • 2) PLSR for forecasting the future from past
    observations
  • Elucidating control structures

23-26
Observed time series
How can we
[Figure: observed time series of state variables; levels of state variables x1, x2, x3 vs. time. Labels: ?, Simple linear ODE model]
27
Generating Dynamic Models from time series data
  • X
  • n = 200 points in time (objects, rows)
  • p = 34 time series (X-variables, columns)

28
X
For the first 60 of the 200 time points
29
Y = dX/dt
For the first 60 of the 200 time points
30
Transposing the time series!
[Diagram: X = 34 time series, standardized (200 time points) -> B = ? -> Y = dX/dt (200 time points)]
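
A hedged sketch of the idea of regressing rates on states. It builds Y = dX/dt by finite differences and estimates B from Y = X B. The 3-state linear system is hypothetical, ordinary least squares stands in for PLSR, and the standardization step is skipped so that B can be compared directly with the known system matrix.

```python
import numpy as np

# Hypothetical linear system dx/dt = A x (illustration only, not the heart model)
A = np.array([[-0.1,  2.0,  0.0],
              [-2.0, -0.1,  0.5],
              [ 0.0,  0.0, -0.3]])
x0 = np.array([1.0, 0.0, 1.0])
t = np.linspace(0.0, 10.0, 200)            # n = 200 time points

# Exact solution from the eigen-decomposition of A
lam, V = np.linalg.eig(A)
c = np.linalg.solve(V, x0.astype(complex))
X = np.real((np.exp(np.outer(t, lam)) * c) @ V.T)   # rows = time points

# Y = dX/dt by finite differences along the time axis
Y = np.gradient(X, t, axis=0, edge_order=2)

# Each row obeys y' = x' A', so the regression coefficients estimate A transposed
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T
print(np.round(A_hat, 2))
```

With 34 collinear time series, a rank-reduced method such as PLSR would replace the plain least-squares step.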
31
Output from standard Chemometrics software (The
Unscrambler ver. 9.8)
32
x19
33
y19 = dx19/dt and its prediction (20 CV segments)
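
A minimal sketch of segmented cross-validation and the resulting RMSEP. Interleaved segments and ordinary least squares are assumptions made for brevity; the slides use PLSR in The Unscrambler, and its segment layout is not stated.

```python
import numpy as np

def rmsep_cv(X, y, n_segments=20):
    """Root mean square error of prediction from segmented cross-validation."""
    n = len(y)
    segment = np.arange(n) % n_segments            # interleaved segment labels
    press = 0.0
    for s in range(n_segments):
        test = segment == s
        X_train = np.column_stack([np.ones((~test).sum()), X[~test]])
        b, *_ = np.linalg.lstsq(X_train, y[~test], rcond=None)
        X_test = np.column_stack([np.ones(test.sum()), X[test]])
        press += ((y[test] - X_test @ b) ** 2).sum()
    return np.sqrt(press / n)

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
b_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_clean = 0.7 + X @ b_true
y_noisy = y_clean + rng.normal(scale=0.1, size=200)
print(rmsep_cv(X, y_clean), rmsep_cv(X, y_noisy))
```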
34
Blue: gene-controlled model parameters. Red: ion
currents etc.
35
Blue: gene-controlled model parameters. Red: ion
currents etc.
36
Blue: gene-controlled model parameters. Red: ion
currents etc.
37
B
38
Non-linear dynamic modelling, based on
nominal-level PLSR
39-43
Observed time series
How can we
[Figure: observed time series of state variables; levels of state variables x1, x2, x3 vs. time]
Type of dynamic system: Additive (+,-) or
multiplicative (x,/)? How many different
regulatory subsystems?
Sign, size of effects?
44-47
Observed time series
How can we
From data via soft to hard model
[Figure, panels a) to c): state levels vs. time]
48-55
[Workflow diagram, built up over eight slides]
1) Experimental design -> Simulation with the complex math. model -> x vs. t
2) dx/dt = Y, vs. t
3) dx1/dt, dx2/dt, dx3/dt vs. x1, x2, x3
Y = X · B(x) + F
Jack-knifed PLSR to find B(x)!
E.g. y1 = X1 · b11 + X2 · b12 + X3 · b13 + f1
RMSEPcv
b11 vs x1, b12 vs x2, b13 vs x3
56
3 state variables (genes), 125 different time
series
57
3 state variables (genes), just 4 different time
series
58
Time series simulated from 6 different systems, 5
initial values
[Figure: one panel per model, models 1 to 6; different initial value sets within each panel]
59
6 different regulatory models of three state
variables
60
6 different regulatory models of three state
variables
All 6 models correctly identified from time
series data, via the combination of look-up in
Functional-Form data-base
61
Observed time series
62
[Figure 2, panels a) to d): x vs. t; dx/dt vs. t; dx1/dt, dx2/dt, dx3/dt vs. x]
63
Summary of results
  • Works even for realistic amounts of data:
  • 4 time series
  • models 2-6 OK!
  • model 1 most difficult, but not bad
  • Noise-free data need to be challenged
  • Further plans:
  • More complex additive models
  • Non-additive models (dynamic Parafac etc.)
  • Recasting, explicit

64
Transformations
[Flowchart. Input: x0, k, many multivariate time series. Transformations: nominal-level x, xk = f(x0, k); derivatives Y, yj = dxj/dt. Decision: recasting / rotation? (No / Yes). Outputs: refolded effect tensor landscapes; GMA?; functional effect landscapes]
65
Acknowledgements
  • Heart model data
  • Arne Gjuvsland, Jon Olav Vik, Kristin Tøndel
    (CIGENE)
  • Funding
  • Norwegian Research Council

66
  • Thank you !

68
Rates vs States in latent space: ua = Y qa, ta = X va,
a = 1,2,3,4
69
Nominal-level PLSR
Y = X · B + F
70
PLSR solution of ODE system
Y = X · B + F
B: The Jacobian
Control theory: Eigenvalues of B. Complex? +/-
sign?
Chemometrics: Singular values of B. Size?
estimate stab.
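
The two views of the estimated Jacobian can be sketched as below: eigenvalues for the control-theory questions (complex, sign of the real part) and singular values for the chemometric ones (size, conditioning). The 2 x 2 matrix is a hypothetical damped oscillator and the function name is an assumption. If B is estimated in the row convention Y = X B, the Jacobian is its transpose, which has the same eigenvalues and singular values.

```python
import numpy as np

def jacobian_summary(B):
    """Eigenvalue and singular-value summary of an estimated Jacobian."""
    eig = np.linalg.eigvals(B)
    sv = np.linalg.svd(B, compute_uv=False)
    return {
        "eigenvalues": eig,
        "oscillatory": bool(np.any(np.abs(eig.imag) > 1e-12)),  # complex pair present
        "stable": bool(np.all(eig.real < 0)),                    # all modes decay
        "singular_values": sv,
        "condition_number": sv[0] / sv[-1],
    }

B = np.array([[-0.1,  2.0],
              [-2.0, -0.1]])       # hypothetical damped oscillator
summary = jacobian_summary(B)
print(summary)
```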
71
Nominal-level PLSR
Y = X · B(x) + F
B(x): The Jacobian
Jacobian B is different in different parts of the
state space!
Split each state variable into 0/1 category
variables!
Blue = 0, white = 1
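
A minimal sketch of the 0/1 category coding. The use of quantile-based bins, the number of bins and the toy rate function are all assumptions; the slides do not say how the split points were chosen.

```python
import numpy as np

def nominal_code(x, n_bins):
    """Split one continuous variable into n_bins 0/1 indicator columns."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    idx = np.digitize(x, edges)          # bin index 0 .. n_bins - 1
    return np.eye(n_bins)[idx]

x = np.linspace(-1.0, 1.0, 200)
y = np.tanh(3.0 * x)                     # hypothetical nonlinear rate function
D = nominal_code(x, n_bins=5)

# Regressing on the indicator columns gives one effect per level of x,
# i.e. a piecewise-constant picture of the nonlinear relationship
level_effect, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(level_effect, 3))
```

Plotting the level effects against the bin centres gives a curve of the kind labelled "b11 vs x1" elsewhere in the deck.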
72
Nominal-level PLSR
Y = X · B(x) + F
Jacobian B is different in different parts of the
state space!
73
Significance of B
-log(p)
74
Analyzing a model
75
Analyzing a model
76
Analyzing a model
77
Analyzing a model
78
SVD s for Jacobian estimated by PLSR with 6 and
10 PCs
79
Eigenvalues for Jacobian estimated by PLSR with 6
and 10 PCs
80
Eigenvalues of Jacobian estimated by PLSR with 10 PCs
  • Eigenvalues, 10 PCs
  • 1 -0.4079 + 1.9653i
  • 2 -0.4079 - 1.9653i
  • 3 -0.6251 + 1.0477i
  • 4 -0.6251 - 1.0477i
  • 5 -0.0708 + 0.9537i
  • 6 -0.0708 - 0.9537i
  • 7 -0.1040 + 0.4522i
  • 8 -0.1040 - 0.4522i
  • 9 0.2259 + 0.2933i
  • 10 0.2259 - 0.2933i
  • 11 0.0000 + 0.0000i
  • 12 0.0000 - 0.0000i
  • 0.0000
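
As a reading aid, the listed eigenvalues can be turned into oscillation periods and e-folding times of a linear system. The sketch assumes the unsigned imaginary parts are the "+" members of conjugate pairs, and the time unit is whatever unit the derivatives were computed in, which the slide does not state.

```python
import numpy as np

# One member of each non-zero conjugate pair from the list above
pairs = np.array([-0.4079 + 1.9653j, -0.6251 + 1.0477j, -0.0708 + 0.9537j,
                  -0.1040 + 0.4522j,  0.2259 + 0.2933j])

for lam in pairs:
    period = 2.0 * np.pi / abs(lam.imag)     # oscillation period
    e_fold = 1.0 / abs(lam.real)             # e-folding time of the envelope
    kind = "growing" if lam.real > 0 else "decaying"
    print(f"{lam.real:+.4f} {lam.imag:+.4f}i: period {period:.2f}, "
          f"{kind}, e-folding time {e_fold:.2f}")

growing = pairs[pairs.real > 0]              # modes with positive real part
```

Only the pair with real part 0.2259 has a positive real part, so in a purely linear reading that mode would grow rather than decay.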

81
Illustration of success story for PLSR in
chemometrics: Multivariate calibration of NIR
instruments. Making low-cost measurements of
intact samples useful!
Samples: Mixtures of protein and starch powders.
Variables: Near-Infrared light absorption measured
at 100 wavelength channels.
NIR input data. Measurement time < 1 second /
sample
[Plot: instrument signal vs. wavelength, 850-1050 nm]
82
Standard PLSR output from a multivariate
calibration program (The Unscrambler ver. 9.8)
83
Spectra of powders, BEFORE PRE-PROCESSING
Spectra of powders, AFTER EMSC PRE-PROCESSING
[Two plots: instrument signal vs. wavelength, 850-1050 nm]
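
A hedged sketch of a basic EMSC-type correction: each spectrum is modelled as offset + gain x reference + a low-order polynomial in wavelength, and the estimated offset, polynomial terms and gain are then removed. The reference spectrum, the synthetic disturbances and the function name are illustrative assumptions; the pre-processing used for the slides may include further model terms.

```python
import numpy as np

wl = np.linspace(850.0, 1050.0, 100)                    # 100 channels, 850-1050 nm
u = 2.0 * (wl - wl.mean()) / (wl.max() - wl.min())      # wavelength rescaled to [-1, 1]
reference = np.exp(-0.5 * ((wl - 950.0) / 25.0) ** 2)   # hypothetical reference spectrum

def emsc(spectra, reference, u):
    """Remove offset, linear/quadratic baseline and gain from each spectrum."""
    M = np.column_stack([np.ones_like(u), reference, u, u ** 2])
    coef, *_ = np.linalg.lstsq(M, spectra.T, rcond=None)   # one column per spectrum
    a, b, d, e = coef
    baseline = np.outer(a, np.ones_like(u)) + np.outer(d, u) + np.outer(e, u ** 2)
    return (spectra - baseline) / b[:, None]

# Synthetic spectra: same chemistry, different offset, gain and baseline slope
rng = np.random.default_rng(0)
offset = rng.uniform(-0.2, 0.2, size=6)
gain = rng.uniform(0.5, 1.5, size=6)
slope = rng.uniform(-0.1, 0.1, size=6)
raw = offset[:, None] + gain[:, None] * reference + slope[:, None] * u
corrected = emsc(raw, reference, u)
```

In practice the reference is often taken as the mean spectrum of the calibration set.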
84
Example 2: A fermentation process in the dairy
industry, monitored by a multichannel instrument
(FTIR-ATR) for 26 hours
85
First three principal component scores
86
Semi-soft modelling of the process
87
and functional genomics for optimized milk and
meat quality
Practical example of bio-chemometrics using PLSR
Large-scale high-dimensional FTIR-bio-screening
project in Norway, to improve milk quality
Feeding experiments
Calibration milk samples
Milk FTIR spectra
Reference measurements, fatty acids (GC-MS)
Wavenumber
Routine milk analysis of every Norwegian cow 6
times a year
Cal. models
Pred. fatty acids etc
Other components
> 3 million milk spectra / year
Background knowledge
predictive genes, QTLs
Heritability, feeding effects etc
20K gene markers
causal genes?
89
Estimated effect on human total cholesterol level
(assuming 20% of energy intake from milk fat)
92
Distribution of predicted milk
contribution to human cholesterol: 20 000 of many
million milk samples
94
BTA6
[Plot: test statistics vs. megabases of DNA in chromosome 6. Threshold: test > 5.4, p < 0.001. Marked gene: SCD5 (Stearoyl-CoA desaturase 5)]
95
Prof. Herman Wold, the father of PLS. Poster at
IUFoST conference FOOD RESEARCH AND DATA
ANALYSIS, OSLO 1982. Same year as PLS Regression
was published
96
Herman Wold, the father of PLS
98
My own field: Measurements and modelling in
systems biology
Environment, human activity
DNA
mRNA
Proteome
Metabolome
Biological Structure
Other phenotypes
1D-, 2D-Electrophoresis, MALDI-TOF, LC-MS
GC, LC (-MS)
Sequencing, SNP, AFLP
Realtime PCR, Micro-array
Disease incidence, Virulence, Drug
sensitivity, Biofilm formation, Sensory Science,
Economy
NIR, FT-IR, Raman, Fluorescence, Serotyping
Data analysis: Integrating different types of
bio-data. Look for common variation patterns. Make
quantitative prediction and forecasting. Identify
outliers
99
Non-linear feedback dynamics: Impossible to
study theoretically. Study empirically!
  • Linking the two cultures:
  • theory-driven and data-driven modelling

100-102
The System: Control of cell differentiation
Two cell-signalling proteins: Delta and Notch
[Figure, panels a) to d)]
Frustration!
103
The mathematical model for one cell
Hill function S(.)
time
104
The mathematical model for one cell
time
105
The mathematical model for thousands of cells
5000 CELLS!
Steady state solution
Perturbed initial state
106
The mathematical model for thousands of cells
5000 CELLS!
107
The mathematical model for thousands of cells
Solution: Designed computer simulations.
Integrate till equilibrium in computer.
Describe the resulting patterns.
Study results by PLSR!
5000 CELLS!
108
For a given parameter combination: Initially
perturb all cells with small random numbers,
integrate till steady state
[Figure: levels vs. time]
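
The perturb-and-integrate step can be sketched on a deliberately tiny system. The equations below are an assumed generic lateral-inhibition form with Hill-type functions for two coupled cells, chosen only for illustration; they are not the model, parameter values or integrator used for the 5000-cell simulations in the talk.

```python
import numpy as np

def f(delta, a=0.01, k=2):
    """Assumed Hill-type activation of Notch by the neighbour's Delta."""
    return delta ** k / (a + delta ** k)

def g(notch, b=100.0, h=2):
    """Assumed Hill-type repression of Delta by the cell's own Notch."""
    return 1.0 / (1.0 + b * notch ** h)

def rhs(state):
    notch, delta = state                   # each holds one value per cell
    d_notch = f(delta[::-1]) - notch       # each of the two cells senses the other
    d_delta = g(notch) - delta
    return np.array([d_notch, d_delta])

def integrate_to_steady_state(state, dt=0.05, tol=1e-9, max_steps=20000):
    """Explicit Euler steps until the largest rate of change falls below tol."""
    for _ in range(max_steps):
        deriv = rhs(state)
        if np.abs(deriv).max() < tol:
            break
        state = state + dt * deriv
    return state, np.abs(rhs(state)).max()

rng = np.random.default_rng(1)
start = 0.5 + 1e-2 * rng.standard_normal((2, 2))   # rows: Notch, Delta; columns: cells
final, residual = integrate_to_steady_state(start)
print(np.round(final, 3), residual)
```

With these assumed parameters the small initial difference between the cells is amplified, and the pair settles into one high-Delta and one low-Delta cell.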
109
Different parameter combinations give different
steady-state patterns
[Figure, panels a) to f)]
110
Analysis by screening design: Automatic image
analysis profiling
  • Generate several full factorial designs
  • Compute equilibrium solutions in super-computer
    (almost all stabilized)
  • Form 2D image for each solution
  • Characterize by computer image analysis
    (intensity histogram, spatial autocorrelation,
    cluster analysis etc.)
  • Multivariate data modelling: Image analysis vs
    Design factors
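
Two of the image descriptors named above can be sketched in a few lines. The bin count, the lag-1 definition of spatial autocorrelation and the test images are assumptions made for illustration.

```python
import numpy as np

def image_descriptors(img, n_bins=4):
    """Intensity histogram plus lag-1 horizontal/vertical autocorrelation."""
    hist, _ = np.histogram(img, bins=n_bins, range=(0.0, 1.0))
    hist = hist / img.size
    ac_h = np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]
    ac_v = np.corrcoef(img[:-1, :].ravel(), img[1:, :].ravel())[0, 1]
    return np.concatenate([hist, [ac_h, ac_v]])

rows, cols = np.indices((8, 8))
checker = ((rows + cols) % 2).astype(float)   # fine-grained alternating pattern
stripes = (rows % 2).astype(float)            # horizontal stripes

print(image_descriptors(checker))
print(image_descriptors(stripes))
```

Both toy images have the same intensity histogram, so only the autocorrelation columns separate them, which illustrates why several descriptor types are combined before the multivariate modelling.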
111
PLSR score space: computerized image analysis
X = computer image analysis variables, Y = design

112
Analysis by reduced design: Selected samples,
more informative profiling (Sensory Analysis)
  • Select sample subset: 2^(7-2) = 32 samples
  • Print these 32 images, one copy for each of 11
    assessors
  • Develop sensory profile of words (consensus: 14
    words)
  • Calibrate assessors for intersubjectivity
  • Perform descriptive sensory analysis
  • Average over 11 assessors
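
A minimal sketch of a 2^(7-2) fractional factorial design with 32 runs. The generators F = ABCD and G = ABDE are a common textbook choice and are assumed here; the slides do not state which generators were used.

```python
import itertools
import numpy as np

# Full factorial in the five base factors A..E at levels -1/+1 (32 runs)
base = np.array(list(itertools.product([-1, 1], repeat=5)))
A, B, C, D, E = base.T

F = A * B * C * D        # assumed generator F = ABCD
G = A * B * D * E        # assumed generator G = ABDE
design = np.column_stack([base, F, G])

print(design.shape)      # runs x factors
```

All seven main-effect columns are balanced and mutually orthogonal, so the 32 selected samples still span the seven design factors evenly.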

113
Sensory descriptors, examples
114
PLSR: Correlation loadings, PC 1 vs PC 2. Some
of the Sensory (X) and Design (Y) variables
named
115
PLSR: X = sensory descriptors, Y = design variables
(+ interactions)
Jack-knife std.dev.
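
A hedged sketch of jack-knife standard deviations for regression coefficients: the model is refitted with one segment left out at a time, and the spread of the resulting coefficients is summarised. Ordinary least squares stands in for PLSR, and the standard delete-one-segment jack-knife formula, the segment layout and the synthetic data are assumptions.

```python
import numpy as np

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def jackknife_coefficients(X, y, n_segments):
    """Full-data coefficients and their jack-knife standard deviations."""
    segment = np.arange(len(y)) % n_segments
    b_full = fit(X, y)
    b_sub = np.array([fit(X[segment != s], y[segment != s])
                      for s in range(n_segments)])
    m = n_segments
    std = np.sqrt((m - 1) / m * ((b_sub - b_sub.mean(axis=0)) ** 2).sum(axis=0))
    return b_full, std

# Synthetic example: 32 samples, one strong, one absent and one negative effect
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=32)
b, s = jackknife_coefficients(X, y, n_segments=8)
print(np.round(b, 3), np.round(s, 3))
```

Coefficients that are small relative to their jack-knife standard deviation would be treated as unreliable.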
116
[Figure 4, panels a) to j). Columns: Class, Overview, Details. Classes: I, II, III, IV, ??, Centre]
117
Spatio-intensity discretisation
118
Spatio-intensity discretisation
119
Spatio-intensity discretisation
Under what conditions?
120
Pursuit of detail by dense design for a few
selected parameter combinations, perform special
profiling (Sensory Analysis)
  • Vary one model parameter in small steps, keeping
    the others const.
  • Perform descriptive sensory analysis
  • Relate to multivariate histogram of protein level
    frequencies

121
New parameter combinations integrated till
steady-state. Solutions printed. Images submitted
to improved sensory profiling. Results for some
of the sensory descriptors:
[Plots: sensory panel mean of Whiteness, Thickness curls and Two-headedness vs. sigmoid threshold for Delta activity]
122
[Plots: Diameter / log(# of cells) and Notch activity; sensory panel mean of Whiteness, Thickness curls and Two-headedness vs. sigmoid threshold for Delta activity]
123
Reference
  • Harald Martens, Siren R. Veflingstad, Erik
    Plahte, Magni Martens, Dominique Bertrand and
    Stig W. Omholt (2009)
  • The genotype-phenotype relationship in
    multicellular pattern-generating models: the
    neglected role of pattern descriptors
  • BMC Systems Biology, in press 2009

124
The usual suspects
Environment, human activity
DNA
mRNA
Proteome
Metabolome
Biological Structure
Other phenotypes
1D-, 2D-Electrophoresis, MALDI-TOF, LC-MS
GC, LC (-MS)
Sequencing, SNP, AFLP
Realtime PCR, Micro-array
Disease incidence, Virulence, Drug
sensitivity, Biofilm formation, Sensory Science,
Economy
NIR, FT-IR, Raman, Fluorescence, Serotyping
Data analysis: Integrating different types of
bio-data. Look for common variation patterns. Make
quantitative prediction and forecasting. Identify
outliers