Title: Nonlinear multivariate dynamics modelled by PLS Regression
1Non-linear multivariate dynamics modelled by PLS Regression
- Harald Martens
- Norwegian Food Research Institute,
- Centre for Integrative Genetics (CIGENE), Norwegian U. of Life Sciences,
- Centre for Biospectroscopy and Modelling (www.SpecMod.org)
Page 139, proceedings
2Herman Wold, the father of PLS
Herman Wold (1983) Quantitative systems analysis: The pedigree and broad scope of PLS soft modelling. In: Food Research and Data Analysis, H. Martens & H. Russwurm, eds., Elsevier
3From bio-chemometrics via biology and medicine to
physiometrics and mathematics
4Three PLS cultures
- 1. PLS PM: path modelling
- Multi-block. Marketing research, economy, sociology
- 2. Bookstein's PLS-based SVD
- Two-block symmetrical. Archeology, anthropology, biometrics
- 3. PLSR: Multivariate regression
- Two-block asymmetrical + extensions. Chemometrics, sensometrics
5A dynamic model
6PLS Regression for exploring complex nonlinear dynamic models
- 1) PLSR for predicting model behaviour from model parameters
- Explorative analysis of effects
- 2) PLSR for forecasting the future from past observations
- Elucidating control structures
8Observed time series
How can we
Observed time series of state variables
10Observed time series
How can we
[Figure: observed time series of state variables (levels of x1, x2, x3 vs. time), linked to gene variations by PLSR]
11The Physiome project: Modelling the dynamics of the heart
- Prof. Peter J. Hunter, Auckland Bioengineering Institute, U. Auckland, New Zealand (2008)
- Heart Modeling: Computational Physiology and the IUPS Physiome Project
- Lecture Notes in Computer Science 4801, 1161-3349, Springer
- CIGENE part
- Model the effect of genetic diversity on the heart model
- First: Model a single heart cell
12Single cardiocyte cGP model
Bondarenko model modified by Liren Li and Nic Smith
3 pages full of non-linear differential equations, with lots of feedback. Can we represent the complex model by a simpler model, based on PLSR?
- Simulation experiment
- 8 bi-allelic loci. Genotypes aa, Aa and AA coding for 50%, 100% and 150% of normal parameter value
- 3^8 = 6561 parameter combinations, > 70 time series trajectories for each
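The size of this simulation design can be checked by enumerating it. A minimal sketch, assuming the three genotypes are coded as multipliers 0.5, 1.0 and 1.5 on the normal parameter value; the names `GENOTYPE_LEVELS` and `full_factorial` are illustrative and not taken from the slides:

```python
import itertools

import numpy as np

# Assumed coding: each genotype scales the normal parameter value.
GENOTYPE_LEVELS = {"aa": 0.5, "Aa": 1.0, "AA": 1.5}
N_LOCI = 8


def full_factorial(levels, n_factors):
    """Return every combination of `levels` across `n_factors` factors."""
    return np.array(list(itertools.product(levels, repeat=n_factors)))


design = full_factorial(list(GENOTYPE_LEVELS.values()), N_LOCI)
print(design.shape)  # (6561, 8): one row per parameter combination
```

Each row of `design` would then be one input to the heart-cell simulation, and the stacked rows form the X block for PLSR.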
13Cell-level phenotypic variability in shape and amplitude: calcium transient in cytosol
One out of > 70 phenotype time series in one heart beat
14Thirty-six of the > 70 phenotype time series in a heart beat
15PLSR: Find genotype-phenotype map
[Diagram: X = design, 6561 different conditions; Y = 36 phenotypes x 200 time steps in one heart beat]
16Explained variance in time series Y from design X by PLSR
[Figure: explained variance vs. # of PLS PCs]
17Explained Y-variance as function of # of PLS components from X
18Predicted Y vs true Y, 5 PLS PCs
19Five of the 6561 predicted parameter combinations: blue = input data, red = predicted from PLSR gene model
20Loadings
21Loadings
22PLS Regression for exploring complex nonlinear dynamic models
- 1) PLSR for predicting model behaviour from model parameters
- Explorative analysis of effects
- 2) PLSR for forecasting the future from past observations
- Elucidating control structures
23Observed time series
How can we
Observed time series of state variables
26Observed time series
How can we
[Figure: observed time series of state variables (levels of x1, x2, x3 vs. time), to be described by a simple linear ODE model]
27Generating Dynamic Models from time series data
- X
- n = 200 points in time (objects, rows)
- p = 34 time series (X-variables, columns)
28X
For the first 60 of the 200 time points
29Y = dX/dt
For the first 60 of the 200 time points
30Transposing the time series!
[Diagram: X = 34 time series, standardized (200 time points); Y = dX/dt (200 time points); regression coefficients B = ?]
31Output from standard Chemometrics software (The
Unscrambler ver. 9.8)
32x19
33y19 = dx19/dt and its prediction (20 cv segments)
34Blue: gene-controlled model parameters. Red: ion currents etc.
35Blue: gene-controlled model parameters. Red: ion currents etc.
36Blue: gene-controlled model parameters. Red: ion currents etc.
37B
38Non-Linear dynamic modelling, based on nominal-level PLSR
39Observed time series
How can we
Observed time series of state variables
43Observed time series
How can we
[Figure: observed time series of state variables (levels of x1, x2, x3 vs. time)]
Type of dynamic system: Additive (+,-) or multiplicative (x,/)? How many different regulatory subsystems? Sign, size of effects?
46Observed time series
How can we
From data via soft to hard model
[Figure, panels a)-c): state levels vs. time]
47Observed time series
48[Diagram, built up step by step over slides 48-55]
1) Experimental design -> Simulation with the complex math. model -> time series x vs. t
2) dx/dt = Y vs. t
3) Axes: dx1/dt, dx2/dt, dx3/dt and x1, x2, x3
51Jack-knifed PLSR to find B(x)!
$Y = X\,B(x) + F$
E.g. $y_1 = X_1 b_{11} + X_2 b_{12} + X_3 b_{13} + f_1$
52RMSEPcv; b11 vs x1, b12 vs x2, b13 vs x3
56Three state variables (genes), 125 different time series
57Three state variables (genes), just 4 different time series
58Time series simulated from 6 different systems, 5 initial values
[Figure: one panel per model, 1-6, each showing the different initial value sets]
59Six different regulatory models of three state variables
60Six different regulatory models of three state variables
All 6 models correctly identified from time series data, via the combination of look-up in Functional-Form database
61Observed time series
62Figure 2
[Panels a)-d); axis labels: x, dx/dt, dx1/dt, dx2/dt, dx3/dt, t]
63Summary of results
- Works even for realistic amounts of data
- 4 time series
- models 2-6 OK!
- model 1 most difficult, but not bad
- Noise free data need to be challenged
- Further plans
- More complex additive models
- Non-additive models (dynamic Parafac etc)
- Recasting, explicit
64Transformations
[Flow chart with boxes: Input: x0, k, many multivariate time series, x = [xk] = f(x0, k); Nominal-level; Derivatives: Y = [yj] = (dxj/dt); Refolded effect tensor landscapes; Recasting / rotation (Yes / No; GMA?); Functional effect landscapes]
65Acknowledgements
- Heart model data
- Arne Gjuvsland, Jon Olav Vik, Kristin Tøndel (CIGENE)
- Funding
- Norwegian Research Council
68Rates vs States in latent space: $u_a = Y q_a$, $t_a = X v_a$, $a = 1, 2, 3, 4$
69Nominal-level PLSR
$Y = X\,B + F$
70PLSR solution of ODE system
$Y = X\,B + F$
B: the Jacobian
- Control theory: Eigenvalues of B. Complex? +/- sign?
- Chemometrics: Singular values of B. Size? Estimate stab.
71Nominal-level PLSR
$Y = X\,B(x) + F$
B(x): the Jacobian
Jacobian B is different in different parts of the state space!
Split each state variable into 0/1 category variables!
[Figure: matrix of 0/1 category variables; blue = 0, white = 1]
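Splitting each state variable into 0/1 category variables can be sketched as below. The binning rule is an assumption made for illustration (equal-width bins, 5 per variable); the slides do not state how the categories were chosen, and `nominal_level_coding` is a hypothetical helper name.

```python
import numpy as np


def nominal_level_coding(X, n_bins=5):
    """Replace each column of X by `n_bins` 0/1 indicator columns."""
    blocks = []
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        # Use interior edges only, so every value lands in a bin 0..n_bins-1.
        bin_index = np.digitize(X[:, j], edges[1:-1])
        blocks.append(np.eye(n_bins, dtype=int)[bin_index])
    return np.hstack(blocks)


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))  # hypothetical states x1, x2, x3
D = nominal_level_coding(X, n_bins=5)
print(D.shape)  # (200, 15): 5 indicator columns per state variable
print(D[0])     # exactly one 1 within each block of 5
```

Regressing Y on the indicator matrix `D` instead of on X gives one coefficient per bin, so the estimated effect of a state variable is free to change across the state space.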
73Significance of B
-log(p)
74Analyzing a model
78Singular values (SVD s) for Jacobian estimated by PLSR with 6 and 10 PCs
79Eigenvalues for Jacobian estimated by PLSR with 6
and 10 PCs
80Eigenvalues of Jacobian estimated by PLSR with 10 PCs
- Eigenvalues, 10 PCs
- 1 -0.4079 + 1.9653i
- 2 -0.4079 - 1.9653i
- 3 -0.6251 + 1.0477i
- 4 -0.6251 - 1.0477i
- 5 -0.0708 + 0.9537i
- 6 -0.0708 - 0.9537i
- 7 -0.1040 + 0.4522i
- 8 -0.1040 - 0.4522i
- 9 0.2259 + 0.2933i
- 10 0.2259 - 0.2933i
- 11 0.0000 + 0.0000i
- 12 0.0000 - 0.0000i
- 0.0000
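How such eigenvalues are read in control-theory terms can be sketched as below, assuming the estimated B is treated as the Jacobian of a linear system dx/dt = B x. Only one member of each non-zero conjugate pair from the list is entered, and the imaginary parts are taken as +/- pairs; the time unit is whatever unit the time series were sampled in.

```python
import numpy as np

# One eigenvalue from each non-zero conjugate pair listed for 10 PCs.
eigenvalues = np.array([
    -0.4079 + 1.9653j,
    -0.6251 + 1.0477j,
    -0.0708 + 0.9537j,
    -0.1040 + 0.4522j,
    0.2259 + 0.2933j,
])

for lam in eigenvalues:
    # Real part: growth (> 0) or decay (< 0). Imaginary part: oscillation.
    kind = "growing" if lam.real > 0 else "decaying"
    period = 2.0 * np.pi / abs(lam.imag)
    print(f"{complex(lam):.4f}: {kind} oscillation, period {period:.2f} time units")

unstable = eigenvalues[eigenvalues.real > 0]
print("modes with positive real part:", len(unstable))
```

Under this reading, four of the five pairs describe damped oscillations, while the pair with real part 0.2259 would grow over time if the linear model held everywhere.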
81Illustration of success story for PLSR in chemometrics: Multivariate calibration of NIR instruments. Making low-cost measurements of intact samples useful!
Samples: Mixtures of protein and starch powders.
Variables: Near-Infrared light absorption measured at 100 wavelength channels.
NIR input data. Measurement time < 1 second / sample
[Figure: instrument signal vs. wavelength, 850-1050 nm]
82Standard PLSR output from a multivariate
calibration program (The Unscrambler ver. 9.8)
83[Figures: instrument signal vs. wavelength (850-1050 nm). Spectra of powders BEFORE pre-processing and AFTER EMSC pre-processing]
84Example 2: A fermentation process in dairy industry monitored by multichannel instrument (FTIR ATR) for 26 hours
85Three first principal component scores
86Semi-soft modelling of the process
87and functional genomics for optimized milk and meat quality
Practical example of bio-chemometrics using PLSR
Large-scale high-dimensional FTIR-bio-screening project in Norway, to improve milk quality
[Flow chart with boxes: Feeding experiments; Calibration milk samples; Milk FTIR spectra (vs. wavenumber); Reference measurements, fatty acids (GC-MS); Cal. models; Routine milk analysis of every Norwegian cow 6 times a year (> 3 million milk spectra / year); Pred. fatty acids etc.; Other components; Background knowledge; Heritability, feeding effects etc.; 20K gene markers; predictive genes, QTLs; causal genes?]
89Estimated effect on human total cholesterol level (assuming 20% of energy intake from milk fat)
92Distribution of predicted milk contribution to human cholesterol: 20 000 of many million milk samples
94BTA6
[Figure: test statistics vs. megabases of DNA in chromosome 6. Test > 5.4: p < 0.001. Label: SCD5 (Stearoyl-CoA desaturase 5)]
95Prof. Herman Wold, the father of PLS. Poster at IUFoST conference FOOD RESEARCH AND DATA ANALYSIS, OSLO 1982. Same year as PLS Regression was published
96Herman Wold, the father of PLS
97My own field: Measurements and modelling in systems biology
[Diagram, levels: Environment, human activity; DNA; mRNA; Proteome; Metabolome; Biological Structure; Other phenotypes]
[Measurement types: Sequencing, SNP, AFLP; Realtime PCR, Micro-array; 1D-, 2D-Electrophoresis, MALDI-TOF, LC-MS; GC, LC (-MS); NIR, FT-IR, Raman, Fluorescence; Serotyping; Disease incidence, Virulence, Drug sensitivity, Biofilm formation; Sensory Science; Economy]
98My own field: Measurements and modelling in systems biology
[Same diagram as slide 97]
Data analysis:
- Integrating different types of bio-data
- Look for common variation patterns
- Make quantitative prediction and forecasting
- Identify outliers
99Non-linear feedback dynamics: Impossible to study theoretically. Study empirically!
- Linking the two cultures
- theory-driven and data-driven modelling
100The System: Control of cell differentiation
Two cell-signalling proteins: Delta and Notch
[Figure, panels a)-d)]
Frustration!
103The mathematical model for one cell
Hill function S(.)
[Figure: model solution vs. time]
105The mathematical model for thousands of cells
5000 CELLS!
Steady state solution. Perturbed initial state
107The mathematical model for thousands of cells
Solution: Designed computer simulations
- Integrate till equilibrium in computer
- Describe the resulting patterns
- Study results by PLSR!
108For a given parameter combination: Initially perturb all cells with small random numbers, integrate till steady state
109Different parameter combinations give different steady-state patterns
[Figure, panels a)-f)]
110Analysis by screening design: Automatic image analysis profiling
- Generate several full factorial designs
- Compute equilibrium solutions in super-computer (almost all stabilized)
- Form 2D image for each solution
- Characterize by computer image analysis (intensity histogram, spatial autocorrelation, cluster analysis etc)
- Multivariate data modelling: Image analysis vs Design factors
111PLSR Score space: computerized image analysis
X = computer image analysis variables, Y = design
112Analysis by reduced design: Selected samples, more informative profiling (Sensory Analysis)
- Select sample subset set: 2^(7-2) = 32 samples
- Print these 32 images, one copy for each of 11 assessors
- Develop sensory profile of words (consensus: 14 words)
- Calibrate assessors for intersubjectivity
- Perform descriptive sensory analysis
- Average over 11 assessors
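A reduced two-level design with 2^(7-2) = 32 runs for 7 factors can be generated as sketched below. The generators F = ABCD and G = ABDE are an assumption (a common textbook choice); the slides do not say which fraction was used, and the function name is illustrative.

```python
import itertools

import numpy as np


def fractional_factorial_2_7_2():
    """32-run two-level design for 7 factors, columns coded -1/+1."""
    # Full factorial in the five base factors A..E.
    base = np.array(list(itertools.product([-1, 1], repeat=5)))
    a, b, c, d, e = base.T
    f = a * b * c * d  # assumed generator F = ABCD
    g = a * b * d * e  # assumed generator G = ABDE
    return np.column_stack([base, f, g])


design = fractional_factorial_2_7_2()
print(design.shape)  # (32, 7)

# The seven main-effect columns are mutually orthogonal.
print(np.array_equal(design.T @ design, 32 * np.eye(7)))
```

With orthogonal columns, the main effects of the 7 design factors can be estimated independently of each other from the 32 selected images.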
113Sensory descriptors, examples
114PLSR Correlation loadings, PC 1 vs PC 2. Some of the Sensory (X) and Design (Y) variables named
115PLSR: X = sensory descriptors, Y = design variables (+ interactions)
Jack-knife std.dev.
116Figure 4
[Panels a)-j): class overview and details for classes I, II, III, IV, ?? and Centre]
117Spatio-intensity discretisation
Under what conditions?
120Pursuit of detail by dense design: for a few selected parameter combinations, perform special profiling (Sensory Analysis)
- Vary one model parameter in small steps, keeping the others const.
- Perform descriptive sensory analysis
- Relate to multivariate histogram of protein level frequencies
121New parameter combinations integrated till steady-state. Solutions printed. Images submitted to improved sensory profiling. Results for some of the sensory descriptors
[Figure: sensory panel mean of Whiteness, Thickness curls and Two-headedness vs. sigmoid threshold for Delta activity]
122[Figure: Diameter, log(# of cells); Notch activity; sensory panel mean of Whiteness, Thickness curls and Two-headedness vs. sigmoid threshold for Delta activity]
123Reference
- Harald Martens, Siren R. Veflingstad, Erik Plahte, Magni Martens, Dominique Bertrand and Stig W. Omholt (2009)
- The genotype-phenotype relationship in multicellular pattern-generating models: the neglected role of pattern descriptors
- BMC Systems Biology, in press 2009
124The usual suspects
[Same overview diagram as slides 97-98: biological levels from Environment and DNA to Other phenotypes, with their measurement types]
Data analysis:
- Integrating different types of bio-data
- Look for common variation patterns
- Make quantitative prediction and forecasting
- Identify outliers