Title: Empirical Orthogonal Functions (EOFs)
1 Empirical Orthogonal Functions (EOFs) (closely related to Principal Components and Factor Analysis): Identifying preferred patterns within many variables
2 EOF analysis finds themes in a data set. The ups and downs of some of the data go along with the ups and downs (or the downs and ups) of some of the rest of the data.
When a person's anger is up, his or her blood pressure is up, concentration ability is down, energy level is up, sensitivity to the feelings of others is down, etc. These all tend to happen together, and a time series of data would show it. The anger syndrome would be a cluster, or a theme, or an EOF mode, in a person's continuous, moment-to-moment physiological data set.
3 EOFs work without any prior knowledge of relationships linking the observations or of underlying physical processes. EOF analysis expresses the data in a smaller set of new variables, each defined as a linear combination of the original ones. The desired result is a limited collection of patterns, called EOF modes, that are sufficient to reconstruct a good approximation of the original data and are also easy to visualize and recognize. Although such modes sometimes represent known physical phenomena, they are not designed to isolate only physical mechanisms. EOF analysis should always be thought of only as an efficient statistical compression tool.
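The compression idea above can be sketched in a few lines. This is a minimal illustration, not the presenter's procedure: it uses NumPy's singular value decomposition as one standard way to obtain EOF modes, and the data, array sizes and names (`reconstruct`, `relative_error`) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not from the slides):
# 120 times x 20 locations, built from two spatial patterns plus weak noise.
n_times, n_locs = 120, 20
patterns = rng.normal(size=(2, n_locs))
scores = rng.normal(size=(n_times, 2)) * np.array([3.0, 1.5])
data = scores @ patterns + 0.1 * rng.normal(size=(n_times, n_locs))

# Remove the time mean at each location, then decompose.
anomalies = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

def reconstruct(k):
    """Approximate the anomalies using only the first k EOF modes."""
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def relative_error(k):
    """Size of the reconstruction error relative to the size of the data."""
    return np.linalg.norm(anomalies - reconstruct(k)) / np.linalg.norm(anomalies)

# Error when keeping 1 mode, 2 modes, and all 20 modes.
errors = [relative_error(k) for k in (1, 2, n_locs)]
print(errors)
```

Because the synthetic data contain only two coherent patterns, two modes already reproduce them closely, and keeping all modes reproduces them exactly (up to round-off).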
4 Facts relevant to a crime - clustering
Victim was having affair with her piano teacher
Suspect has committed violent crime before
Suspect knows and likes piano teacher
Suspect is poor, needs money
Suspect has gun hobby, owns many guns
Victim is suspect's ex-wife, is due 3 more years of alimony from him
Victim's piano teacher seen near time and place of crime
Suspect just lost his job
Four clusters, each having two elements
Health sciences often deal with data that have high dimensionality, such as collections of spatially distributed time series like the malaria observations or related climate observations. Because such observations are not entirely random, but are often related to each other, the information contained in such datasets can often be compressed down to a few spatial patterns that cluster districts or locations that are strongly related. EOF analysis is an exploratory technique for performing such a compression in an objective way.
5 Suppose we have a time record of data for a field variable, such as malaria at many locations throughout a country. For example: monthly malaria prevalence data covering a 10-year period, at each of 31 districts across a country.
Suppose we analyze just one month of the year, such as June, for the 10 years.
Malaria data (June) for a 10-year period over 31 districts:
Time 1    district1  district2  ..  district31
Time 2    district1  district2  ..  district31
..
Time 10   district1  district2  ..  district31
6 Showing the possible data for the malaria incidence example
Columns are the variables (districts); rows are the sampling dimension (years).
year   d1     d2     d3     d4     ..  d31
1997   0.054  0.029  0.106  0.002  ..  0.095
1998   0.058  0.052  0.171  0.022  ..  0.130
1999   0.033  0.013  0.050  0.003  ..  0.049
..     ..     ..     ..     ..     ..  ..
2006   0.008  0.005  0.014  0.000  ..  0.012
7 How EOF modes are defined from a dataset.
First, a complete intercorrelation matrix is computed:
District              District
              1       2      ..     31
-----------------------------------------
 1          1.00    0.71     ..   -0.13
 2          0.71    1.00     ..    0.07
..
31         -0.13    0.07     ..    1.00
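Computing such an intercorrelation matrix might look like the sketch below. The values are random placeholders rather than the malaria data; the only assumption carried over from the slides is the layout, with 10 years as rows and 31 districts as columns.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 10-year x 31-district table (values are made up).
n_years, n_districts = 10, 31
data = rng.random((n_years, n_districts))

# rowvar=False: each column (district) is a variable, each row (year) a sample.
corr = np.corrcoef(data, rowvar=False)

print(corr.shape)    # one row and one column per district
print(corr[0, 1])    # correlation between district 1 and district 2
```

The matrix is symmetric with ones on the diagonal, matching the layout of the table above.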
8 Then, using the correlation matrix, a procedure is used to identify which districts best form a coherent cluster: points that vary similarly to, or oppositely from, one another most strongly. This leads to the formation of a linear combination of all the districts. In this combination, each district's value is assigned a weight (positive or negative), something like the weights assigned to the predictors in multiple regression. The pattern of these weights often shows up, visually, as a coherent (non-random) pattern in the spatial domain. Such a pattern of weights is an EOF loading pattern (technically, it is called an eigenvector). The weights may form a simple picture.
Figure: Malaria in Eritrea, spatial loading pattern (labels: center of action; non-participating areas)
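A loading pattern of this kind can be obtained as an eigenvector of the correlation matrix. The sketch below uses `numpy.linalg.eigh` on random placeholder data; the variable names are illustrative, and the slides do not say which software the presenter used.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.random((10, 31))                 # synthetic years x districts (made up)
corr = np.corrcoef(data, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]       # reorder so the largest comes first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Loading pattern of mode 1: one weight (positive or negative) per district.
loading_1 = eigenvectors[:, 0]
print(loading_1.shape)
```

The overall sign of an eigenvector is arbitrary, so a loading pattern and its mirror image (all signs flipped) describe the same mode.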
9 By multiplying the values at the grid points for one particular time by their loading weights, and adding them all up, we get the amplitude (or temporal score) for that time. Times whose original data resemble that pattern have high positive scores, and times whose data resemble its opposite have high negative scores. This gives us the time series to go along with the spatial pattern.
Figure: Time series (amplitude versus time) for the spatial loading pattern of malaria in Eritrea shown in the previous slide. Amplitudes above 0 mark times when the pattern was similar to that shown in the spatial loading pattern; amplitudes below 0 mark times when the pattern was strongly opposite to it.
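The multiply-and-add step that produces the amplitude can be written as a matrix product. This is a sketch on random placeholder data; standardizing each district first is an assumption made here so that the values match a correlation-based analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.random((10, 31))                       # synthetic years x districts (made up)

# Standardize each district so the analysis matches the correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

corr = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
loading_1 = eigenvectors[:, -1]                   # eigh sorts ascending; last is mode 1

# Amplitude for each time: sum over districts of (value x loading weight).
amplitude_1 = z @ loading_1
print(amplitude_1)                                # one score per year
```

With this scaling, the variance of the amplitude series equals the eigenvalue of the mode, which is why eigenvalues are read as "variance explained".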
10 Example of an EOF analysis
Suppose we give 14 people exams in various subjects. Given the scores of the 14 people, for 6 subjects, how can we best summarize them? One objective of summarizing the scores would be to distinguish the good students from the bad students. EOFs are ideal for obtaining a summary of this, but they can also provide other informative summaries.
11 Input Data for EOFs: exam scores
Each subject is a variable (like one grid point); each person is a case, or sample (like one year).
Table (image not recovered): VARIABLES across the columns, SAMPLES down the rows; one part of the table is marked "(Not needed)".
12 Compute the intercorrelation matrix among disciplines, for the 14-person sample
13 Loading weights (left), amplitudes (right)
EOF mode 1: positive loadings on all exams. This distinguishes good from bad students (amplitudes are shown at right; good students have positive scores).
14 Loading weights (left), amplitudes (right)
EOF mode 2: oppositely signed loadings on physical vs. social sciences. This distinguishes physical scientists from social scientists (amplitudes are shown at right; physical scientists have positive scores).
15 EOF analysis is performed by inputting the correlation matrix to a procedure called eigenvalue/eigenvector analysis. It involves solving a large set of linear equations. Grid points having high correlations (+ or -) with the most other grid points participate most strongly. Each EOF pattern that emerges explains a certain percentage of the total variance of all the grid points over time. This percentage of variance explained is maximized. The first EOF mode gathers the most variance, and then the second EOF mode works on what remains after all the variability associated with the first mode is removed.
Figure panels: Mode-1, Mode-2, Mode-3, Mode-4, Mode-5, Mode-6
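The percentage of variance explained by each mode follows directly from the eigenvalues. A minimal sketch, again on random placeholder data with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.random((10, 31))                       # synthetic years x districts (made up)
corr = np.corrcoef(data, rowvar=False)

eigenvalues = np.linalg.eigvalsh(corr)[::-1]      # reversed, so largest first
eigenvalues = np.clip(eigenvalues, 0.0, None)     # remove tiny negative round-off

# The trace of a correlation matrix equals the number of variables (31 here),
# so each eigenvalue divided by the total is that mode's share of the variance.
percent_variance = 100.0 * eigenvalues / eigenvalues.sum()
print(percent_variance[:6])                       # modes 1 to 6
```

The shares are in decreasing order by construction: mode 1 takes the largest possible share, and each later mode takes the largest share of what is left.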
16 Mode 1 for upper atmospheric height in northern winter
Figure: spatial loading pattern (ENSO signature) and time series (amplitude), with El Niño, strong El Niño and La Niña events marked.
ENSO and climate change are seen together in this mode.
Often, after 1 to 4 modes have been defined, the
coherent portion of the total variability is
exhausted, and further modes just work on the
remaining incoherent noise. When this happens,
the loading patterns start looking random and
physically meaningless, and the amounts
of additional variance explained become small.
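The drop-off described above can be seen in a small experiment. The sketch below builds synthetic data from two coherent patterns plus incoherent noise (all values and sizes are assumptions made for the illustration) and lists the variance explained by each mode.

```python
import numpy as np

rng = np.random.default_rng(9)
n_times, n_locs = 200, 15

# Synthetic data (made up): two coherent patterns plus incoherent noise.
patterns = rng.normal(size=(2, n_locs))
scores = rng.normal(size=(n_times, 2)) * np.array([2.0, 1.0])
data = scores @ patterns + 0.3 * rng.normal(size=(n_times, n_locs))

anomalies = data - data.mean(axis=0)
singular_values = np.linalg.svd(anomalies, compute_uv=False)

# Squared singular values are proportional to the variance carried by each mode.
percent_variance = 100.0 * singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(percent_variance)

print(np.round(percent_variance, 2))
print(np.round(cumulative, 2))
```

In this constructed case the first two modes carry nearly all the variance, and every later mode adds only a small amount, which is the signature of modes that are fitting noise.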
17 If the trend in this mode is not of interest, it may get in the way of the analysis of the shorter-term variations of concern. Sometimes EOF analysis can isolate the trend in one mode, so it is absent in all other modes. But in this example the climate change trend is mixed with variability related to ENSO.
It is possible to remove trends prior to performing EOF analysis, or any analysis. Methods to remove trends will be discussed in another session.
Figure label: upward trend (climate change signature)
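As a placeholder until that session, one simple option is to fit a straight line by least squares and subtract it. This is only an assumption about what trend removal could look like, not the method the presenter goes on to describe, and the series below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(50)                                 # time index in years (made up)

# Synthetic series: an upward trend plus year-to-year variability.
series = 0.05 * t + rng.normal(scale=0.5, size=t.size)

# Fit a straight line by least squares and subtract it.
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)

# Refitting a line to the detrended series should give a slope of about zero.
residual_slope = np.polyfit(t, detrended, deg=1)[0]
print(slope, residual_slope)
```

A straight line will not capture step changes or curved trends, so this sketch covers only the simplest case.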
18 EXAMPLE: Use of EOFs for 8 Years of Monthly Malaria Incidence Data in Eritrea
Data: Monthly incidence (Jan 1996 - Dec 2003; 96 months) for 58 subzobas of Eritrea
Two Analysis Designs:
(1) Malaria Climatology
(2) Interannual Variability with Respect to the Historical Means for the Months. (Monthly means will be subtracted out.)
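The inputs to the two designs can be prepared from the same 96 x 58 table. The sketch below uses random placeholder values in place of the Eritrea data; the array names are illustrative, and the reshape relies on the record starting in January, as stated above.

```python
import numpy as np

rng = np.random.default_rng(6)
n_years, n_months, n_subzobas = 8, 12, 58

# Synthetic stand-in for the 96-month x 58-subzoba incidence table (made up).
incidence = rng.random((n_years * n_months, n_subzobas))

# View the record as (year, calendar month, subzoba); the record starts in January.
by_year = incidence.reshape(n_years, n_months, n_subzobas)

# Design 1 input: the mean annual cycle, 12 months x 58 subzobas.
climatology = by_year.mean(axis=0)

# Design 2 input: departures from each calendar month's mean, 96 x 58.
anomalies = (by_year - climatology).reshape(n_years * n_months, n_subzobas)

print(climatology.shape, anomalies.shape)
```

Design 1 then analyzes `climatology` (12 times), while Design 2 analyzes `anomalies` (96 times), so the two analyses answer different questions about the same data.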
19 Analysis Design 1
Malaria Incidence: Annual Cycle Climatology
Figure: EOF 1 loadings (84.8% variance) and EOF 1 amplitude
Dominant mode shows highest incidence in southwestern parts, in Sep, Oct and Nov.
20 Analysis Design 1
Malaria Incidence: Annual Cycle Climatology
Figure: EOF 2 loadings (8.2% variance) and EOF 2 amplitude
Secondary mode shows high incidence in various parts, in Jan and Feb.
21 Analysis Design 1: Annual Cycle
Figure: Malaria Incidence, Annual Cycle: Percent Variance Explained, by EOF mode number (x-axis: mode number)
Note: the maximum number of modes is the lesser of (1) the number of times or (2) the number of zobas.
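The limit on the number of modes can be checked numerically. The sketch below uses random placeholder values with the Design 1 dimensions (12 calendar months, 58 subzobas); the names and the round-off threshold are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n_times, n_zobas = 12, 58                         # Design 1 dimensions
climatology = rng.random((n_times, n_zobas))      # synthetic values (made up)

# Remove the time mean at each subzoba, then count the modes.
anomalies = climatology - climatology.mean(axis=0)
singular_values = np.linalg.svd(anomalies, compute_uv=False)

max_modes = min(n_times, n_zobas)
# Modes whose singular value is above round-off level.
n_nonzero = int(np.sum(singular_values > 1e-10 * singular_values[0]))

print(max_modes, singular_values.size, n_nonzero)
```

The decomposition returns 12 modes, the lesser of 12 and 58. In this sketch only 11 of them carry variance, because subtracting the time mean uses up one degree of freedom.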
22 Analysis Design 2
Malaria Incidence: Interannual Departures From Monthly Climatology
Figure: Interannual variability, EOF 1 loadings (53.7% variance) and EOF 1 amplitude
Dominant mode shows strong positive anomaly in western parts, in Oct 1998 and Jan 1998.
Note that since no standardization is done on the anomalies, most amplitude occurs in Oct or Jan.
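The effect mentioned in the note can be reduced by standardizing the anomalies before the analysis. The sketch below shows one possible form of standardization, dividing by the standard deviation across years for each calendar month and subzoba; this choice is an assumption, since the slides do not specify one, and the data are synthetic with October deliberately made more variable.

```python
import numpy as np

rng = np.random.default_rng(8)
n_years, n_months, n_subzobas = 8, 12, 58

# Synthetic incidence whose variability is much larger in October (index 9); made up.
month_scale = np.ones(n_months)
month_scale[9] = 10.0
by_year = rng.normal(size=(n_years, n_months, n_subzobas)) * month_scale[None, :, None]

# Departures from each calendar month's mean.
anomalies = by_year - by_year.mean(axis=0)

# Standard deviation across years, separately for each month and subzoba.
# Real data may contain zero spread (e.g. no cases at all), which would need handling.
spread = anomalies.std(axis=0, ddof=1)
standardized = anomalies / spread

print(anomalies[:, 9].var(), anomalies[:, 0].var())        # October dominates
print(standardized[:, 9].var(), standardized[:, 0].var())  # comparable after scaling
```

Without standardization the high-variance months dominate the leading modes; with it, every month and subzoba contributes equal variance.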
23 Analysis Design 2
Malaria Incidence: Interannual Departures From Monthly Climatology
Figure: Interannual variability, EOF 2 loadings (14.2% variance) and EOF 2 amplitude
Secondary mode shows positive anomaly in central and northwestern parts, in 1996 and Dec-Jan-Feb 1997/98.
24 Analysis Design 2: Interannual Departures
Figure: Malaria Incidence, Interannual Variability: Percent Variance Explained, by EOF mode number (x-axis: mode number)
Note: the maximum number of modes is the lesser of (1) the number of times or (2) the number of zobas.
25 Introduction to Treatment of Trends
Botswana
Which variations are of interest to us, and which are interfering with our view of those?
Variations in malaria incidence may come from:
Changes in surveillance and detection practices (steps, trends)
Changes in control policies (steps or trends)
Climate (temp, precip) for each year (year-to-year variations)
Figure labels: trend; step