Title: Matrix Decomposition and its Application in Statistics
1. Matrix Decomposition and its Application in Statistics
- Nishith Kumar
- Lecturer
- Department of Statistics
- Begum Rokeya University, Rangpur.
- Email: nk.bru09@gmail.com
2. Overview
- Introduction
- LU decomposition
- QR decomposition
- Cholesky decomposition
- Jordan Decomposition
- Spectral decomposition
- Singular value decomposition
- Applications
3. Introduction
- This lecture covers relevant matrix decompositions, basic numerical methods, their computation, and some of their applications.
- Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in Wampler (1970), and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.
- Some of the most frequently used decompositions are the LU, QR, Cholesky, Jordan, spectral and singular value decompositions.
4. Easy to solve systems (Cont.)
Some linear systems can be solved easily.
The solution:
5. Easy to solve systems (Cont.)
Lower triangular matrix
Solution: this system is solved using forward substitution.
6. Easy to solve systems (Cont.)
Upper triangular matrix
Solution: this system is solved using backward substitution.
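Both substitutions are available directly in base R as forwardsolve() and backsolve(); the triangular matrices below are only illustrative examples, not taken from the slides.

L <- matrix(c(2, 0, 0,
              1, 3, 0,
              4, 5, 6), nrow = 3, byrow = TRUE)   # lower triangular
b <- c(2, 7, 32)
y <- forwardsolve(L, b)    # forward substitution solves L y = b

U <- matrix(c(1, 2, 3,
              0, 4, 5,
              0, 0, 6), nrow = 3, byrow = TRUE)   # upper triangular
x <- backsolve(U, y)       # backward substitution solves U x = y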
7. LU Decomposition
LU decomposition was originally derived as a decomposition of quadratic and bilinear forms. Lagrange, in the very first paper in his collected works (1759), derives the algorithm we call Gaussian elimination. Later, Turing introduced the LU decomposition of a matrix in 1948, which is used to solve systems of linear equations. Let A be an m×m nonsingular square matrix. Then there exist two matrices L and U such that A = LU, where L is a lower triangular matrix and U is an upper triangular matrix.
J.-L. Lagrange (1736-1813)
A. M. Turing (1912-1954)
8. How to decompose A = LU?
Reduce A to an upper triangular matrix U by elementary row operations:
U = Ek ⋯ E1 A, so A = (E1)^-1 ⋯ (Ek)^-1 U.
If each such elementary matrix Ei is lower triangular, it can be proved that (E1)^-1, …, (Ek)^-1 are lower triangular, and that (E1)^-1 ⋯ (Ek)^-1 is a lower triangular matrix. Let L = (E1)^-1 ⋯ (Ek)^-1; then A = LU.
For example, with two elimination steps, U = E2 E1 A.
9. Calculation of L and U (cont.)
- Now, reducing the first column, we have
10. Calculation of L and U (cont.)
Now
Therefore,
A = LU
- If A is a nonsingular matrix, then for each lower triangular matrix L the corresponding upper triangular matrix U is unique, but the LU decomposition itself is not unique: there can be more than one LU decomposition of the same matrix, for example
A = LU
11. Calculation of L and U (cont.)
- Thus the LU decomposition is not unique. Since we compute the LU decomposition by elementary transformations, if we change L then U changes as well, so that A = LU still holds.
- To obtain a unique LU decomposition it is necessary to put some restriction on the L and U matrices. For example, we can require the lower triangular matrix L to be a unit one (i.e. set all the entries of its main diagonal to ones).
- LU decomposition in R:
- library(Matrix)
- x <- matrix(c(3,2,1, 9,3,4, 4,2,5), ncol = 3, nrow = 3)
- expand(lu(x))
12. Calculation of L and U (cont.)
- Note that there are also generalizations of LU to non-square and singular matrices, such as rank-revealing LU factorization.
- Pan, C.T. (2000). On the existence and computation of rank-revealing LU factorizations. Linear Algebra and its Applications, 316, 199-222.
- Miranian, L. and Gu, M. (2003). Strong rank revealing LU factorizations. Linear Algebra and its Applications, 367, 1-16.
- Uses: The LU decomposition is most commonly used in the solution of systems of simultaneous linear equations. We can also find the determinant easily by using the LU decomposition (the product of the diagonal elements of the upper and lower triangular matrices), as sketched below.
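A minimal R sketch of the determinant computation, continuing the Matrix-package example above; note that expand(lu(A)) also returns a permutation matrix P with A = P L U, whose sign must be taken into account.

library(Matrix)
A <- matrix(c(3,2,1, 9,3,4, 4,2,5), ncol = 3, nrow = 3)
dec <- expand(lu(A))                                   # A = P L U
# determinant = product of the diagonals of L and U, times det(P) = +/- 1
d <- prod(diag(dec$L)) * prod(diag(dec$U)) * det(as.matrix(dec$P))
d
det(A)                                                 # agreement check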
13. Solving a system of linear equations using LU decomposition
- Suppose we would like to solve an m×m system AX = b. We can then find an LU decomposition of A; to solve AX = b it is enough to solve the two systems LY = b and UX = Y.
- The system LY = b can be solved by forward substitution and the system UX = Y by backward substitution. To illustrate, we give some examples.
- Consider the given system AX = b, where
- and
14. Solving a system of linear equations using LU decomposition
- We have seen A = LU, where
- Thus, to solve AX = b, we first solve LY = b by forward substitution.
- Then
15. Solving a system of linear equations using LU decomposition
- Now, we solve UX = Y by backward substitution
- then
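As a hedged sketch, the same two-step solve can be carried out in R with the Matrix package; the matrix reuses the earlier LU example and the right-hand side is made up for illustration (expand(lu(A)) returns factors with A = P L U).

library(Matrix)
A <- matrix(c(3,2,1, 9,3,4, 4,2,5), ncol = 3, nrow = 3)
b <- c(1, 2, 3)                                    # example right-hand side
dec <- expand(lu(A))                               # A = P L U
bp  <- as.vector(crossprod(as.matrix(dec$P), b))   # P^T b undoes the row permutation
y   <- forwardsolve(as.matrix(dec$L), bp)          # forward substitution: L Y = P^T b
x   <- backsolve(as.matrix(dec$U), y)              # backward substitution: U X = Y
x
solve(A, b)                                        # agreement check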
16. QR Decomposition
The QR decomposition originated with Gram (1883). Later, Erhard Schmidt (1907) proved the QR decomposition theorem.
Jørgen Pedersen Gram (1850-1916)
Erhard Schmidt (1876-1959)
- If A is an m×n matrix with linearly independent columns, then A can be decomposed as A = QR, where Q is an m×n matrix whose columns form an orthonormal basis for the column space of A and R is a nonsingular upper triangular matrix.
17. QR Decomposition (Cont.)
- Theorem: If A is an m×n matrix with linearly independent columns, then A can be decomposed as A = QR, where Q is an m×n matrix whose columns form an orthonormal basis for the column space of A and R is a nonsingular upper triangular matrix.
- Proof: Suppose A = [u1 u2 . . . un] and rank(A) = n.
- Apply the Gram-Schmidt process to u1, u2, . . ., un and let the resulting orthogonal vectors be v1, v2, . . ., vn.
- Let qi = vi / ||vi|| for i = 1, 2, . . ., n. Then q1, q2, . . ., qn form an orthonormal basis for the column space of A.
18. QR Decomposition (Cont.)
- Now,
- i.e.,
- Thus ui is orthogonal to qj for j > i.
19. QR Decomposition (Cont.)
- Let Q = [q1 q2 . . . qn], so Q is an m×n matrix whose columns form an orthonormal basis for the column space of A.
- Now,
- i.e., A = QR,
- where
- Thus A can be decomposed as A = QR, where R is an upper triangular and nonsingular matrix.
20. QR Decomposition
- Example: Find the QR decomposition of
21. Calculation of QR Decomposition
- Applying the Gram-Schmidt process to compute the QR decomposition:
- 1st step
- 2nd step
- 3rd step
22. Calculation of QR Decomposition
- 4th step
- 5th step
- 6th step
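The same steps can be sketched as a small R function implementing classical Gram-Schmidt (illustrative only, not numerically robust; the matrix reuses the example from the next slide).

gram_schmidt_qr <- function(A) {
  n <- ncol(A)
  Q <- matrix(0, nrow(A), n)
  R <- matrix(0, n, n)
  for (j in 1:n) {
    v <- A[, j]
    if (j > 1) for (i in 1:(j - 1)) {
      R[i, j] <- sum(Q[, i] * A[, j])   # projection of column j on q_i
      v <- v - R[i, j] * Q[, i]         # remove that component
    }
    R[j, j] <- sqrt(sum(v^2))
    Q[, j] <- v / R[j, j]               # normalize
  }
  list(Q = Q, R = R)
}
A <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol = 3, nrow = 3)
gs <- gram_schmidt_qr(A)
gs$Q %*% gs$R                           # reconstructs A (up to rounding)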
23. Calculation of QR Decomposition
- Therefore, A = QR
- R code for QR decomposition:
- x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol = 3, nrow = 3)
- qrstr <- qr(x)
- Q <- qr.Q(qrstr)
- R <- qr.R(qrstr)
- Uses: QR decomposition is widely used in computer codes to find the eigenvalues of a matrix, to solve linear systems, and to find least squares approximations.
24. Least squares solution using QR Decomposition
- The least squares solution of b is
- Let X = QR. Then
- Therefore,
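A minimal R sketch of this solve, R beta = Q^T y by backward substitution; the design matrix X and response y below are made-up illustrations, not data from the slides.

set.seed(1)
X <- cbind(1, rnorm(20), rnorm(20))        # hypothetical design matrix with intercept
y <- X %*% c(2, 1, -0.5) + rnorm(20, sd = 0.1)
qrx  <- qr(X)
Q    <- qr.Q(qrx)
R    <- qr.R(qrx)
beta <- backsolve(R, crossprod(Q, y))      # solve R beta = Q^T y
cbind(beta, coef(lm(y ~ X - 1)))           # agrees with lm()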
25. Cholesky Decomposition
- Cholesky died from wounds received on the battlefield on 31 August 1918, at 5 o'clock in the morning, in the north of France. After his death one of his fellow officers, Commandant Benoit, published Cholesky's method of computing solutions to the normal equations for some least squares data fitting problems in the Bulletin géodésique in 1924. This method is now known as the Cholesky decomposition.
- Cholesky Decomposition: If A is a real, symmetric and positive definite matrix, then there exists a unique lower triangular matrix L with positive diagonal elements such that A = LL^T.
André-Louis Cholesky (1875-1918)
26. Cholesky Decomposition
- Theorem: If A is an n×n real, symmetric and positive definite matrix, then there exists a unique lower triangular matrix G with positive diagonal elements such that A = GG^T.
- Proof: Since A is n×n, real and positive definite, it has an LU decomposition A = LU. Also let the lower triangular matrix L be a unit one (i.e. set all the entries of its main diagonal to ones); in that case the LU decomposition is unique. Write U = DM^T, where D is the diagonal matrix formed by the diagonal of U; then M^T = D^(-1)U is a unit upper triangular matrix.
- Thus A = LDM^T. Since A is symmetric, A = A^T, i.e., LDM^T = MDL^T. From the uniqueness we have L = M, so A = LDL^T. Since A is positive definite, all diagonal elements of D are positive. Let G = LD^(1/2); then we can write A = GG^T.
27. Cholesky Decomposition (Cont.)
- Procedure to find the Cholesky decomposition:
- Suppose
- We need to solve the equation
28. Example of Cholesky Decomposition
For k from 1 to n
  For j from k+1 to n
- Suppose
- Then the Cholesky decomposition is
- Now,
29. R code for Cholesky Decomposition
- x <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol = 3, nrow = 3)
- cl <- chol(x)
- If we decompose A as LDL^T, then
- and
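A direct R implementation of the recursions indicated above, as a teaching sketch (in practice use chol(), which returns the upper triangular factor); the matrix reuses the example from this slide.

my_chol <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (k in 1:n) {
    L[k, k] <- sqrt(A[k, k] - sum(L[k, seq_len(k - 1)]^2))
    if (k < n) for (j in (k + 1):n) {
      L[j, k] <- (A[j, k] - sum(L[j, seq_len(k - 1)] * L[k, seq_len(k - 1)])) / L[k, k]
    }
  }
  L
}
A <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol = 3, nrow = 3)
L <- my_chol(A)
L %*% t(L)        # reconstructs A
t(chol(A))        # matches L, since chol() gives the upper factor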
30. Application of Cholesky Decomposition
- Cholesky decomposition is used to solve the system of linear equations Ax = b, where A is real, symmetric and positive definite.
- In regression analysis it can be used to estimate the parameters when X^T X is positive definite.
- In kernel principal component analysis, Cholesky decomposition is also used (Weiya Shi and Yue-Fei Guo, 2010). A short R sketch of the first two uses follows.
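A hedged sketch of both uses in R; the linear system reuses the matrix from slide 29, while the regression data are made up purely for illustration.

# Solving A x = b with A = L L^T (chol() returns U = L^T)
A <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol = 3, nrow = 3)
b <- c(1, 2, 3)
U <- chol(A)
x <- backsolve(U, forwardsolve(t(U), b))   # L y = b, then U x = y
x
solve(A, b)                                # agreement check

# Regression: solve the normal equations X^T X beta = X^T y via Cholesky
set.seed(2)
X <- cbind(1, rnorm(30))                   # hypothetical design matrix
y <- X %*% c(1, 0.5) + rnorm(30, sd = 0.2)
R <- chol(crossprod(X))                    # X^T X = R^T R
beta <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))
cbind(beta, coef(lm(y ~ X - 1)))           # agrees with lm()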
31. Characteristic Roots and Characteristic Vectors
- Any nonzero vector x is said to be a characteristic vector of a matrix A if there exists a number λ such that Ax = λx,
- where A is a square matrix; λ is then said to be a characteristic root of the matrix A corresponding to the characteristic vector x.
- A characteristic root is unique, but a characteristic vector is not unique.
- We calculate the characteristic roots λ from the characteristic equation |A − λI| = 0.
- For λ = λi, the characteristic vector is the solution x of the homogeneous system of linear equations (A − λiI)x = 0.
Theorem: If A is a real symmetric matrix and λi and λj are two distinct latent roots of A, then the corresponding latent vectors xi and xj are orthogonal.
32. Multiplicity
- Algebraic multiplicity: the number of repetitions of a certain eigenvalue. If, for a certain matrix, λ = 3, 3, 4, then the algebraic multiplicity of 3 would be 2 (as it appears twice) and the algebraic multiplicity of 4 would be 1 (as it appears once). This type of multiplicity is normally represented by the Greek letter α, where α(λi) represents the algebraic multiplicity of λi.
- Geometric multiplicity: the geometric multiplicity of an eigenvalue is the number of linearly independent eigenvectors associated with it.
33. Jordan Decomposition (Camille Jordan, 1870)
- Let A be any n×n matrix. Then there exists a nonsingular matrix P and k×k Jordan blocks Jk(λ) such that
Camille Jordan (1838-1921)
where k1 + k2 + . . . + kr = n. Also λi, i = 1, 2, . . ., r, are the characteristic roots and ki are the algebraic multiplicities of λi.
The Jordan decomposition is used in differential equations and time series analysis.
34. Spectral Decomposition
A. L. Cauchy established the spectral decomposition in 1829.
- Let A be an m×m real symmetric matrix. Then there exists an orthogonal matrix P such that P^T A P = Λ, or A = PΛP^T, where Λ is a diagonal matrix.
Cauchy, A. L. (1789-1857)
35. Spectral Decomposition and Principal Component Analysis (Cont.)
- By using the spectral decomposition we can write A = PΛP^T.
- In multivariate analysis our data form a matrix. Suppose our data matrix is X, and suppose X is mean-centred, i.e., the column means have been subtracted; the variance-covariance matrix is Σ. The variance-covariance matrix Σ is real and symmetric.
- Using the spectral decomposition we can write Σ = PΛP^T, where Λ is a diagonal matrix.
- Also, tr(Σ) = total variation of the data = tr(Λ).
36. Spectral Decomposition and Principal Component Analysis (Cont.)
- The principal component transformation is the transformation Y = (X − µ)P,
- where
- E(Yi) = 0
- V(Yi) = λi
- Cov(Yi, Yj) = 0 if i ≠ j
- V(Y1) ≥ V(Y2) ≥ . . . ≥ V(Yn)
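A short R sketch of this construction, using the iris measurements purely as illustrative data (not a data set from the slides):

X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centre the data
S  <- cov(Xc)                                  # variance-covariance matrix
es <- eigen(S)                                 # spectral decomposition S = P Lambda P^T
P  <- es$vectors
Y  <- Xc %*% P                                 # principal component scores
es$values                                      # variances of the PCs (the lambda_i)
c(sum(es$values), sum(diag(S)))                # tr(Lambda) = tr(S): total variation preserved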
37. R code for Spectral Decomposition
- x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol = 3, nrow = 3)
- eigen(x)
- Applications:
- Data reduction.
- Image processing and compression.
- K selection for K-means clustering.
- Multivariate outlier detection.
- Noise filtering.
- Trend detection in the observations.
38. Historical background of SVD
- Five mathematicians were responsible for establishing the existence of the singular value decomposition and developing its theory:
Camille Jordan (1838-1921)
James Joseph Sylvester (1814-1897)
Erhard Schmidt (1876-1959)
Hermann Weyl (1885-1955)
Eugenio Beltrami (1835-1899)
The singular value decomposition was originally developed by two mathematicians in the mid-to-late 1800s: Eugenio Beltrami and Camille Jordan. Several other mathematicians took part in the final developments of the SVD, including James Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s. C. Eckart and G. Young proved the low-rank approximation property of the SVD (1936).
C. Eckart
39. What is SVD?
- Any real m×n matrix X, where n ≤ m, can be decomposed as X = UΣV^T, where
- U is an m×n column-orthonormal matrix (U^T U = I) containing the eigenvectors of the symmetric matrix XX^T,
- Σ is an n×n diagonal matrix containing the singular values of the matrix X; the number of non-zero diagonal elements of Σ equals the rank of X,
- V^T is an n×n row-orthonormal matrix (V^T V = I) containing the eigenvectors of the symmetric matrix X^T X.
40. Singular Value Decomposition (Cont.)
- Theorem (Singular Value Decomposition): Let X be m×n of rank r, r ≤ n ≤ m. Then there exist matrices U and V and a diagonal matrix Σ with positive diagonal elements such that X = UΣV^T.
- Proof: Since X is m×n of rank r, r ≤ n ≤ m, XX^T and X^T X are both of rank r (by using the concept of the Gramian matrix) and of dimension m×m and n×n respectively. Since XX^T is a real symmetric matrix, we can write by spectral decomposition XX^T = QDQ^T,
- where Q and D are, respectively, the matrices of characteristic vectors and corresponding characteristic roots of XX^T.
- Again, since X^T X is a real symmetric matrix, we can write by spectral decomposition X^T X = RMR^T,
41. Singular Value Decomposition (Cont.)
- where R is the (orthogonal) matrix of characteristic vectors and M is the diagonal matrix of the corresponding characteristic roots.
- Since XX^T and X^T X are both of rank r, only r of their characteristic roots are positive, the remaining being zero. Hence we can write
- Also we can write
42. Singular Value Decomposition (Cont.)
- We know that the nonzero characteristic roots of XX^T and X^T X are equal, so
- Partition Q and R conformably with D and M, respectively,
- i.e., such that Qr is m×r and Rr is n×r, corresponding respectively to the nonzero characteristic roots of XX^T and X^T X. Now take
- where the diagonal entries are the positive characteristic roots of XX^T, and hence those of X^T X as well (by using the concept of the Gramian matrix).
43. Singular Value Decomposition (Cont.)
- Now define
- We shall show that S = X, thus completing the proof.
- Similarly,
- From the first relation above we conclude that, for an arbitrary orthogonal matrix, say P1,
- while from the second we conclude that, for an arbitrary orthogonal matrix, say P2,
- we must have
44. Singular Value Decomposition (Cont.)
- The preceding, however, implies that for arbitrary orthogonal matrices P1, P2 the matrix X satisfies
- which in turn implies that
- Thus
45. R code for Singular Value Decomposition
- x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol = 3, nrow = 3)
- sv <- svd(x)
- D <- sv$d
- U <- sv$u
- V <- sv$v
46. Decomposition in Diagram
(Flowchart relating the decompositions; its node labels are: Matrix A; full column rank; LU decomposition (not always unique); QR decomposition; rectangular; square; asymmetric; SVD; symmetric; AM = GM; AM > GM; PD (positive definite); similar diagonalization P^(-1)AP = Λ; Jordan decomposition; Cholesky decomposition; spectral decomposition.)
47. Properties of SVD
- Rewriting the SVD as a sum of rank-one terms: A = σ1 u1 v1^T + σ2 u2 v2^T + . . . + σr ur vr^T,
- where
- r = rank of A,
- σi = the i-th diagonal element of Σ,
- ui and vi are the i-th columns of U and V respectively.
48. Properties of SVD: Low-rank Approximation
- Theorem: If A = UΣV^T is the SVD of A and the singular values are sorted as σ1 ≥ σ2 ≥ . . . ≥ σr, then for any l < r the best rank-l approximation to A is Al = σ1 u1 v1^T + . . . + σl ul vl^T.
- The low-rank approximation technique is very important for data compression.
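A minimal R sketch of the rank-l (Eckart-Young) approximation and its Frobenius error; the matrix is random, purely for illustration.

low_rank <- function(A, l) {
  s <- svd(A)
  s$u[, 1:l, drop = FALSE] %*% diag(s$d[1:l], l, l) %*% t(s$v[, 1:l, drop = FALSE])
}
set.seed(3)
A  <- matrix(rnorm(30), nrow = 6)          # made-up 6 x 5 matrix
A2 <- low_rank(A, 2)                       # best rank-2 approximation
norm(A - A2, type = "F")                   # Frobenius norm of the error
sqrt(sum(svd(A)$d[-(1:2)]^2))              # equals sqrt(sigma_3^2 + ... + sigma_r^2)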
49. Low-rank Approximation
- The SVD can be used to compute optimal low-rank approximations.
- The approximation of A is the matrix Ã of rank k that minimizes the Frobenius norm of the error,
- If λi are the characteristic roots of A^T A, then
- Ã and X are both m×n matrices (Frobenius norm).
50. Low-rank Approximation
- Set the smallest r − k singular values to zero.
- (Illustration with k = 2.)
51. Approximation error
- How good (or bad) is this approximation?
- It is the best possible, as measured by the Frobenius norm of the error:
- where the σi are ordered such that σi ≥ σi+1. Now
52. Row approximation and column approximation
- Suppose Ri and Cj represent the i-th row and j-th column of A. The SVD of A is
- The SVD equation for Ri is
- We can approximate Ri by keeping only the first l terms, l < r,
- where i = 1, . . ., m.
Also, the SVD equation for Cj is, where j = 1, 2, . . ., n,
We can also approximate Cj by keeping only the first l terms, l < r.
53. Least squares solution in an inconsistent system
- By using the SVD we can solve an inconsistent system; this gives the least squares solution.
- The least squares solution is x = A^g b,
- where A^g is the Moore-Penrose (MP) inverse of A.
54. Least squares solution in an inconsistent system (Cont.)
- The SVD of A^g is
- This can be written as
- where
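A hedged R sketch of the Moore-Penrose inverse built from the SVD and used for a least squares solution; the (possibly inconsistent) system below is random, purely for illustration.

mp_inverse <- function(A, tol = 1e-10) {
  s <- svd(A)
  pos <- s$d > tol * max(s$d)                  # keep non-negligible singular values
  s$v[, pos, drop = FALSE] %*% diag(1 / s$d[pos], sum(pos)) %*% t(s$u[, pos, drop = FALSE])
}
set.seed(4)
A <- matrix(rnorm(12), nrow = 4)               # overdetermined system
b <- rnorm(4)
x <- mp_inverse(A) %*% b                       # least squares solution
cbind(x, qr.solve(A, b))                       # agrees with a QR-based solve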
55. Basic Results of SVD
56. SVD-based PCA
- If we reduce variables by using the SVD, it performs like PCA.
- Suppose X is a mean-centred data matrix. Then, writing X by its SVD, X = UΣV^T,
- we can write XV = UΣ.
- Suppose Y = XV = UΣ.
- Then the first column of Y represents the first principal component score, and so on.
- SVD-based PCs are more numerically stable.
- If the number of variables is greater than the number of observations, then SVD-based PCA will give an efficient result (Antti Niemistö, Statistical Analysis of Gene Expression Microarray Data, 2005).
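A short sketch of SVD-based PC scores in R, compared with prcomp(); iris is used as illustrative data, and the signs of individual columns may differ between the two methods.

X  <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
sv <- svd(X)
scores_svd <- sv$u %*% diag(sv$d)            # equivalently X %*% sv$v
scores_pc  <- prcomp(X, center = FALSE)$x
max(abs(abs(scores_svd) - abs(scores_pc)))   # equal up to column signs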
57. Application of SVD
- Data reduction, for both variables and observations.
- Solving linear least squares problems.
- Image processing and compression.
- K selection for K-means clustering.
- Multivariate outlier detection.
- Noise filtering.
- Trend detection in the observations and the variables.
58. Origin of biplot
- Gabriel (1971).
- One of the most important advances in data analysis in recent decades.
- Currently:
- > 50,000 web pages
- Numerous academic publications
- Included in most statistical analysis packages
- Still a very new technique to most scientists.
59. What is a biplot?
- Biplot = "bi" + "plot"
- plot:
- a scatter plot of two rows OR of two columns, or
- a scatter plot summarizing the rows OR the columns
- bi:
- BOTH rows AND columns
- 1 biplot >> 2 plots
60. Practical definition of a biplot
"Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix." (Gabriel, 1971)
(Now 3D-biplots are also possible.)
Matrix decomposition of a G-by-E table: P(4, 3), G(3, 2), E(2, 3).
(Figure: biplot of genotypes G1-G4 and environments E1-E3.)
61. Singular Value Decomposition (SVD) and Singular Value Partitioning (SVP)
(Diagram: the SVD is turned into row scores and column scores by partitioning the singular values with a factor f; the common choices shown are f = 1, f = 0 and f = 1/2.)
62. Biplot
- The simplest biplot shows the first two PCs together with the projections of the axes of the original variables.
- The x-axis represents the scores for the first principal component.
- The y-axis represents the scores for the second principal component.
- The original variables are represented by arrows which graphically indicate the proportion of the original variance explained by the first two principal components.
- The direction of the arrows indicates the relative loadings on the first and second principal components.
- Biplot analysis can help us understand the multivariate data
- i) graphically,
- ii) effectively,
- iii) conveniently.
63. Biplot of Iris Data
1 = Setosa, 2 = Versicolor, 3 = Virginica
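A figure of this kind can be reproduced in R with one call (assuming the standard iris data; species are labelled 1-3 as above):

pc <- prcomp(iris[, 1:4], scale. = TRUE)
biplot(pc, xlabs = as.numeric(iris$Species))   # 1 = setosa, 2 = versicolor, 3 = virginica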
64. Image Compression Example
- Pansy flower image, collected from http://www.ats.ucla.edu/stat/r/code/pansy.jpg
- This image is 600×465 pixels.
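A hedged sketch of the compression idea in R, assuming the jpeg package and a local copy of the image; the grey-scale conversion weights are a common convention, not taken from the slides.

library(jpeg)
img  <- readJPEG("pansy.jpg")                              # hypothetical local file
grey <- 0.299 * img[, , 1] + 0.587 * img[, , 2] + 0.114 * img[, , 3]
s <- svd(grey)
rank_k <- function(k) s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
for (k in c(5, 30, 80, 120)) {
  approx <- pmin(pmax(rank_k(k), 0), 1)                    # clip back to [0, 1]
  writeJPEG(approx, paste0("pansy_rank_", k, ".jpg"))      # save each approximation
}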
65. Singular values of the flower image
- Plot of the singular values.
66. Low-rank approximation to the flower image
Rank-5 approximation
67. Low-rank approximation to the flower image
Rank-30 approximation
68. Low-rank approximation to the flower image
Rank-80 approximation
69. Low-rank approximation to the flower image
Rank-120 approximation
70. Low-rank approximation to the flower image
True image
71. Outlier Detection Using SVD
- Nishith and Nasser (2007, MSc thesis) proposed a graphical method of outlier detection using the SVD.
- It is suitable for both general multivariate data and regression data. For this we construct scatter plots of the first two PCs, and of the first and third PCs. We also draw a box in each scatter plot whose range is median(1st PC) ± 3 mad(1st PC) on the x-axis and median(2nd PC / 3rd PC) ± 3 mad(2nd PC / 3rd PC) on the y-axis,
- where mad = median absolute deviation.
- The points outside the box can be considered extreme outliers; the points outside one side of the box are termed outliers. Along with this box we may construct another, smaller box bounded by the 2.5/2 MAD lines. A sketch of the box construction follows.
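A minimal sketch of the box construction on the first two PC scores; iris is used only as stand-in data, whereas the thesis applies the method to multivariate and regression data sets.

X  <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
pc <- X %*% svd(X)$v                              # PC scores from the SVD
plot(pc[, 1], pc[, 2], xlab = "PC1", ylab = "PC2")
xr <- median(pc[, 1]) + c(-3, 3) * mad(pc[, 1])
yr <- median(pc[, 2]) + c(-3, 3) * mad(pc[, 2])
rect(xr[1], yr[1], xr[2], yr[2], border = "red")  # points outside the box are flagged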
72. Outlier Detection Using SVD (Cont.)
Hawkins-Bradu-Kass (1984) data
A data set containing 75 observations with 14 influential observations; among them there are ten high-leverage outliers (cases 1-10) and four high-leverage points (cases 11-14) (Imon, 2005).
Scatter plot of the Hawkins, Bradu and Kass data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PCs.
73. Outlier Detection Using SVD (Cont.)
Modified Brown data
Data set given by Brown (1980). Ryan (1997) pointed out that the original data on the 53 patients contain one outlier (observation number 24). Imon and Hadi (2005) modified this data set by adding two more outliers, as cases 54 and 55. They also showed that observations 24, 54 and 55 are outliers by using the generalized standardized Pearson residual (GSPR).
Scatter plot of the modified Brown data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PCs.
74. Cluster Detection Using SVD
- Singular value decomposition is also used for cluster detection (Nishith, Nasser and Suboron, 2011).
- The method for clustering data using the first three PCs is given below:
- draw boxes with range median(1st PC) ± k mad(1st PC) on the x-axis and median(2nd PC / 3rd PC) ± k mad(2nd PC / 3rd PC) on the y-axis,
- where mad = median absolute deviation and the value of k = 1, 2, 3.
76. Principal stations in the climate data
77. Climatic Variables
- The climatic variables are:
- Rainfall (RF), mm
- Daily mean temperature (T-MEAN), °C
- Maximum temperature (T-MAX), °C
- Minimum temperature (T-MIN), °C
- Day-time temperature (T-DAY), °C
- Night-time temperature (T-NIGHT), °C
- Daily mean water vapour pressure (VP), mbar
- Daily mean wind speed (WS), m/sec
- Hours of bright sunshine as percentage of maximum possible sunshine hours (MPS)
- Solar radiation (SR), cal/cm2/day
78. Consequences of SVD
- Generally, many missing values may be present in the data; it may also contain unusual observations. The classical singular value decomposition cannot handle either type of problem.
- Robust singular value decomposition can solve both types of problems.
- A robust singular value decomposition can be obtained by an alternating L1 regression approach (Douglas M. Hawkins, Li Liu, and S. Stanley Young, 2001).
79. The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition
- Initialize the leading left singular vector u (there is no obvious choice of the initial values).
- Fit the L1 regression coefficients cj, j = 1, 2, …, p, by minimizing Σi |xij − ui cj|.
- Calculate the right singular vector v1 = c/||c||, where ||·|| refers to the Euclidean norm.
- Again fit the L1 regression coefficients di, i = 1, 2, …, n, by minimizing Σj |xij − di vj|.
- Calculate the resulting estimate of the left singular vector u1 = d/||d||.
- Iterate this process until it converges.
- For the second and subsequent terms of the SVD, we replace X by a deflated matrix obtained by subtracting the most recently found term: X ← X − σk uk vk^T.
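A hedged R sketch of one robust rank-one term by alternating L1 fits, in the spirit of Hawkins, Liu and Young (2001); it assumes the quantreg package for the median (L1) regressions, uses the ordinary SVD merely as one possible starting value, and fixes the number of iterations instead of testing convergence.

library(quantreg)
robust_svd1 <- function(X, iter = 25) {
  u <- svd(X)$u[, 1]                       # one possible starting left vector
  for (it in 1:iter) {
    # L1 fit of each column loading c_j: minimise sum_i |x_ij - u_i c_j|
    cvec <- apply(X, 2, function(col) coef(rq(col ~ u - 1, tau = 0.5)))
    v <- cvec / sqrt(sum(cvec^2))          # right singular vector
    # L1 fit of each row score d_i: minimise sum_j |x_ij - d_i v_j|
    dvec <- apply(X, 1, function(row) coef(rq(row ~ v - 1, tau = 0.5)))
    s <- sqrt(sum(dvec^2))                 # singular value estimate
    u <- dvec / s                          # left singular vector
  }
  list(u = u, d = s, v = v)
}
set.seed(5)
X <- outer(rnorm(20), rnorm(8)) + matrix(rnorm(160, sd = 0.1), 20, 8)
X[1, 1] <- 50                              # plant a gross outlier
fit <- robust_svd1(X)
# deflate and repeat for further terms: X <- X - fit$d * fit$u %*% t(fit$v)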
80. Clustering weather stations on the map using RSVD
81. References
- Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
- Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed. Springer-Verlag, New York.
- Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple regression data using elemental sets. Technometrics, 26, 197-208.
- Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73-90.
- Kumar, N., Nasser, M., and Sarker, S.C. (2011). A new singular value decomposition based robust graphical clustering technique and its application in climatic data. Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227-238.
- Ryan, T.P. (1997). Modern Regression Methods, Wiley, New York.
- Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions, SIAM, Philadelphia.
- Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html