Computational tools for fitting PARAFAC models


2
PARAFAC equation / 1
  • PARAFAC is an N-linear model for an N-way array
  • For an $I_1 \times \cdots \times I_N$ array $\mathcal{X}$, it is defined as $x_{i_1 i_2 \cdots i_N} = \sum_{f=1}^{F} \prod_{n=1}^{N} a^{(n)}_{i_n f} + e_{i_1 i_2 \cdots i_N}$
  • $x_{i_1 i_2 \cdots i_N}$ denotes the array elements
  • $a^{(n)}_{i_n f}$ are the model parameters
  • F is the number of fitted components
  • $e_{i_1 i_2 \cdots i_N}$ denotes the residuals
  • The model parameters are typically grouped in N loading matrices $A_1, \ldots, A_N$, with $A_n$ of size $I_n \times F$
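
A minimal NumPy sketch of the trilinear (N = 3) case, with hypothetical loading matrices A, B, C:

    import numpy as np

    I, J, K, F = 5, 6, 7, 3
    rng = np.random.default_rng(0)
    A = rng.standard_normal((I, F))   # loadings, mode 1
    B = rng.standard_normal((J, F))   # loadings, mode 2
    C = rng.standard_normal((K, F))   # loadings, mode 3

    # x_ijk = sum_f a_if * b_jf * c_kf  (noise-free part of the model)
    X = np.einsum('if,jf,kf->ijk', A, B, C)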

3
HPLC-DAD
  • HPLC combined with Diode Array Detection (DAD)
  • Light absorbed at the i-th wavelength follows Beer's law: absorbance is linear in the analyte concentrations
  • Light absorbed at the j-th elution time and i-th wavelength: $x_{ij} = \sum_{f} a_{if} b_{jf}$, with spectra $a_{if}$ and elution profiles $b_{jf}$
  • For F compounds and K samples, the data follow the trilinear model $x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}$

4
Matricisation / 1
  • Matricisation is an operation that associates a matrix to a multi-way array
  • The number of possible matricisations increases with the array's order
  • Notation: $X_{(n)}$ denotes the matricisation with mode n on the rows
  • Some matricisations are faster to compute than others
  • A shiftdim operation can be implemented more rapidly using appropriate matricisations
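
A sketch of mode-n matricisation in NumPy (unfold is a hypothetical helper name; np.moveaxis plays the role of MATLAB's shiftdim):

    import numpy as np

    def unfold(X, n):
        # Mode-n matricisation: mode n on the rows, the remaining
        # modes (in their original order) combined on the columns.
        return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

    X = np.arange(24).reshape(2, 3, 4)
    print(unfold(X, 1).shape)  # (3, 8)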

6
Vectorisation
  • The vec operator transforms a matrix into a vector by stacking its columns
  • In combination with matricisation, one can define the vectorisation operation for N-way arrays
  • The result of the vectorisation depends only on the order of the modes resulting from the matricisation
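
A minimal sketch of vec under the column-stacking convention:

    import numpy as np

    def vec(M):
        # Stack the columns of M into a single vector (column-major).
        return M.reshape(-1, order='F')

    X = np.arange(6.0).reshape(2, 3)
    print(vec(X))  # [0. 3. 1. 4. 2. 5.]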

7
Matricisation / 2
  • The order of the modes is often taken as a convention
  • Row/column modes in increasing/decreasing order
  • Row/column modes in cyclical order
  • Subscripts $n$, $nn'$, or $n,n',\ldots$ indicate the modes that are removed
  • Subscripts $n$, $nn'$, or $n,n',\ldots$ for a matricised array indicate the modes on the rows

8
Commutation matrices
  • For an $n \times p$ matrix X, the commutation matrix $K_{np}$ performs the operation $K_{np}\,\mathrm{vec}(X) = \mathrm{vec}(X^{\mathsf{T}})$
  • For an $I_1 \times \cdots \times I_N$ array $\mathcal{X}$, the N-way commutation matrices $M_n$ and $M_{nn'}$ perform the analogous operations on the vectorised array
  • Commutation matrices can be used to shift through matricisations
  • With the cyclic-modes notation, shiftdim does not require commutation matrices
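
A sketch of $K_{np}$ built explicitly (commutation is a hypothetical helper name):

    import numpy as np

    def commutation(n, p):
        # K_np such that K_np @ vec(X) = vec(X.T), with column-major vec.
        K = np.zeros((n * p, n * p))
        for i in range(n):
            for j in range(p):
                K[i * p + j, j * n + i] = 1.0
        return K

    X = np.arange(6.0).reshape(2, 3)
    K = commutation(2, 3)
    assert np.allclose(K @ X.reshape(-1, order='F'),
                       X.T.reshape(-1, order='F'))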

9
The Khatri-Rao product
  • For A and B with the same number of columns, the column-wise Khatri-Rao product $\odot$ performs the operation $A \odot B = [\,a_1 \otimes b_1, \;\ldots,\; a_F \otimes b_F\,]$, i.e. column-wise Kronecker products
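
A sketch in NumPy (scipy.linalg.khatri_rao provides an equivalent where SciPy is available):

    import numpy as np

    def khatri_rao(A, B):
        # Column f of the result is kron(A[:, f], B[:, f]).
        F = A.shape[1]
        assert B.shape[1] == F
        return np.einsum('if,jf->ijf', A, B).reshape(-1, F)

    A = np.ones((2, 2))
    B = np.eye(2)
    print(khatri_rao(A, B).shape)  # (4, 2)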

10
PARAFAC equation / 2
  • The matrix equation for PARAFAC is $X_{(n)} = A_n (A_N \odot \cdots \odot A_{n+1} \odot A_{n-1} \odot \cdots \odot A_1)^{\mathsf{T}} + E_{(n)}$
  • The vector representation of the PARAFAC model array is $\mathrm{vec}\,\mathcal{X} = (A_N \odot \cdots \odot A_1)\,\mathbf{1}_F + \mathrm{vec}\,\mathcal{E}$
  • The notation is simplified by using the letter Z for the Khatri-Rao products
  • Different matricisations/vectorisations correspond to permutations of the factors in the Khatri-Rao product
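
A numerical check for a 3-way array; with the C-ordered unfolding used in the sketches above, the factors appear in increasing order (other conventions permute them):

    import numpy as np

    rng = np.random.default_rng(1)
    I, J, K, F = 4, 5, 6, 2
    A, B, C = (rng.standard_normal((d, F)) for d in (I, J, K))
    X = np.einsum('if,jf,kf->ijk', A, B, C)

    X1 = X.reshape(I, -1)                             # mode-1 matricisation
    Z = np.einsum('jf,kf->jkf', B, C).reshape(-1, F)  # Z = B (KR) C
    assert np.allclose(X1, A @ Z.T)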

11
Fitting the PARAFAC model
  • Fitting the PARAFAC model in the least squares sense corresponds to solving the nonlinear problem $\min_{A_1,\ldots,A_N} \lVert X_{(1)} - A_1 Z_1^{\mathsf{T}} \rVert_F^2$
  • A weighted least squares fitting criterion takes the form $(\mathrm{vec}\,\mathcal{X} - \mathrm{vec}\,\hat{\mathcal{X}})^{\mathsf{T}} D_w\, (\mathrm{vec}\,\mathcal{X} - \mathrm{vec}\,\hat{\mathcal{X}})$
  • where $D_w$ is a (positive semidefinite) diagonal matrix holding the elements of $w = \mathrm{vec}\,W$
  • If the residual variance/covariance matrix S is known, it can be used to set the weights
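
The diagonal weighted criterion never requires forming $D_w$; a minimal sketch with hypothetical arrays X, Xhat, W:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((4, 5, 6))      # data
    Xhat = rng.standard_normal((4, 5, 6))   # model array
    W = rng.random((4, 5, 6))               # element-wise weights

    # (vec X - vec Xhat)^T D_w (vec X - vec Xhat)
    wls = np.sum(W * (X - Xhat) ** 2)
    print(wls)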

12
Algorithms for PARAFAC
  • Many algorithms have been proposed for fitting PARAFAC models
  • Alternating Least Squares (1970)
  • Gauss-Newton (1982)
  • Preconditioned Conjugate Gradients (1995/1999)
  • Levenberg-Marquardt (1997)
  • Direct Trilinear Decomposition (1990)
  • Alternating Trilinear Decomposition (1998)
  • Alternating Slice-wise Decomposition (2000)
  • Self-Weighted Alternating TriLinear Decomposition
    (2000)
  • Pseudo-Alternating Least Squares (2001)
  • PARAFAC with Penalty Diagonalization Error (2001)

13
Alternating Least Squares
  • ALS breaks the nonlinear problem down into linear ones, which are solved iteratively: $A_n \leftarrow X_{(n)} Z_n (Z_n^{\mathsf{T}} Z_n)^{+}$
  • Initial values for N−1 loading matrices must be provided
  • The properties of the Moore-Penrose inverse and those of the Khatri-Rao product (e.g. $Z_n^{\mathsf{T}} Z_n$ is a Hadamard product of the cross-products $A_m^{\mathsf{T}} A_m$, $m \neq n$) are used to reduce the computational load
  • Convergence is checked at each step using (among others) the relative fit decrease; see the sketch below
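
A self-contained PARAFAC-ALS sketch for a 3-way array (no line search, no constraints; parafac_als and its defaults are illustrative choices, not the implementation discussed here):

    import numpy as np

    def parafac_als(X, F, n_iter=200, tol=1e-8, seed=0):
        I, J, K = X.shape
        rng = np.random.default_rng(seed)
        A, B, C = (rng.standard_normal((d, F)) for d in (I, J, K))
        unfold = lambda T, n: np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
        kr = lambda U, V: np.einsum('if,jf->ijf', U, V).reshape(-1, F)
        fit_old = np.inf
        for _ in range(n_iter):
            # A_n <- X_(n) Z_n (Z_n^T Z_n)^+, with Z_n^T Z_n formed
            # as a Hadamard product of the loading cross-products
            A = unfold(X, 0) @ kr(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
            B = unfold(X, 1) @ kr(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
            C = unfold(X, 2) @ kr(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
            fit = np.linalg.norm(X - np.einsum('if,jf,kf->ijk', A, B, C))
            if abs(fit_old - fit) <= tol * max(fit, 1.0):  # relative fit decrease
                break
            fit_old = fit
        return A, B, C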

14
PARAFAC-ALS Revisited
  • Using matricisations, rearrangements can be avoided or largely reduced
  • The computational load can be reduced by
  • a factor $I_1F+1$ for a 3-way array for modes 2 and 3
  • a factor $I_nI_{n+1}F+1$ for 4-way arrays and higher, every two treated modes (n and n+1)
  • Operating column-wise, the number of operations is reduced by a factor F
  • The loss function can be calculated without explicitly forming the residuals (see the sketch below)
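
The last point follows from expanding the squared norm; a numerical check, with hypothetical names as in the earlier sketches:

    import numpy as np

    rng = np.random.default_rng(2)
    I, J, K, F = 4, 5, 6, 3
    A, B, C = (rng.standard_normal((d, F)) for d in (I, J, K))
    X = np.einsum('if,jf,kf->ijk', A, B, C) + 0.1 * rng.standard_normal((I, J, K))

    X1 = X.reshape(I, -1)                             # mode-1 matricisation
    Z = np.einsum('jf,kf->jkf', B, C).reshape(-1, F)  # Z = B (KR) C

    # ||X_(1) - A Z^T||^2 = ||X||^2 - 2 tr(A^T X_(1) Z) + sum((A^T A) * (Z^T Z))
    direct = np.linalg.norm(X1 - A @ Z.T) ** 2
    fast = np.sum(X**2) - 2 * np.sum((X1 @ Z) * A) + np.sum((A.T @ A) * (Z.T @ Z))
    assert np.isclose(direct, fast)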

15
Line search extrapolation
  • Line search extrapolation is used to accelerate convergence in ALS
  • An analytical solution to the exact line search problem exists for PARAFAC
  • The optimal step length is found as a real root of a polynomial of degree 2N
  • The cost of computing the polynomial coefficients directly is significant
  • A great reduction in the number of iterations is obtained with simple and exact line search (see the sketch below)
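
For the 3-way case the loss along the search direction is a degree-6 polynomial in the step; a sketch that recovers it by interpolation rather than by the direct coefficient computation referred to above (exact_step, old, new are hypothetical names):

    import numpy as np

    def exact_step(X, old, new, deg=6):
        # old/new: tuples (A, B, C) from two consecutive ALS updates.
        model = lambda A, B, C: np.einsum('if,jf,kf->ijk', A, B, C)
        loss = lambda s: np.linalg.norm(
            X - model(*[P + s * (Q - P) for P, Q in zip(old, new)])) ** 2
        s = np.arange(deg + 1, dtype=float)   # deg+1 samples fix the polynomial
        p = np.polynomial.Polynomial.fit(s, [loss(v) for v in s], deg)
        crit = p.deriv().roots()
        crit = crit[np.isreal(crit)].real     # real stationary points only
        return crit[np.argmin([loss(v) for v in crit])] if crit.size else 1.0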

16
Line search extrapolation
  • The computation time per iteration is higher with exact line search
  • The problem seems to lie in the search direction, and not only in the higher computational load per iteration
  • The algorithm in its fastest implementation seems to suffer from numerical instability
  • Several possibilities may prove beneficial:
  • Perform line search only when the updates become highly collinear
  • Set the direction of search as the combination of several consecutive updates

17
Self-Weighted Alternating TriLinear Decomposition
  • Does not find the least squares solution, but minimises a modified loss function at each step
  • Not straightforwardly extendible to higher orders
  • Requires full column rank for all loading matrices
  • The scaling convention affects the convergence
  • Similar cost to PARAFAC-ALS

18
SWATLD
  • SWATLD's fitting criterion and convergence properties are not well characterised
  • SWATLD yields biased loadings, which affects
    predictions
  • SWATLD yields solutions with higher core
    consistency
  • The results suggest that introducing such bias
    may be beneficial
  • Naïve solutions (PARAFAC-PDE) lead to unstable
    algorithms

19
Levenberg-Marquardt
  • Based on a local linearisation of the vectorised residuals (r) in the neighbourhood of the interim solution: $r(p + \delta p) \approx r(p) + J\,\delta p$
  • J is the Jacobian matrix of the vector of the residuals
  • An update to the solution is found by solving the damped problem $\min_{\delta p} \lVert r + J\,\delta p \rVert^2 + \lambda \lVert \delta p \rVert^2$
  • In matrix form this is expressed as $(J^{\mathsf{T}}J + \lambda I)\,\delta p = -J^{\mathsf{T}} r$
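
A generic dense sketch of one step (lm_step is a hypothetical name; the approach described here solves the same system without ever forming J, which this sketch does not attempt):

    import numpy as np

    def lm_step(J, r, lam):
        # Damped normal equations: (J^T J + lam I) dp = -J^T r
        JtJ = J.T @ J
        return np.linalg.solve(JtJ + lam * np.eye(JtJ.shape[0]), -(J.T @ r))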

20
Jacobian J
  • J is very sparse: each residual element depends on only NF of the $F\sum_n I_n$ parameters
  • J is rank deficient because of the scaling indeterminacy
  • J is very tall and thin and cannot be stored as full, apart from small problems
  • Sparse QR methods are infeasible in most cases
  • The problem is solved using the system of normal equations

21
JTJ and JTDwJ
  • Both can be calculated without forming J
  • The WLS case is much more expensive because of the calculation of U and V
  • The time expense can be reduced using properties e. and c. of the Khatri-Rao product
  • Filling the sparse J and computing JTJ explicitly is faster for some WLS problems

22
Gradient JTr
  • The residuals are not necessary for the LS fitting criterion
  • Faster routines based on the chain rule for matrix functions can be obtained using property e. of the Khatri-Rao product
  • The complexity is identical to that of an ALS step

23
Time consumption
24
PARAFAC-LM
  • The size of the problem is $F\sum_n I_n$ parameters, with $\prod_n I_n$ residuals
  • The cost per iteration is of the order of $(F\sum_n I_n)^3$ (solving the normal equations)
  • The method is too expensive for large problems

25
A comparison of algorithms
  • SWATLD and PARAFAC-LM are more resistant to mild model over-factoring
  • SWATLD did not yield two-factor degeneracies in simulated sets
  • PARAFAC-LM performs better for ill-conditioned problems
  • PARAFAC-LM is infeasible for larger problems
  • SWATLD is faster than ALS and LM and relatively robust with respect to high collinearity
  • PARAFAC-LM is preferable for higher order arrays and if the rank is relatively small

26
Compression
  • The array is projected onto some truncated bases
  • SVD-based compressions
  • Tucker-based compressions
  • Prior knowledge (CANDELINC, PARAFAC-IV)
  • The array can be compressed to as little as $F^N$ elements
  • Not compatible with non-negativity constraints
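
A Tucker/HOSVD-style compression sketch for a 3-way array (compress is a hypothetical name; PARAFAC is then fitted to the small core G and the loadings expanded as $A_n = U_n \tilde{A}_n$):

    import numpy as np

    def compress(X, r):
        # Project each mode onto the r leading left singular vectors
        # of the corresponding matricisation.
        U = []
        for n in range(3):
            Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
            u = np.linalg.svd(Xn, full_matrices=False)[0]
            U.append(u[:, :r])
        G = np.einsum('ijk,ia,jb,kc->abc', X, *U)   # compressed core
        return G, U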

27
QR compression / preconditioning
  • Calculate a QR decomposition of each loading matrix: $A_n = Q_n U_n$, with $U_n$ upper triangular
  • Premultiply $\mathrm{vec}\,x$ with $Q_N^{\mathsf{T}} \otimes \cdots \otimes Q_1^{\mathsf{T}}$
  • J becomes extremely sparse
  • Many data elements can be skipped
  • QR compression is lossless, but the compression rate is lower than for Tucker-based compression
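
A sketch of the rotation (qr_rotate is a hypothetical name); applying $Q_n^{\mathsf{T}}$ mode-wise is equivalent to the Kronecker premultiplication above:

    import numpy as np

    def qr_rotate(X, loadings):
        # Factor each A_n = Q_n U_n and apply Q_n^T in mode n.
        Y = X
        for n, A in enumerate(loadings):
            Q = np.linalg.qr(A)[0]
            Y = np.moveaxis(np.tensordot(Q.T, Y, axes=(1, n)), 0, n)
        return Y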

28
Missing values
  • Several patterns of missing values
  • Randomly Missing Values (RMV)
  • Randomly Missing Spectra (RMS)
  • Systematically Missing Spectra (SMS)
  • Two approaches
  • Weighted Least Squares (INDAFAC)
  • Single Imputation (ALS with Expectation Maximisation; a sketch follows below)
  • The conditioning of the problem is influenced by
    the
  • fraction of missing values
  • pattern of the missing values in the array
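
A single-imputation sketch (em_parafac is a hypothetical name; parafac_als is the ALS sketch from above, and a production version would warm-start the loadings between passes):

    import numpy as np

    def em_parafac(X, M, F, n_outer=50):
        # M: boolean mask, True where X is observed.
        Xf = np.where(M, X, X[M].mean())   # initial fill
        for _ in range(n_outer):
            A, B, C = parafac_als(Xf, F, n_iter=5)
            Xhat = np.einsum('if,jf,kf->ijk', A, B, C)
            Xf = np.where(M, X, Xhat)      # impute from the current model
        return A, B, C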

29
Missing values: some results
  • Different patterns of missing values yield different artefacts
  • The quality of the predictions depends on the pattern
  • With RMV, good predictions are possible even with 70% missing values
  • The quality of the loadings varies with the asymmetry of the pattern
  • The SMS pattern interacts with multilinearity
  • INDAFAC's cost grows faster with the percentage of missing values (RMV/SMS)
  • INDAFAC is faster than ALS for the SMS pattern

30
Final remarks
  • There appears to be no method superior to all others in all conditions
  • There is a great need for numerical insight into the algorithms: faster algorithms may entail numerical instability
  • Several properties of the column-wise Khatri-Rao product can be used to reduce the computational load
  • Numerous methods have not yet been investigated