1
Sparse & Redundant Representations by
Iterated-Shrinkage Algorithms
  • Michael Elad
  • The Computer Science Department
  • The Technion - Israel Institute of Technology
  • Haifa 32000, Israel

2
Today's Talk is About
the Minimization of the Function f(a) by
Iterated-Shrinkage Algorithms
  • Why This Topic?
  • It is key for turning theory into applications in
    Sparse-land.
  • Shrinkage is intimately coupled with wavelets.
  • The applications we target are fundamental
    signal-processing ones.
  • This is a hot topic in harmonic analysis.
  • Today we will discuss
  • Why is this minimization task important?
  • Which applications could benefit from this
    minimization?
  • How can it be minimized effectively?
  • Which iterated-shrinkage methods are out there,
    and how do they perform?

3
Agenda
  • Motivating the Minimization of f(a)
  • Describing various applications that need this
    minimization
  • Some Motivating Facts
  • General purpose optimization tools, and the
    unitary case
  • Iterated-Shrinkage Algorithms
  • We describe five versions of those in detail
  • Some Results
  • Image deblurring results
  • Conclusions

Why?
4
Let's Start with Image Denoising
Remove additive noise: many of the existing image denoising algorithms
are related to the minimization of an energy function of the form
f(x) = ½‖y − x‖₂² + Pr(x),
where y is the given measurement, x is the unknown to be recovered,
and Pr(x) is the image prior.
We will use a Sparse & Redundant Representation prior.
5
Our MAP Energy Function
  • We assume that x is created by the model M: x = Da,
    where a is a sparse & redundant representation
    and D is a known dictionary.
  • This leads to the MAP penalty
    f(a) = ½‖y − Da‖₂² + λρ(a).
  • This MAP denoising algorithm is known as Basis
    Pursuit Denoising [Chen, Donoho & Saunders, 1995].
  • The term ρ(a) measures the sparsity of the
    solution a.
  • The L0-norm (‖a‖₀) leads to a non-smooth,
    non-convex problem.
  • The Lp norm (‖a‖_p^p) with 0 < p ≤ 1 is often found
    to be equivalent.
  • Many other ADDITIVE sparsity measures are
    possible.

6
General (linear) Inverse Problems
  • Assume that x is known to emerge from M (x = Da), as
    before.
  • Suppose we observe y = Hx + v, a blurred
    and noisy version of x. How could we recover x?
  • A MAP estimator leads to
    f(a) = ½‖y − HDa‖₂² + λρ(a), and the estimate x̂ = Dâ.

7
Inverse Problems of Interest
  • De-Noising
  • De-Blurring
  • In-Painting
  • De-Mosaicing
  • Tomography
  • Image Scale-Up
    super-resolution
  • And more

8
Signal Separation
  • Given a mixture z = x1 + x2 + v of two sources,
    emerging from the models M1 and M2, and white
    Gaussian noise v, we desire to separate it into
    its ingredients.
  • Written differently: z = D1a1 + D2a2 + v.
  • Thus, solving this problem using MAP leads to
    Morphological Component Analysis (MCA) [Starck,
    Elad & Donoho, 2005].

9
Compressed-Sensing [Candès et al., 2006;
Donoho, 2006]
  • In compressed-sensing we compress the signal x by
    exploiting its origin (x = Da). This is done
    by p << n random projections.
  • The core idea: the projected vector y = Px (P of size
    p×n) holds all the information about the original
    signal x, even though p << n.
  • Reconstruction? Use MAP again and solve the same
    problem, with PD playing the role of the dictionary.

10
Brief Summary 1
The minimization of the function f(a) is a worthy
task, serving many different applications.
So, how should this be done?
11
Agenda
  • Motivating the Minimization of f(a)
  • Describing various applications that need this
    minimization
  • Some Motivating facts
  • General purpose optimization tools, and the
    unitary case
  • Iterated-Shrinkage Algorithms
  • We describe five versions of those in detail
  • Some Results
  • Image deblurring results
  • Conclusions

12
Is there a Problem?
  • The first thought: with all the existing
    knowledge in optimization, we could surely find a
    solution.
  • Methods to consider:
  • (Normalized) Steepest Descent: compute the
    gradient and follow its path.
  • Conjugate Gradient: use the gradient and the
    previous update direction, combined by a preset
    formula.
  • Pre-Conditioned SD: weight the gradient by the
    inverse of the Hessian's diagonal.
  • Truncated Newton: use the gradient and Hessian
    to define a linear system, and solve it
    approximately by a set of CG steps.
  • Interior-Point Algorithms: separate a into its
    positive and negative entries, and use both the
    primal and the dual problems, with a barrier
    forcing positivity.

13
General-Purpose Software?
  • So, simply download one of the many general-purpose
    packages:
  • L1-Magic (interior-point solver),
  • SparseLab (interior-point solver),
  • MOSEK (various tools),
  • Matlab Optimization Toolbox (various tools),
  • A problem: general-purpose software packages
    (algorithms) typically perform poorly on
    our task. Possible reasons:
  • The fact that the solution is expected to be
    sparse (or nearly so) in our problem is not
    exploited by such algorithms.
  • The Hessian of f(a) tends to be highly
    ill-conditioned near the (sparse) solution.
  • So, are we stuck? Is this problem really that
    complicated?

14
Consider the Unitary Case (DDH = I)
We get a separable set of m identical 1D
optimization problems, as justified below.
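A brief justification of this separability (our addition, assuming D is square and unitary so that multiplication by DH preserves the Euclidean norm, and that ρ is an additive measure as stated on slide 5):

```latex
\|y - Da\|_2^2 = \|D^{H}y - a\|_2^2
\;\;\Longrightarrow\;\;
f(a) = \sum_{j=1}^{m}\Big[\tfrac12\big(\beta_j - a_j\big)^2 + \lambda\,\rho(a_j)\Big],
\qquad \beta = D^{H}y .
```

Each term involves a single unknown a_j, so the m scalar problems can be solved independently.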
15
The 1D Task
We need to solve the following 1D problem:
minimize over a the function g(a) = ½(β − a)² + λρ(a),
for a given scalar β (here, an entry of DHy).
Such a Look-Up-Table (LUT) a_opt = S_{λ,ρ}(β) can be
built for ANY sparsity measure function ρ(a),
including non-convex ones and non-smooth ones
(e.g., the L0 norm), giving in all cases the GLOBAL
minimizer of g(a).
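For illustration, here is a minimal NumPy sketch of this scalar shrinkage for two common measures; the closed forms below (soft threshold for L1, hard threshold for L0) are standard, and the function names are ours, not from the talk:

```python
import numpy as np

def shrink_l1(beta, lam):
    """Soft threshold: the minimizer of 0.5*(beta - a)**2 + lam*|a|."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def shrink_l0(beta, lam):
    """Hard threshold: the minimizer of 0.5*(beta - a)**2 + lam*(a != 0)."""
    return np.where(np.abs(beta) > np.sqrt(2.0 * lam), beta, 0.0)
```

Both operate element-wise, so they can be applied directly to a whole vector β.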
16
The Unitary Case: A Summary
Minimizing f(a) is done in two steps: multiply by DH
(computing β = DHy), then apply the LUT shrinkage
a_opt = S_{λ,ρ}(β). DONE!
The obtained solution is the GLOBAL minimizer of
f(a), even if f(a) is
non-convex.
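A minimal sketch of this two-step solver, assuming an L1 sparsity measure and a unitary matrix D (shrink_l1 is the hypothetical helper sketched above):

```python
import numpy as np

def unitary_bpdn(D, y, lam):
    """Global minimizer of 0.5*||y - D a||^2 + lam*||a||_1 when D is unitary."""
    beta = D.conj().T @ y           # multiply by D^H: beta = D^H y
    return shrink_l1(beta, lam)     # element-wise shrinkage; done
```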
17
Brief Summary 2
The minimization of f(a) leads to two
contradicting observations:
  1. The problem is quite hard; classic optimization
    tools find it hard.
  2. The problem is trivial for the case of a unitary D.

How Can We Enjoy This Simplicity in the General
Case?
18
Agenda
  • Motivating the Minimization of f(a)
  • Describing various applications that need this
    minimization
  • Some Motivating Facts
  • General purpose optimization tools, and the
    unitary case
  • Iterated-Shrinkage Algorithms
  • We describe five versions of those in detail
  • Some Results
  • Image deblurring results
  • Conclusions

19
Iterated-Shrinkage Algorithms?
  • We will present THE PRINCIPLES of several
    leading methods:
  • Bound-Optimization and EM [Figueiredo & Nowak, '03],
  • Surrogate-Separable-Functions (SSF) [Daubechies,
    Defrise & De Mol, '04],
  • The Parallel-Coordinate-Descent (PCD) algorithm
    [Elad, '05; Matalon et al., '06],
  • An IRLS-based algorithm [Adeyemi & Davies, '06], and
  • Stagewise Orthogonal Matching Pursuit (StOMP)
    [Donoho et al., '07].
  • Common to all is a set of operations in every
    iteration that includes (i) multiplication by D,
    (ii) multiplication by DH, and (iii) a scalar
    shrinkage on the solution, S_{λ,ρ}(a).
  • Some of these algorithms are a direct
    generalization of the unitary case: their first
    iteration is exactly the solver we have just seen.

20
1. The Proximal-Point Method
  • Aim: minimize f(a). Suppose it is found to be
    too hard.
  • Define a surrogate function g(a,a0) = f(a) + dist(a,a0),
    using a general (uni-modal, non-negative)
    distance function.
  • Then, the iteration a_{k+1} = argmin_a g(a,ak)
    necessarily converges to a local minimum of f(a)
    [Rockafellar, '76].
  • Comments: (i) Is the minimization of g(a,a0)
    easier? It better be!
  • (ii) Looks like it will slow down convergence.
    Really?

[Diagram: a0 → minimize g(a,a0) → a1 → minimize g(a,a1) → a2 → …]
21
The Proposed Surrogate-Functions
  • Our original function is f(a) = ½‖y − Da‖₂² + λρ(a).
  • The distance to use:
    dist(a,a0) = (c/2)‖a − a0‖₂² − ½‖Da − Da0‖₂².
  • Proposed by [Daubechies, Defrise & De Mol, '04].
    Requires c to exceed the largest eigenvalue of DHD,
    so that the distance is non-negative.
  • The beauty in this choice: the quadratic term ½‖Da‖₂²
    vanishes (see the expansion below).

g(a,a0) is then a separable sum of m 1D problems. Thus, we
have a closed-form solution by THE SAME
SHRINKAGE!!
  • Minimization of g(a,a0) is done in closed form
    by shrinkage, applied to the vector
    βk = ak + (1/c)·DH(y − Dak), and this
    generates the solution a_{k+1} of the next
    iteration.
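To make the cancellation and the separable form explicit, here is the worked expansion (our addition; it follows directly from the two definitions above):

```latex
\begin{aligned}
g(a,a_0) &= \tfrac12\|y-Da\|_2^2 + \lambda\rho(a)
          + \tfrac{c}{2}\|a-a_0\|_2^2 - \tfrac12\|Da-Da_0\|_2^2 \\
         &= \lambda\rho(a) + \tfrac{c}{2}\,\|a-\beta_0\|_2^2 + \mathrm{const},
\qquad \beta_0 = a_0 + \tfrac{1}{c}\,D^{H}\!\left(y - Da_0\right).
\end{aligned}
```

Since ρ is additive, this is m independent scalar problems, each solved by the same LUT shrinkage S_{λ/c,ρ} applied entry-wise to β0.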

22
The Resulting SSF Algorithm
The SSF iteration generalizes the unitary case: while the unitary-case
solution is given by a single "multiply by DH, then LUT shrinkage" step,
SSF repeats, at every iteration, a multiplication by D (to form the
residual y − Dak), a multiplication by DH/c, an addition of ak, and the
same LUT shrinkage.
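A minimal sketch of this SSF iteration, assuming the L1 measure (so the LUT is the soft threshold shrink_l1 sketched earlier) and a conservative choice of c; the function name and iteration count are ours:

```python
import numpy as np

def ssf(D, y, lam, n_iters=100):
    """SSF iterated shrinkage for 0.5*||y - D a||^2 + lam*||a||_1."""
    c = 1.01 * np.linalg.norm(D, 2) ** 2         # c just above the largest eigenvalue of D^H D
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        beta = a + D.conj().T @ (y - D @ a) / c  # multiply by D, then by D^H / c, add a_k
        a = shrink_l1(beta, lam / c)             # the same scalar shrinkage, threshold lam / c
    return a
```

For a unitary D, where c may be taken equal to 1, the first iteration starting from a = 0 coincides with the unitary-case solver above.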
23
2. Bound-Optimization Technique
  • Aim: minimize f(a). Suppose it is found to be
    too hard.
  • Define a function Q(a,a0) that satisfies the
    following conditions:
  • Q(a0,a0) = f(a0),
  • Q(a,a0) ≥ f(a) for all a, and
  • ∇Q(a,a0) = ∇f(a) at a0.
  • Then, the iteration a_{k+1} = argmin_a Q(a,ak)
    necessarily converges to a local minimum of f(a)
    [Hunter & Lange (review), '04].
  • Well, regarding this method:
  • The above is closely related to the EM algorithm
    [Neal & Hinton, '98].
  • Figueiredo & Nowak's method ('03) uses the BO
    idea to minimize f(a). They use the VERY SAME
    surrogate function we saw before.

24
3. Start With Coordinate Descent
  • We aim to minimize
    f(a) = ½‖y − Da‖₂² + λρ(a).
  • First, consider the Coordinate Descent (CD)
    algorithm: it updates one entry of a at a time,
    keeping all the others fixed.
  • Updating the j-th entry alone is a 1D minimization
    problem.
  • It has a closed-form solution, using a simple
    SHRINKAGE as before, applied to the scalar ⟨ej,dj⟩
    (divided by ‖dj‖₂² when the atoms are not normalized),
    where dj is the j-th atom and ej is the residual
    without the j-th atom's contribution; see the
    sketch below.
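A hedged sketch of one full CD sweep in the L1 case, as we read the description above (the function name is ours; shrink_l1 is the soft threshold from earlier):

```python
import numpy as np

def cd_sweep(D, y, a, lam):
    """One Coordinate Descent sweep for 0.5*||y - D a||^2 + lam*||a||_1."""
    r = y - D @ a                                     # current residual
    for j in range(D.shape[1]):
        d = D[:, j]
        nrm2 = d @ d                                  # ||d_j||^2
        e_j = r + a[j] * d                            # residual without the j-th atom's contribution
        a_j = shrink_l1(d @ e_j / nrm2, lam / nrm2)   # 1D shrinkage on <e_j, d_j> / ||d_j||^2
        r += (a[j] - a_j) * d                         # keep the residual up to date
        a[j] = a_j
    return a
```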

25
Parallel Coordinate Descent (PCD)
  • At the current solution of the minimization of f(a),
    we compute all m coordinate-descent directions (one
    per atom).
  • We will take the sum of these m descent directions
    for the update step.
  • A line search along this summed direction is
    therefore mandatory.
  • This leads to the PCD iteration on the next slide.

[Diagram: the m coordinate-descent directions and their sum, drawn in the m-dimensional space]
26
The PCD Algorithm [Elad, '05; Matalon, Elad &
Zibulevsky, '06]
The update applies the shrinkage to ak + Q·DH(y − Dak), where
Q = diag(DHD)^{-1} is the diagonal weighting, and then moves from ak
toward the result with a step μ found by a line search (LS); see the
sketch below.
Note: Q can be computed quite easily off-line.
Its storage is just like storing the vector ak.
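A hedged sketch of this PCD iteration for the L1 case, with Q precomputed off-line and a simple bounded line search standing in for the LS/SESOP machinery of the talk; the names and the scipy-based line search are ours:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pcd(D, y, lam, n_iters=50):
    """Parallel Coordinate Descent for 0.5*||y - D a||^2 + lam*||a||_1."""
    q = 1.0 / np.sum(D * D, axis=0)          # Q = diag(D^T D)^{-1}, stored as a vector (off-line)
    a = np.zeros(D.shape[1])

    def f(a):                                # the penalty being minimized
        return 0.5 * np.linalg.norm(y - D @ a) ** 2 + lam * np.abs(a).sum()

    for _ in range(n_iters):
        beta = a + q * (D.T @ (y - D @ a))   # weighted back-projected residual
        v = shrink_l1(beta, lam * q) - a     # sum of the m coordinate-descent directions
        mu = minimize_scalar(lambda t: f(a + t * v), bounds=(0.0, 2.0), method="bounded").x
        a = a + mu * v                       # line-search step along the joint direction
    return a
```

A real-valued D is assumed here (hence D.T and D*D), which fits, e.g., the undecimated Haar dictionary used later in the talk.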
27
Algorithms Speed-Up
Both algorithms can be further accelerated by a line search (LS) and by
Sequential Subspace Optimization (SESOP). Surprising as it may sound,
these very effective acceleration methods can be implemented with no
additional cost (i.e., no extra multiplications by D or DT)
[Zibulevsky & Narkiss, '05; Elad, Matalon & Zibulevsky, '07].
28
Brief Summary 3
For an effective minimization of the function f(a)
we saw several iterated-shrinkage algorithms,
built using
  1. the Proximal Point Method,
  2. Bound Optimization,
  3. Parallel Coordinate Descent,
  4. Iteratively Reweighted LS,
  5. Fixed-Point Iteration, and
  6. Greedy Algorithms.

How Are They Performing?

29
Agenda
  • Motivating the Minimization of f(a)
  • Describing various applications that need this
    minimization
  • Some Motivating Facts
  • General purpose optimization tools, and the
    unitary case
  • Iterated-Shrinkage Algorithms
  • We describe five versions of those in detail
  • Some Results
  • Image deblurring results
  • Conclusions

30
A Deblurring Experiment
White Gaussian noise, σ² = 2
31
Penalty Function: More Details
Note: this experiment is similar (but not
equivalent) to one of the tests done in [Figueiredo &
Nowak, '05], which leads to
state-of-the-art results.
32
The Dictionary: Undecimated Haar
This process actually implements the
multiplication by the matrix DT.
33
The Dictionary: Example Atoms
34
The Function
35
The Function - A Closer Look
36
The Function - The Shrinkage
Analytic Expression for the Shrinkage
37
So, The Results: The Function Value
[Plot: f(a) − f_min on a log scale (10² to 10⁹) vs. Iterations/Computations (0 to 50), comparing SSF, SSF-LS, and SSF-SESOP-5]
38
So, The Results: The Function Value
Comment: both SSF and PCD (and their
accelerated versions) are provably converging to
the minimum of f(a).
[Plot: f(a) − f_min on a log scale (10² to 10⁹) vs. Iterations/Computations (0 to 50), comparing SSF, SSF-LS, SSF-SESOP-5, PCD-LS, and PCD-SESOP-5]
39
So, The Results: The Function Value
[Plot: f(a) − f_min on a log scale vs. Iterations/Computations over a longer run (0 to 250)]
40
So, The Results: ISNR
[Plot: ISNR (dB) vs. Iterations/Computations (0 to 50) for SSF, SSF-LS, and SSF-SESOP-5; annotated value: 6.41 dB]
41
So, The Results: ISNR
[Plot: ISNR (dB) vs. Iterations/Computations (0 to 50) for SSF, SSF-LS, SSF-SESOP-5, PCD-LS, and PCD-SESOP-5; annotated value: 7.03 dB]
42
So, The Results: ISNR
Comments: StOMP is inferior in speed and final
quality (ISNR = 5.91 dB) due to its over-estimated
support. PDCO is very slow due to the numerous
inner Least-Squares iterations done by CG. It is
not competitive with the Iterated-Shrinkage
methods.
[Plot: ISNR (dB) vs. Iterations/Computations over a longer run (0 to 250)]
43
Visual Results
PCD-SESOP-5 results, showing original (left), measured (middle), and
restored (right) images over the iterations, with the ISNR at each stage:
  • Iteration 0: ISNR = −16.7728 dB
  • Iteration 1: ISNR = 0.069583 dB
  • Iteration 2: ISNR = 2.46924 dB
  • Iteration 3: ISNR = 4.1824 dB
  • Iteration 4: ISNR = 4.9726 dB
  • Iteration 5: ISNR = 5.5875 dB
  • Iteration 6: ISNR = 6.2188 dB
  • Iteration 7: ISNR = 6.6479 dB
  • Iteration 8: ISNR = 6.6789 dB
  • Iteration 12: ISNR = 6.9416 dB
  • Iteration 19: ISNR = 7.0322 dB
44
Agenda
  • Motivating the Minimization of f(a)
  • Describing various applications that need this
    minimization
  • Some Motivating Facts
  • General purpose optimization tools, and the
    unitary case
  • Iterated-Shrinkage Algorithms
  • We describe five versions of those in detail
  • Some Results
  • Image deblurring results
  • Conclusions

45
Conclusions The Bottom Line
  • If your work leads you to the need to minimize a
    problem of the form f(a) = ½‖y − Da‖₂² + λρ(a), then:
  • We recommend you use an Iterated-Shrinkage
    algorithm.
  • SSF and PCD are preferred: both are provably
    converging to a (local) minimum of f(a), and
    their performance is very good, reaching a
    reasonable result in a few iterations.
  • Use SESOP acceleration: it is very effective,
    and comes with hardly any added cost.
  • There is room for more work on various aspects of
    these algorithms; see the accompanying paper.