1 Sparse & Redundant Representations by Iterated-Shrinkage Algorithms
- Michael Elad
- The Computer Science Department
- The Technion - Israel Institute of Technology, Haifa 32000, Israel
2 Today's Talk is About
the Minimization of the Function f(α) by Iterated-Shrinkage Algorithms
- Why This Topic?
  - It is key for turning theory into applications in Sparse-land,
  - Shrinkage is intimately coupled with wavelets,
  - The applications we target are fundamental signal-processing ones, and
  - This is a hot topic in harmonic analysis.
- Today we will discuss
  - Why is this minimization task important?
  - Which applications could benefit from this minimization?
  - How can it be minimized effectively?
  - What iterated-shrinkage methods are out there, and how do they perform?
3 Agenda
- Motivating the Minimization of f(α)
  - Describing various applications that need this minimization
- Some Motivating Facts
  - General-purpose optimization tools, and the unitary case
- Iterated-Shrinkage Algorithms
  - We describe five versions of those in detail
- Some Results
  - Image deblurring results
- Conclusions
4 Let's Start with Image Denoising
[Figure: a noisy image and its denoised version ("Remove Additive Noise")]
Many of the existing image-denoising algorithms are related to the minimization of an energy function of the form "data-fidelity term plus prior", where
- y is the given measurement, and
- x is the unknown to be recovered.
We will use a Sparse & Redundant Representation prior.
5 Our MAP Energy Function
- We assume that x is created by the Sparse-land model M, i.e. x = Dα, where α is a sparse & redundant representation and D is a known dictionary.
- This leads to the MAP objective f(α) reconstructed below.
- This MAP denoising algorithm is known as Basis Pursuit Denoising [Chen, Donoho & Saunders, 1995].
- The term ρ(α) measures the sparsity of the solution α:
  - The L0-norm, ‖α‖₀, leads to a non-smooth, non-convex problem.
  - The Lp norm, ‖α‖_p^p, with 0 < p ≤ 1 is often found to be equivalent.
  - Many other ADDITIVE sparsity measures are possible.
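The objective itself appears only as an image in the original slide; as a reconstruction, consistent with the Basis Pursuit Denoising formulation cited above (λ denotes the regularization weight, an assumption of this notation), it reads:

```latex
\hat{\alpha} \;=\; \arg\min_{\alpha}\; f(\alpha)
 \;=\; \arg\min_{\alpha}\; \frac{1}{2}\,\lVert y - D\alpha \rVert_2^2 \;+\; \lambda\,\rho(\alpha),
\qquad \hat{x} = D\hat{\alpha}.
```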
6 General (Linear) Inverse Problems
- Assume that x is known to emerge from the model M, as before.
- Suppose we observe y, a blurred and noisy version of x. How could we recover x? See the sketch below.
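The measurement model and the resulting objective are images in the original slide; a hedged reconstruction, following the same MAP reasoning as the denoising case (H denotes the known degradation operator and v white Gaussian noise):

```latex
y = Hx + v = HD\alpha + v,
\qquad
\hat{\alpha} = \arg\min_{\alpha}\;\frac{1}{2}\,\lVert y - HD\alpha \rVert_2^2 + \lambda\,\rho(\alpha),
\qquad \hat{x} = D\hat{\alpha}.
```

The applications listed next, and the compressed-sensing slide further on, differ mainly in the choice of H: a blur kernel, a sampling mask, a random projection matrix P, and so on.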
7 Inverse Problems of Interest
- De-Noising
- De-Blurring
- In-Painting
- De-Mosaicing
- Tomography
- Image Scale-Up (super-resolution)
- And more ...
8 Signal Separation
- Given a mixture z = x1 + x2 + v of two sources emerging from models M1 and M2, and white Gaussian noise v, we desire to separate it into its ingredients.
- Written differently, this is a single Sparse-land problem over the concatenated dictionary [D1 D2]: z = D1·α1 + D2·α2 + v.
- Thus, solving this problem using MAP leads to the Morphological Component Analysis (MCA) [Starck, Elad & Donoho, 2005].
[Diagram: sources drawn from M1 and M2, plus the noise v, combine into the mixture z]
9 Compressed Sensing [Candès et al. 2006; Donoho 2006]
- In compressed sensing we compress the signal x by exploiting its origin. This is done by p << n random projections.
- The core idea: the projected vector (P of size p×n) holds all the information about the original signal x, even though p << n.
- Reconstruction? Use MAP again and solve the corresponding objective (H = P in the general form above).
10 Brief Summary 1
The minimization of the function f(α) is a worthy task, serving many and various applications.
So, how should this be done?
11 Agenda
- Motivating the Minimization of f(α)
  - Describing various applications that need this minimization
- Some Motivating Facts
  - General-purpose optimization tools, and the unitary case
- Iterated-Shrinkage Algorithms
  - We describe five versions of those in detail
- Some Results
  - Image deblurring results
- Conclusions
12 Is There a Problem?
- The first thought: with all the existing knowledge in optimization, we could surely find a solution.
- Methods to consider:
  - (Normalized) Steepest Descent: compute the gradient and follow its path.
  - Conjugate Gradient: use the gradient and the previous update direction, combined by a preset formula.
  - Pre-Conditioned SD: weight the gradient by the inverse of the Hessian's diagonal.
  - Truncated Newton: use the gradient and Hessian to define a linear system, and solve it approximately by a set of CG steps.
  - Interior-Point Algorithms: separate the unknowns into positive and negative entries, and use both the primal and the dual problems, with a barrier forcing positivity.
13 General-Purpose Software?
- So, simply download one of many general-purpose packages:
  - L1-Magic (interior-point solver),
  - SparseLab (interior-point solver),
  - MOSEK (various tools),
  - Matlab Optimization Toolbox (various tools),
  - ...
- A Problem: general-purpose software packages (algorithms) typically perform poorly on our task. Possible reasons:
  - The fact that the solution is expected to be sparse (or nearly so) in our problem is not exploited by such algorithms.
  - The Hessian of f(α) tends to be highly ill-conditioned near the (sparse) solution.
- So, are we stuck? Is this problem really that complicated?
14 Consider the Unitary Case (DDᴴ = I)
We get a separable set of m identical 1D optimization problems, as derived below.
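The derivation appears only graphically in the original slide; a reconstruction (assuming the f(α) form given earlier, and the additivity of ρ mentioned on slide 5) uses the fact that a unitary D preserves the L2 norm:

```latex
f(\alpha) = \frac{1}{2}\lVert y - D\alpha \rVert_2^2 + \lambda\rho(\alpha)
          = \frac{1}{2}\lVert D^{H}y - \alpha \rVert_2^2 + \lambda\rho(\alpha)
          = \sum_{j=1}^{m} \left[ \frac{1}{2}\left(\beta_j - \alpha_j\right)^2 + \lambda\rho(\alpha_j) \right],
\qquad \beta = D^{H}y .
```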
15 The 1D Task
We need to solve the following 1D problem: minimize g(α) = ½(α − β)² + λρ(α) over the scalar α.
Such a Look-Up-Table (LUT), α_opt = S_{ρ,λ}(β), can be built for ANY sparsity measure function ρ(α), including non-convex and non-smooth ones (e.g., the L0 norm), giving in all cases the GLOBAL minimizer of g(α).
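As an illustration (not from the original slides), here is a minimal sketch of the scalar shrinkage S_{ρ,λ} for the two most common sparsity measures, assuming the 1D problem g(α) = ½(α − β)² + λρ(α):

```python
import numpy as np

def soft_threshold(beta, lam):
    """Shrinkage S_{rho,lam} for rho(a) = |a| (the L1 measure):
    closed-form minimizer of 0.5*(a - beta)**2 + lam*|a|."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def hard_threshold(beta, lam):
    """Shrinkage for rho(a) = 1 if a != 0 else 0 (the L0 measure):
    keep beta only if the quadratic gain 0.5*beta**2 exceeds the penalty lam."""
    beta = np.asarray(beta, dtype=float)
    return np.where(np.abs(beta) > np.sqrt(2.0 * lam), beta, 0.0)
```

Both operate elementwise, so applying them to the vector β = Dᴴy handles all m coordinates at once, which is exactly what the unitary-case summary on the next slide exploits.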
16 The Unitary Case: A Summary
Minimizing f(α) is done by:
- Multiply by Dᴴ to obtain β = Dᴴy,
- Apply the scalar shrinkage (LUT) S_{ρ,λ} to each entry of β,
- DONE!
The obtained solution is the GLOBAL minimizer of f(α), even if f(α) is non-convex.
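A minimal sketch of this closed-form solver (my own hypothetical example, assuming the L1 measure; a random orthogonal matrix stands in for a unitary dictionary):

```python
import numpy as np

def soft_threshold(beta, lam):
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def unitary_bpdn(y, D, lam):
    """Global minimizer of 0.5*||y - D a||^2 + lam*||a||_1 when D D^H = I."""
    beta = D.conj().T @ y              # multiply by D^H
    return soft_threshold(beta, lam)   # elementwise shrinkage LUT

# Usage: build a random orthogonal dictionary and solve in closed form.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # orthogonal (unitary) D
y = rng.standard_normal(64)
alpha_hat = unitary_bpdn(y, D, lam=0.5)
```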
17 Brief Summary 2
The minimization of f(α) leads to two very contradicting observations:
- The problem is quite hard; classic optimization tools find it hard.
- The problem is trivial for the case of a unitary D.
How can we enjoy this simplicity in the general case?
18 Agenda
- Motivating the Minimization of f(α)
  - Describing various applications that need this minimization
- Some Motivating Facts
  - General-purpose optimization tools, and the unitary case
- Iterated-Shrinkage Algorithms
  - We describe five versions of those in detail
- Some Results
  - Image deblurring results
- Conclusions
19 Iterated-Shrinkage Algorithms?
- We will present THE PRINCIPLES of several leading methods:
  - Bound-Optimization and EM [Figueiredo & Nowak, '03],
  - Surrogate-Separable-Functionals (SSF) [Daubechies, Defrise & De-Mol, '04],
  - The Parallel-Coordinate-Descent (PCD) algorithm [Elad '05; Matalon et al. '06],
  - An IRLS-based algorithm [Adeyemi & Davies, '06], and
  - Stagewise Orthogonal Matching Pursuit (StOMP) [Donoho et al. '07].
- Common to all is a set of operations in every iteration that includes (i) multiplication by D, (ii) multiplication by Dᴴ, and (iii) a scalar shrinkage on the solution, S_{ρ,λ}(α).
- Some of these algorithms pose a direct generalization of the unitary case; their first iteration is exactly the solver we have just seen.
20 1. The Proximal-Point Method
- Aim: minimize f(α). Suppose it is found to be too hard.
- Define a surrogate function g(α, α₀) = f(α) + dist(α, α₀), using a general (uni-modal, non-negative) distance function.
- Then, the iteration α_{k+1} = argmin_α g(α, α_k) necessarily converges to a local minimum of f(α) [Rockafellar, '76].
- Comments: (i) Is the minimization of g(α, α₀) easier? It had better be! (ii) Looks like it will slow down convergence. Really?
[Diagram: starting from α₀, minimizing g(α, α₀) gives α₁, minimizing g(α, α₁) gives α₂, and so on.]
21 The Proposed Surrogate Functions
- The distance to use (reconstructed below) was proposed by Daubechies, Defrise & De-Mol ['04]. It requires the constant c to be large enough, exceeding the largest eigenvalue of DᴴD.
- The beauty in this choice: the problematic quadratic term involving DᴴD vanishes, and g(α, α₀) becomes a separable sum of m 1D problems. Thus, we have a closed-form solution by THE SAME SHRINKAGE !!
- Minimization of g(α, α₀) is done in closed form by shrinkage, applied on the vector β_k, and this generates the solution α_{k+1} of the next iteration.
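The formulas themselves did not survive extraction; the following is a hedged reconstruction based on the Daubechies-Defrise-De-Mol surrogate referenced above:

```latex
\mathrm{dist}(\alpha,\alpha_0)
  = \frac{c}{2}\,\lVert \alpha-\alpha_0 \rVert_2^2
  \;-\; \frac{1}{2}\,\lVert D\alpha - D\alpha_0 \rVert_2^2 ,
\qquad c > \lambda_{\max}\!\left(D^{H}D\right).
```

Plugging this into g(α, α₀) = f(α) + dist(α, α₀) yields a separable surrogate whose closed-form minimizer is

```latex
\beta_k = \alpha_k + \frac{1}{c}\,D^{H}\!\left(y - D\alpha_k\right),
\qquad
\alpha_{k+1} = S_{\rho,\;\lambda/c}\!\left(\beta_k\right).
```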
22 The Resulting SSF Algorithm
While the Unitary
case solution is given by
Multiply by DH
Multiply by DH/c
Multiply by D
LUT
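A minimal sketch of the SSF iteration (my own illustration, assuming the L1 measure so the LUT is a soft threshold; the recursion follows the reconstruction on the previous slide):

```python
import numpy as np

def soft_threshold(beta, lam):
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def ssf(y, D, lam, n_iter=100):
    """SSF iterated shrinkage for f(a) = 0.5*||y - D a||^2 + lam*||a||_1."""
    c = 1.01 * np.linalg.norm(D, 2) ** 2   # c must exceed the largest eigenvalue of D^H D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        beta = a + D.conj().T @ (y - D @ a) / c   # multiply by D, then by D^H / c
        a = soft_threshold(beta, lam / c)         # the same scalar shrinkage LUT
    return a
```

With α₀ = 0 the first pass computes β₀ = Dᴴy/c and shrinks it, which (for a unitary D, where c ≈ 1) mirrors the unitary-case solver, as the slide on the principles of these algorithms noted.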
23 2. The Bound-Optimization Technique
- Aim: minimize f(α). Suppose it is found to be too hard.
- Define a function Q(α, α₀) that satisfies the following conditions:
  - Q(α₀, α₀) = f(α₀),
  - Q(α, α₀) ≥ f(α) for all α, and
  - ∇Q(α, α₀) = ∇f(α) at α = α₀.
- Then, the iteration α_{k+1} = argmin_α Q(α, α_k) necessarily converges to a local minimum of f(α) [Hunter & Lange (review), '04].
- Well, regarding this method:
  - The above is closely related to the EM algorithm [Neal & Hinton, '98].
  - Figueiredo & Nowak's method ('03) uses the BO idea to minimize f(α). They use the VERY SAME surrogate functions we saw before.
24 3. Start With Coordinate Descent
- We aim to minimize f(α).
- First, consider the Coordinate Descent (CD) algorithm: update one entry of α at a time, keeping all the others fixed.
- Each such update is a 1D minimization problem. It has a closed-form solution, using a simple SHRINKAGE as before, applied on the scalar ⟨e_j, d_j⟩ (the correlation of the j-th atom with the current residual); see the reconstruction below.
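The 1D problem is an image in the original slide; as a reconstruction, writing e_j for the residual with the j-th atom's contribution removed (notation assumed here), the coordinate update reads:

```latex
e_j = y - \sum_{i \neq j} d_i \alpha_i,
\qquad
\min_{\alpha_j}\; \frac{1}{2}\,\lVert e_j - d_j \alpha_j \rVert_2^2 + \lambda\,\rho(\alpha_j)
\;\;\Longrightarrow\;\;
\alpha_j = S_{\rho,\;\lambda/\lVert d_j\rVert_2^2}\!\left(\frac{\langle e_j, d_j\rangle}{\lVert d_j\rVert_2^2}\right).
```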
25 Parallel Coordinate Descent (PCD)
[Diagram: in the m-dimensional space, the m coordinate-wise descent directions from the current solution of f(α) are summed into one joint update direction.]
- We will take the sum of these m descent directions for the update step.
- A line search is then mandatory.
- This leads to the PCD algorithm on the next slide.
26 The PCD Algorithm [Elad '05; Matalon, Elad & Zibulevsky '06]
- The update formula (reconstructed below) shrinks all the coordinates in parallel, where Q is a diagonal weighting matrix and μ is found by a line search (LS).
- Note: Q can be computed quite easily off-line. Its storage is just like storing the vector α_k.
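My reconstruction of the update, based on the PCD description in the cited papers, is α_{k+1} = α_k + μ·(S_{ρ, λ·q}(α_k + Q·Dᴴ(y − Dα_k)) − α_k) with Q = diag(DᴴD)⁻¹ and q its diagonal. A minimal sketch, assuming the L1 measure and a bounded scalar line search:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def soft_threshold(beta, lam):
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def pcd(y, D, lam, n_iter=100):
    """Parallel-Coordinate-Descent sketch for f(a) = 0.5*||y - D a||^2 + lam*||a||_1."""
    q = 1.0 / np.sum(np.abs(D) ** 2, axis=0)        # diag(D^H D)^-1, computed off-line
    f = lambda a: 0.5 * np.linalg.norm(y - D @ a) ** 2 + lam * np.abs(a).sum()
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        beta = a + q * (D.conj().T @ (y - D @ a))      # all m coordinate steps, in parallel
        direction = soft_threshold(beta, lam * q) - a  # joint descent direction
        mu = minimize_scalar(lambda t: f(a + t * direction),
                             bounds=(0.0, 2.0), method="bounded").x  # the mandatory line search
        a = a + mu * direction
    return a
```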
27 Algorithms' Speed-Up
Surprising as it may sound, these very effective acceleration methods (line search and SESOP subspace optimization) can be implemented with no additional cost (i.e., no extra multiplications by D or Dᵀ) [Zibulevsky & Narkis '05; Elad, Matalon & Zibulevsky '07].
28 Brief Summary 3
For an effective minimization of the function f(α) we saw several iterated-shrinkage algorithms, built using:
- The Proximal-Point Method
- Bound Optimization
- Parallel Coordinate Descent
- Iteratively Reweighted LS (IRLS)
- Fixed-Point Iteration
- Greedy Algorithms
How are they performing?
29Agenda
- Motivating the Minimization of f(a)
- Describing various applications that need this
minimization - Some Motivating Facts
- General purpose optimization tools, and the
unitary case - Iterated-Shrinkage Algorithms
- We describe five versions of those in detail
- Some Results
- Image deblurring results
- Conclusions
30 A Deblurring Experiment
White Gaussian noise, σ² = 2.
31 The Penalty Function: More Details
Note: this experiment is similar (but not equivalent) to one of the tests done in [Figueiredo & Nowak '05], which leads to state-of-the-art results.
32 The Dictionary: Undecimated Haar
This (filtering) process actually implements the multiplication by the matrix Dᵀ.
33 The Dictionary: Example Atoms
34 The Function
35 The Function: A Closer Look
36 The Function: The Shrinkage
Analytic expression for the shrinkage.
37 So, The Results: The Function Value
[Plot: f(α) − f_min on a log scale (about 10² to 10⁹) vs. iterations/computations (0 to 50), comparing SSF, SSF-LS, and SSF-SESOP-5.]
38 So, The Results: The Function Value
Comment: both SSF and PCD (and their accelerated versions) are provably converging to the minimum of f(α).
[Plot: f(α) − f_min on a log scale vs. iterations/computations (0 to 50), comparing SSF, SSF-LS, SSF-SESOP-5, PCD-LS, and PCD-SESOP-5.]
39 So, The Results: The Function Value
[Plot: f(α) − f_min on a log scale vs. iterations/computations, now over 0 to 250.]
40 So, The Results: ISNR
[Plot: ISNR (dB) vs. iterations/computations (0 to 50) for SSF, SSF-LS, and SSF-SESOP-5; 6.41 dB is marked on the plot.]
41 So, The Results: ISNR
[Plot: ISNR (dB) vs. iterations/computations (0 to 50), adding PCD-LS and PCD-SESOP-5; 7.03 dB is marked on the plot.]
42 So, The Results: ISNR
Comments: StOMP is inferior in speed and final quality (ISNR = 5.91 dB) due to an over-estimated support. PDCO is very slow due to the numerous inner Least-Squares iterations done by CG; it is not competitive with the iterated-shrinkage methods.
[Plot: ISNR (dB) vs. iterations/computations over 0 to 250.]
43 Visual Results
PCD-SESOP-5 results: original (left), measured (middle), and restored (right), shown after each iteration. The ISNR per iteration:
- Iteration 0: -16.7728 dB
- Iteration 1: 0.069583 dB
- Iteration 2: 2.46924 dB
- Iteration 3: 4.1824 dB
- Iteration 4: 4.9726 dB
- Iteration 5: 5.5875 dB
- Iteration 6: 6.2188 dB
- Iteration 7: 6.6479 dB
- Iteration 8: 6.6789 dB
- Iteration 12: 6.9416 dB
- Iteration 19: 7.0322 dB
44 Agenda
- Motivating the Minimization of f(α)
  - Describing various applications that need this minimization
- Some Motivating Facts
  - General-purpose optimization tools, and the unitary case
- Iterated-Shrinkage Algorithms
  - We describe five versions of those in detail
- Some Results
  - Image deblurring results
- Conclusions
45 Conclusions: The Bottom Line
- If your work leads you to the need to minimize f(α), then:
  - We recommend you use an Iterated-Shrinkage algorithm.
  - SSF and PCD are preferred: both are provably converging to a (local) minimum of f(α), and their performance is very good, reaching a reasonable result in few iterations.
  - Use the SESOP acceleration: it is very effective, and with hardly any cost.
  - There is room for more work on various aspects of these algorithms; see the accompanying paper.