Title: ERROR ENTROPY, CORRENTROPY AND M-ESTIMATION
1. ERROR ENTROPY, CORRENTROPY AND M-ESTIMATION
- Weifeng Liu, P. P. Pokharel, J. C. Principe
- CNEL, University of Florida
- weifeng_at_cnel.ufl.edu
- Acknowledgment: This work was partially supported by NSF grants ECS-0300340 and ECS-0601271.
2. Outline
- Maximization of correntropy criterion (MCC)
- Minimization of error entropy (MEE)
- Relation between MEE and MCC
- Minimization of error entropy with fiducial points
- Experiments
3. Supervised learning
- Desired signal D
- System output Y
- Error signal E = D − Y
4. Supervised learning
- The goal in supervised training is to bring the system output close to the desired signal.
- The concept of "close" implicitly or explicitly employs a distance function or similarity measure.
- Equivalently, the goal is to minimize the error in some sense.
- For instance, the mean squared error (MSE), (1/N) Σ_i e_i².
5. Maximization of Correntropy Criterion
- The correntropy of the desired signal and the system output, V(D,Y), is estimated by
  V̂(D,Y) = (1/N) Σ_{i=1..N} κ_σ(d_i − y_i)
- where κ_σ(x) = exp(−x²/(2σ²)) / (√(2π) σ) is the Gaussian kernel with kernel size σ (a code sketch follows).
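To make the estimator concrete, here is a minimal NumPy sketch of the sample correntropy V̂(D,Y) with the Gaussian kernel defined above; the function names and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    # kappa_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def correntropy(d, y, sigma=1.0):
    # sample estimator V_hat(D, Y) = (1/N) * sum_i kappa_sigma(d_i - y_i)
    d, y = np.asarray(d, float), np.asarray(y, float)
    return np.mean(gaussian_kernel(d - y, sigma))

# toy example: the single large outlier barely changes the correntropy value
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 50.0])   # last sample is an outlier
print(correntropy(d, y, sigma=1.0))
```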
6. Correntropy induced metric
- Define CIM(X,Y) = [κ_σ(0) − V̂(X,Y)]^(1/2)
- CIM satisfies the following metric properties (checked numerically in the sketch below):
- Non-negativity
- Identity of indiscernibles
- Symmetry
- Triangle inequality
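A small sketch of the CIM definition above, with a numerical look at two of the listed properties; helper names are illustrative assumptions.

```python
import numpy as np

def kappa(x, sigma=1.0):
    # Gaussian kernel
    return np.exp(-np.asarray(x, float)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def cim(x, y, sigma=1.0):
    # CIM(X, Y) = sqrt(kappa(0) - V_hat(X, Y))
    v = np.mean(kappa(np.asarray(x, float) - np.asarray(y, float), sigma))
    return np.sqrt(kappa(0.0, sigma) - v)

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 0.9, 2.2])
print(cim(x, x))              # identity of indiscernibles: 0 for identical vectors
print(cim(x, y), cim(y, x))   # symmetry: the two values match
```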
7. CIM contours
- Contours of CIM(E, 0) in the 2D sample space:
- close to the origin: behaves like the L2 norm,
- intermediate distances: behaves like the L1 norm,
- far apart: saturates, so large-value elements stop contributing,
- (direction sensitive)
8. MCC is minimization of CIM
max_w V̂(D,Y) = (1/N) Σ_i κ_σ(d_i − y_i)
⟺ min_w CIM(D,Y) = [κ_σ(0) − V̂(D,Y)]^(1/2)
- Since κ_σ(0) is a constant, maximizing the correntropy of D and Y is exactly minimizing the CIM between D and Y.
9. MCC is M-estimation
MCC: max_w (1/N) Σ_i κ_σ(e_i)
⟺ min_w Σ_i ρ(e_i)
where ρ(e) = [1 − exp(−e²/(2σ²))] / (√(2π) σ), a bounded loss function (sketched below).
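A sketch of the bounded loss ρ implied by MCC and of its influence function ψ = ρ′, contrasted in the comments with the unbounded squared-error case; names are illustrative assumptions.

```python
import numpy as np

def rho(e, sigma=1.0):
    # bounded MCC loss: rho(e) = (1 - exp(-e^2 / (2 sigma^2))) / (sqrt(2 pi) sigma)
    return (1.0 - np.exp(-e**2 / (2 * sigma**2))) / (np.sqrt(2 * np.pi) * sigma)

def psi(e, sigma=1.0):
    # influence function psi(e) = d rho / d e = e * exp(-e^2/(2 sigma^2)) / (sqrt(2 pi) sigma^3)
    return e * np.exp(-e**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma**3)

e = np.array([0.0, 0.5, 1.0, 3.0, 10.0, 100.0])
print(rho(e))   # saturates at 1/(sqrt(2 pi) sigma) for large |e|
print(psi(e))   # redescends toward 0 for large |e|, unlike psi(e) = 2e of the MSE loss
```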
10. Minimization of Error Entropy
- Rényi's quadratic error entropy is estimated by
  Ĥ_2(E) = −log[ (1/N²) Σ_i Σ_j κ_σ(e_j − e_i) ]
- The argument of the logarithm is the Information Potential (IP):
  IP(E) = (1/N²) Σ_i Σ_j κ_σ(e_j − e_i)
- Minimizing Ĥ_2(E) is therefore equivalent to maximizing IP(E) (see the sketch below).
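A minimal sketch of the information potential and the plug-in entropy estimate written above, using the straightforward O(N²) pairwise computation; names are illustrative assumptions.

```python
import numpy as np

def ip(e, sigma=1.0):
    # Information Potential: IP(E) = (1/N^2) sum_i sum_j kappa_sigma(e_j - e_i)
    e = np.asarray(e, float)
    diff = e[:, None] - e[None, :]                                  # all pairwise differences
    k = np.exp(-diff**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return k.mean()

def renyi_quadratic_entropy(e, sigma=1.0):
    # H2_hat(E) = -log IP(E)
    return -np.log(ip(e, sigma))

e = np.random.default_rng(0).normal(size=200)
print(ip(e), renyi_quadratic_entropy(e))
```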
11. Relation between MEE and MCC
- The IP can be rewritten as an average of correntropy values:
  IP(E) = (1/N) Σ_j V̂(E, e_j·1), where e_j·1 is the constant vector whose entries all equal e_j.
12. Relation between MEE and MCC
- MEE therefore maximizes the average correntropy between the error vector and constant vectors located at the error samples; it depends only on the error differences e_i − e_j, whereas MCC compares the errors directly with zero (a numerical check follows).
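A quick numerical check of the relation stated above: the IP of the error vector coincides with the average correntropy between E and the constant vectors e_j·1; helper names are illustrative assumptions.

```python
import numpy as np

def kappa(x, sigma=1.0):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(1)
e = rng.normal(size=50)
sigma = 1.0

ip = kappa(e[:, None] - e[None, :], sigma).mean()        # (1/N^2) sum_ij kappa(e_i - e_j)
# average over j of V_hat(E, e_j * ones): (1/N) sum_j [(1/N) sum_i kappa(e_i - e_j)]
avg_corr = np.mean([kappa(e - ej, sigma).mean() for ej in e])
print(np.isclose(ip, avg_corr))                          # True: the two quantities coincide
```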
13. IP induced metric
- Define IPM(X,Y) = [κ_σ(0) − IP(X − Y)]^(1/2)
- IPM(X,Y) is a pseudo-metric.
- NO identity of indiscernibles: IPM(X,Y) = 0 whenever X − Y is a constant vector, not only when X = Y (illustrated in the sketch below).
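A sketch showing why IPM, as defined above, lacks the identity of indiscernibles: it is blind to a constant offset between the two vectors. Helper names are illustrative assumptions.

```python
import numpy as np

def kappa(x, sigma=1.0):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def ipm(x, y, sigma=1.0):
    # IPM(X, Y) = sqrt(kappa(0) - IP(X - Y)), with IP the pairwise information potential
    e = np.asarray(x, float) - np.asarray(y, float)
    ip = kappa(e[:, None] - e[None, :], sigma).mean()
    return np.sqrt(kappa(0.0, sigma) - ip)

x = np.array([0.3, -1.2, 0.7, 2.0])
print(ipm(x, x))          # 0, as for any metric
print(ipm(x, x - 5.0))    # also 0 although the vectors differ: no identity of indiscernibles
```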
14. IPM contours
- Contours of IPM(E, 0) in the 2D sample space:
- a valley runs along the line e1 = e2, so IPM is not sensitive to the error mean,
- and it saturates for points far from the valley.
15. MEE and its equivalences
min_w Ĥ_2(E)
⟺ max_w IP(E)
⟺ max_w (1/N) Σ_j V̂(E, e_j·1)
⟺ min_w IPM(E, 0)
16. MEE is M-estimation
- Note that IP(E) = (1/N) Σ_j p̂_E(e_j), where p̂_E is the Parzen estimate of the error PDF built from the error samples themselves.
- Assuming the error PDF has a dominant peak, maximizing the IP amounts to minimizing Σ_j ρ(e_j) with a bounded loss ρ(e) = c − p̂_E(e); hence MEE is also an M-estimation problem (checked numerically below).
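A numerical check of the Parzen-window view used above: the IP equals the average of the Parzen density estimate (built from the errors themselves) evaluated at the error samples. Helper names and the toy error distribution are illustrative assumptions.

```python
import numpy as np

def kappa(x, sigma=1.0):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(2)
e = rng.standard_t(df=3, size=100)        # heavy-tailed errors, just for illustration
sigma = 0.5

parzen = lambda t: np.mean(kappa(t - e, sigma))           # Parzen PDF estimate at point t
ip = kappa(e[:, None] - e[None, :], sigma).mean()         # information potential
print(np.isclose(ip, np.mean([parzen(ej) for ej in e])))  # True
```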
17. Nuisance of conventional MEE
- MEE is shift-invariant, so the location of the error PDF must be determined separately (illustrated below).
- Conventionally this is done by making the error mean equal to zero.
- When the error PDF is non-symmetric or has heavy tails, the estimation of the error mean is problematic.
- Fixing the error peak at the origin is clearly better than the conventional method of shifting the errors to zero mean.
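A small sketch of the shift-invariance described above: the quadratic error entropy (equivalently the IP) is unchanged when a constant bias is added to all errors, so MEE alone cannot fix the error location. Names are illustrative assumptions.

```python
import numpy as np

def h2(e, sigma=1.0):
    # plug-in Renyi quadratic entropy: -log IP(E)
    e = np.asarray(e, float)
    k = np.exp(-(e[:, None] - e[None, :])**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return -np.log(k.mean())

rng = np.random.default_rng(3)
e = rng.normal(size=100)
print(h2(e), h2(e + 10.0))   # identical values: the criterion is shift-invariant
```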
18. ERROR ENTROPY WITH FIDUCIAL POINTS
- Successful supervised training ⟹ most of the errors should be equal to zero.
- Idea: minimize the error entropy with respect to 0.
- Denote E = [e_0, e_1, ..., e_N]^T, where E is the augmented error vector and e_0 = 0 serves as a point of reference (the fiducial point).
19. ERROR ENTROPY WITH FIDUCIAL POINTS
20. ERROR ENTROPY WITH FIDUCIAL POINTS
- λ is a weighting constant between 0 and 1
- it controls how many fiducial points are placed at the origin
- λ = 0 ⟹ MEE
- λ = 1 ⟹ MCC
- 0 < λ < 1 ⟹ Minimization of Error Entropy with Fiducial points (MEEF); a sketch of the combined cost follows.
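One plausible form of the MEEF cost, consistent with the limits listed above (λ = 0 recovers MEE, λ = 1 recovers MCC): a convex combination of the MCC term V̂(E, 0) and the MEE information potential inside the logarithm. The exact normalization used in the paper may differ; this is only a sketch with illustrative names.

```python
import numpy as np

def kappa(x, sigma=1.0):
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def meef_cost(e, lam=0.5, sigma=1.0):
    # J(E) = -log[ lam * V_hat(E, 0) + (1 - lam) * IP(E) ]   (assumed form)
    e = np.asarray(e, float)
    mcc_term = kappa(e, sigma).mean()                         # correntropy between E and 0
    ip_term = kappa(e[:, None] - e[None, :], sigma).mean()    # information potential of E
    return -np.log(lam * mcc_term + (1.0 - lam) * ip_term)

e = np.random.default_rng(4).normal(loc=0.3, scale=0.5, size=100)
print(meef_cost(e, lam=0.0))   # lam = 0: pure MEE (error entropy)
print(meef_cost(e, lam=1.0))   # lam = 1: pure MCC (negative log correntropy)
print(meef_cost(e, lam=0.5))   # 0 < lam < 1: MEEF
```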
21. ERROR ENTROPY WITH FIDUCIAL POINTS
- The MCC term locates the main peak of the error PDF and fixes it at the origin, even in cases where the estimation of the error mean is not robust.
- Unifying the two cost functions retains all their merits: the combined criterion remains robust, with outlier resistance and kernel-size resilience.
22. Metric induced by MEEF
- MEEF induces a well-defined metric.
- The metric is direction sensitive:
- it favors errors with the same sign,
- and penalizes errors that have different signs.
23. Experiment 1: Robust regression
- X: input variable
- f: unknown function
- N: noise
- Y: observation, Y = f(X) + N
- Noise PDF: outlier-producing (a hypothetical setup is sketched below)
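The exact f and noise PDF of Experiment 1 are not recoverable from this text, so the sketch below assumes a hypothetical linear f and impulsive, outlier-contaminated Gaussian noise, and fits a line by gradient ascent on the correntropy criterion versus ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(-1, 1, n)
noise = rng.normal(0, 0.1, n)
out = rng.random(n) < 0.1                        # ~10% impulsive outliers (hypothetical)
noise[out] += rng.normal(8.0, 2.0, out.sum())    # large positive shocks
y = 2.0 * x + 1.0 + noise                        # hypothetical f(x) = 2x + 1

A = np.column_stack([x, np.ones(n)])
w_mse = np.linalg.lstsq(A, y, rcond=None)[0]     # ordinary least squares (MSE)

# MCC fit: gradient ascent on (1/N) sum_i kappa_sigma(e_i), with e_i = y_i - A_i w
w_mcc, sigma, lr = w_mse.copy(), 0.5, 0.1
for _ in range(2000):
    e = y - A @ w_mcc
    g = np.exp(-e**2 / (2 * sigma**2)) * e / sigma**2   # kernel derivative w.r.t. e (up to a constant)
    w_mcc += lr * (A.T @ g) / n

print("MSE fit [slope, intercept]:", w_mse)   # pulled away by the outliers
print("MCC fit [slope, intercept]:", w_mcc)   # stays close to the true [2, 1]
```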
24. Regression results
25. Experiment 2: Chaotic signal prediction
- Mackey-Glass chaotic time series with parameter τ = 30
- time delayed neural network (TDNN), sketched below:
- 7 inputs
- 14 hidden PEs
- tanh nonlinearity
- 1 linear output
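A minimal sketch of the TDNN architecture listed above (7 delayed inputs, 14 tanh hidden PEs, 1 linear output) as a plain NumPy forward pass; the weights, the placeholder signal, and the helper names are illustrative assumptions, not the paper's training setup.

```python
import numpy as np

rng = np.random.default_rng(6)

# hypothetical network parameters: 7 -> 14 (tanh) -> 1 (linear)
W1, b1 = rng.normal(0, 0.1, (14, 7)), np.zeros(14)
W2, b2 = rng.normal(0, 0.1, (1, 14)), np.zeros(1)

def tdnn_predict(series, t):
    # use the 7 most recent samples series[t-7 .. t-1] to predict series[t]
    x = series[t - 7:t]
    h = np.tanh(W1 @ x + b1)     # 14 hidden PEs with tanh nonlinearity
    return (W2 @ h + b2)[0]      # single linear output

series = np.sin(0.3 * np.arange(100))   # placeholder signal (not Mackey-Glass)
pred = tdnn_predict(series, 50)
error = series[50] - pred               # this error feeds the MSE/MCC/MEE/MEEF criteria
print(pred, error)
```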
26. Training error PDF
27. Conclusions
- Established connections between MEE, distance functions and M-estimation.
- These connections theoretically explain the robustness of this family of cost functions.
- Unified MEE and MCC in the framework of information-theoretic models.
- Proposed a new cost function, minimization of error entropy with fiducial points (MEEF), which solves the problem of MEE being shift-invariant in an elegant and robust way.