Title: Novelty Detection Based on Information Matrix
1 Novelty Detection Based on Information Matrix
Alexander N. Dolia (ad_at_ecs.soton.ac.uk), 17 March 2005
- School of Electronics and Computer Science
- University of Southampton, Southampton, UK
2 Motivation
3 Novelty Detection definitions
- What is a PATTERN? What is an OUTLIER?
- For novelty detection, the description of
normality is learnt by fitting a model to the set
of normal examples, and previously unseen
patterns are then tested by comparing their
novelty score (as defined by the model) against
some threshold
(Nairac, Corbett-Clark, Townsend, Tarassenko, 1997)
- An outlier would be an observation that deviates
so much from other observations as to arouse
suspicions that it was generated by a different
mechanism (Hawkins, 1980)
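A minimal sketch of the first definition, assuming a Gaussian model of normality and the squared Mahalanobis distance as the novelty score. The model, the 99% training quantile used as the threshold, and the names `fit_gaussian` and `novelty_score` are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a mean and inverse covariance to the normal (training) examples."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mean, np.linalg.inv(cov)

def novelty_score(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X from the fitted model."""
    d = np.atleast_2d(X) - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))          # synthetic "normal" examples
mean, cov_inv = fit_gaussian(X_train)

# Assumed rule: flag anything scoring above the 99% quantile of training scores.
threshold = np.quantile(novelty_score(X_train, mean, cov_inv), 0.99)

X_test = np.array([[0.1, -0.2], [6.0, 6.0]])  # one typical point, one far away
is_novel = novelty_score(X_test, mean, cov_inv) > threshold
```

Here the point near the origin scores well under the threshold and the point at (6, 6) far above it, which matches Hawkins' notion of an observation that deviates strongly from the rest.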
4 Example 1
[Figure: four example panels, one labelled Normal and three labelled Novel]
5 Training set: Researchers in Machine Learning
- Potential problems
- Bad features
- Missing data, outliers in a training set
- Data non-stationarity
Is it a known researcher or a novice/outlier?
Hint: KPCA
Hint: Convex + normalised cut
6 Possible approaches
- Density Estimation: estimate a density based on
the training data
- Quantile Estimation: estimate a quantile of the
distribution underlying the training data: for a
fixed constant $\nu$, attempt to find a small set
$S$ whose probability mass is controlled by $\nu$
7 Experimental design
- Each observation taken at point x has an
associated cost
- Information matrix of the design
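8 Equivalence Theorem
Let $M(\xi)$ be the information matrix of a design $\xi$,
$d(x, \xi) = f(x)^\top M^{-1}(\xi) f(x)$, and $m$ the number
of parameters. The following assertions (Kiefer, 1961):
- (1) the design $\xi^*$ maximizes $\det M(\xi)$, that is,
minimizes $\det M^{-1}(\xi)$
- (2) the design $\xi^*$ minimizes $\max_x d(x, \xi)$
- (3) $\max_x d(x, \xi^*) = m$
are equivalent. The information matrices of
all designs satisfying (1)-(3) coincide among
themselves. Any linear combination of designs
satisfying (1)-(3) also satisfies (1)-(3).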
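A small numerical check of the equivalence theorem on a textbook case, offered as a sketch rather than anything shown in the talk. It assumes quadratic regression on [-1, 1] with regression functions f(x) = (1, x, x^2), for which the D-optimal design is known to put equal weight on -1, 0 and 1. The helper names and the comparison design are illustrative assumptions.

```python
import numpy as np

def features(x):
    """Regression functions f(x) = (1, x, x^2), one row per entry of x."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def information_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T for a discrete design xi."""
    F = features(points)
    return F.T @ (np.asarray(weights)[:, None] * F)

def max_variance(M, grid):
    """Largest value on the grid of d(x, xi) = f(x)^T M^{-1} f(x)."""
    F = features(grid)
    return np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F).max()

grid = np.linspace(-1.0, 1.0, 201)

# Known D-optimal design for this model: equal weights on -1, 0, 1.
M_opt = information_matrix([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3])
# An arbitrary comparison design (assumption, for contrast only).
M_alt = information_matrix([-1.0, -0.5, 0.5, 1.0], [0.25] * 4)

det_opt, det_alt = np.linalg.det(M_opt), np.linalg.det(M_alt)
max_d_opt, max_d_alt = max_variance(M_opt, grid), max_variance(M_alt, grid)
```

For the optimal design the maximum of d equals the number of parameters (3), while the comparison design has both a smaller determinant and a larger maximum of d, so the determinant criterion and the minimax criterion agree on which design is better.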
9 Minimum Covering Ellipsoid
The problem of computing the minimum covering
ellipsoid (MCE) for a set of points can be
regarded as the dual of a problem in optimal
design for parameter estimation in linear
regression, with the data set as the design space
(Titterington, 1975)
10 Minimum Covering Ellipsoid in k dimensions
- Ellipsoid centred at the origin (Titterington, 1975)
- The theory of optimal experimental design suggests:
- at least k+1 of the weights are non-zero
- at most k(k+3)/2 are non-zero
- if exactly k+1 are non-zero, they all equal 1/(k+1)
- a point has positive weight only if
it lies on the surface of the ellipsoid, that is,
only if its covering constraint holds with equality
11 Optimization
- the weight vector at iteration r is a design
measure for all r
- the sequence of information-matrix determinants
is monotonic increasing, strictly unless the
current weight vector is a fixed point of the
recursion
- the same is true of
- in the limit we obtain the optimum design measure
and the MCE
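The recursion described on this slide can be sketched as a multiplicative weight update. The version below is an assumption about its form: it uses the standard update p_i <- p_i * d_i / m, with d_i = z_i^T M(p)^{-1} z_i and m the dimension of the lifted points z_i = (1, x_i), which lets the ellipsoid centre move freely. The function name, iteration count and synthetic data are hypothetical.

```python
import numpy as np

def titterington_mce(Z, n_iter=2000):
    """Multiplicative update for D-optimal weights on the rows of Z."""
    n, m = Z.shape
    p = np.full(n, 1.0 / n)                     # start from uniform weights
    log_dets = []
    for _ in range(n_iter):
        M = Z.T @ (p[:, None] * Z)              # information matrix M(p)
        d = np.einsum("ij,ij->i", Z @ np.linalg.inv(M), Z)
        log_dets.append(np.linalg.slogdet(M)[1])
        p = p * d / m                           # sum(p * d) = m, so p still sums to 1
    M = Z.T @ (p[:, None] * Z)
    return p, M, np.array(log_dets)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))                    # synthetic data in k = 2 dimensions
Z = np.hstack([np.ones((60, 1)), X])            # lift to k + 1 dimensions
p, M, log_dets = titterington_mce(Z)

# Points with d_i close to m lie on the surface of the fitted ellipsoid.
d = np.einsum("ij,ij->i", Z @ np.linalg.inv(M), Z)
support = np.flatnonzero(p > 1e-6)
```

Each iterate stays a valid design measure because the weighted sum of the d_i always equals m, and the recorded log-determinants should never decrease. A point x can then be scored by its own d value, with values above m falling outside the fitted ellipsoid.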
12 Titterington's algorithm: Experiment (1)
13 Proposed approach
14 Lagrangian
15 Dual problem, experimental design
16 What is an outlier?
17 Rousseeuw's MCD method
- The objective of Rousseeuw's MCD method is
similar to that of the ν-MCD method: to find m
observations (out of N) whose covariance matrix
has the lowest determinant (Rousseeuw, 1984).
- The MCD estimate of location is then the
average of these m points, whereas the MCD
estimate of scatter is their covariance matrix.
- Candidate subsets can be enumerated by
exhaustive search or sampled by Monte-Carlo
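The exhaustive-search variant can be sketched as follows for a very small data set. This is an illustration under stated assumptions, not Rousseeuw's implementation: the function name `mcd_exhaustive` and the toy data (eight points on the unit circle plus two gross outliers) are hypothetical, and exhaustive enumeration is only feasible when the number of subsets is small. A Monte-Carlo version would draw random m-subsets instead of enumerating them all.

```python
import itertools
import numpy as np

def mcd_exhaustive(X, m):
    """Search every m-subset of the rows of X for the smallest covariance determinant."""
    best_det, best_idx = np.inf, None
    for idx in itertools.combinations(range(len(X)), m):
        det = np.linalg.det(np.cov(X[list(idx)], rowvar=False))
        if det < best_det:
            best_det, best_idx = det, idx
    subset = X[list(best_idx)]
    # MCD location is the subset mean, MCD scatter is the subset covariance.
    return subset.mean(axis=0), np.cov(subset, rowvar=False), best_idx

angles = np.arange(8) * np.pi / 4
inliers = np.column_stack([np.cos(angles), np.sin(angles)])   # rows 0..7
outliers = np.array([[100.0, 100.0], [-100.0, 100.0]])        # rows 8, 9
X = np.vstack([inliers, outliers])

location, scatter, chosen = mcd_exhaustive(X, m=7)
```

With m = 7 out of N = 10 the selected subset contains only circle points, so the location estimate stays near the origin even though the plain sample mean would be dragged towards the outliers.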
18 Experiment (2)
19 Experiment (3)
20 Kernel Principal Component Analysis (Scholkopf, Smola, Muller, 1996)
- Given N data points in k dimensions, let
$X = [x_1, \dots, x_N]$, where each column
represents one data point
- Choose an appropriate kernel and form the
Gram matrix
- Form the modified Gram matrix
- Diagonalize it to get eigenvalues and
eigenvectors
- Use a feature selection method to choose a
subset of the eigenvectors
- Project the data points on the eigenvectors
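The steps above can be sketched in a few lines. The Gaussian (RBF) kernel, the value of `gamma`, the use of double centring as the "modified" Gram matrix, and keeping the eigenvectors with the largest eigenvalues as the feature selection rule are all assumptions made for illustration; the talk does not specify them here.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix of the columns of X under a Gaussian (RBF) kernel."""
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca(X, n_components=2, gamma=0.5):
    """Project the columns of X onto the leading kernel principal components."""
    N = X.shape[1]
    K = rbf_gram(X, gamma)
    H = np.eye(N) - np.ones((N, N)) / N
    K_c = H @ K @ H                              # centred ("modified") Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K_c)       # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[order], eigvecs[:, order]
    alphas = V / np.sqrt(lam)                    # unit-norm directions in feature space
    return K_c @ alphas, lam                     # projections, shape (N, n_components)

rng = np.random.default_rng(2)
X = rng.normal(size=(2, 30))                     # k = 2 dimensions, N = 30 columns
Y, lam = kernel_pca(X)
```

The projected coordinates `Y` are centred and mutually uncorrelated, with squared column norms equal to the retained eigenvalues, so they can be passed to a covering-ellipsoid or MCD step in place of the raw data.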
21 Properties of non-robust MCE using KPCA
- The theory of optimal experimental design and
kernel PCA suggest:
- if k < N, at most k+1 of the weights are non-zero
- at least of
are non-zero or -
and
- if exactly k+1 are non-zero, they all equal 1/(k+1)
- a point has positive weight only if
it lies on the surface of the ellipsoid, that is,
only if its covering constraint holds with equality
22 Experiment (4)
23 Experiment (5)
24 KPCA + Rousseeuw's MCD method
25 The minimum covering sphere problem and S-optimum experimental design
The minimum covering sphere problem is the
S-optimum experimental design
26 Simple algebra
27 Illustrations of novelty detection methods
28 Relations to Tax and Scholkopf methods
- Tax method,
- Scholkopf method and multiply by
(-1) or find min
29 Potential Applications
- Tactical Aircraft
- Multi-sensor platform
- Multiple mission objectives
- Conflicting requirements
- Require optimum tracking and ID combined with
stealth.
- RADAR Management
- Passive and active RADAR
- Multi-site data fusion
- Improved tracking and ID
- Early warning systems
- Ground and air based
- ASW (Anti Submarine Warfare)
- Optimal sonobuoy placement
- Passive and active sonar
- Adaptive array processing
- This is a scenario with multiple constraints
- Sonobuoy cost
- Sensor lifetime
- Sensor localisation
- Deployment constraints
- Robotics / UAVs
- Search and rescue robotics
- Reconnaissance
- Anti-Terrorist (e.g. Airport)
- Robot perception covers many areas, and sensor
management is likely to cover a broad spectrum
of applications, both military and civilian. It
aims to optimise any suite of sensor resources to
improve system performance. It is applicable to
both ground and air based platforms.
30 Simulation and Testing
Robot Demonstrator: Modern mobile robot platform
with dissimilar sensor suite to provide proof of
concept and valuable simulation data.
Additionally will provide data for project 8.5
Intelligent Sensor
Simulation: MATLAB and C/C++ environments for
algorithm development and testing. Simulation
will form the foundation of the research and is
complemented by the demonstrator
- Pioneer 3DX Mobile Robot
- SICK Laser Mapping System
- 16 Sensor Sonar Array
- PTZ Imaging with Active Infra-Red
- Additional Stationary Sensor Resources
- PTZ cameras with IR
- Directed microphone arrays
- Low cost fixed location sensors
- Linux / C / MATLAB driven
- Modular Decentralised Processing
- Tracking experiments
- REAL TIME CAPABILITY
www.activrobots.com
31 Conclusions
- A new algorithm for novelty detection based on
the Information Matrix is proposed
- We view novelty detection, or single-class
classification, as an experimental design problem
- Preliminary simulation experiments illustrate the
application to the novelty detection problem
- We demonstrate that Scholkopf's and Tax's
algorithms could be a particular case of our
approach when the objective is the trace of the
information matrix
32 Future work
- More sophisticated algorithms for large scale
optimization (e.g., based on a conditional
gradient algorithm and active set strategy)
- Modified Titterington algorithm with upper bound
on Lagrangian multipliers
- On-line novelty detection using Information
Matrix
- Bounds on rate of convergence and generalizations
33 Acknowledgement
- Many thanks to T. De Bie, J.S. Shawe-Taylor,
S. Szedmak and D.M. Titterington for helpful
suggestions and discussions.
- Many thanks to C.J. Harris, S.F. Page, N.M.
White
- This research is partially supported by the
Data and Information Fusion Defence Technology
Centre, United Kingdom, under DTC Project 8.1
"Active multi-sensor management" and the PASCAL
network of excellence.