Title: Djemel Ziou
1 Learning of Data Collections in High-dimensional
Spaces Without Supervision
- Djemel Ziou
- NSERC/Bell Canada Chair in personal imaging
- Computer Science dept., Université de Sherbrooke
- Quebec, Canada
2 Content
- Visual collection management
- Machine learning
- Image segmentation
- Content based image suggestion
3 Visual collection management
4 Motivations
NSF 2007, B. Efron 2002.
5 Reactive Access to Collections
Short-term need: the user queries an information retrieval system.
- Text-based image retrieval
  - Text keywords extracted from Web pages containing the image, figure captions, ...
- Content-based image retrieval
  - Visual appearance: color, shape, texture, regions of interest, ...
- Limitations
  - Query, features, similarity, indexing, ...
6 Proactive Access To Collections
Predict the buyer's needs
Suggestion
Suggestion rules
- 1. Collaboration: the user's conformity to groups
  - Opinions of other users
- 2. Content: conformity to himself
  - Items with same tags (keywords)
7 Machine Learning
8 Introduction
- Representation of stimulus
9 Introduction
Under certain assumptions (structural, MAP)
Unlike generative learning, discriminative learning:
1) provides no information about x;
2) cannot be used with unlabelled data (C must be observed).
10 Discriminative Learning: Bayesian Logistic Regression
Ksantini, Ziou, Colin, Dubeau. IEEE Trans. on PAMI, 2008
Maximizing the conditional log-likelihood.
There are several drawbacks (high dimension, separability, ...)
Bayesian formulation
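For reference, the conditional log-likelihood of a two-class logistic regression model is usually written as below, followed by the corresponding Bayesian posterior. The notation (weights w, labels C_i in {0, 1}, sigmoid sigma, prior p(w), data set D of N labelled points) is assumed here and is not necessarily that of the cited paper.

```latex
\ell(\mathbf{w}) = \sum_{i=1}^{N} \Big[ C_i \log \sigma(\mathbf{w}^{\top}\mathbf{x}_i)
  + (1 - C_i) \log\big(1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_i)\big) \Big],
\qquad \sigma(a) = \frac{1}{1 + e^{-a}}

% Bayesian formulation: place a prior on w and work with the posterior
p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathbf{w}) \prod_{i=1}^{N}
  \sigma(\mathbf{w}^{\top}\mathbf{x}_i)^{C_i}
  \big(1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_i)\big)^{1 - C_i}
```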
11 Variational approximation and Jensen's inequality
lead to
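One widely used variational bound for the logistic sigmoid is the Jaakkola-Jordan quadratic lower bound. Whether the cited work uses exactly this form is an assumption; the sketch below only illustrates the kind of bound involved, with hypothetical function names, and checks numerically that it stays below the sigmoid and touches it at z = xi.

```python
import math

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 * xi); its limit as xi -> 0 is 1/8."""
    return 0.125 if xi == 0.0 else math.tanh(xi / 2.0) / (4.0 * xi)

def sigmoid_lower_bound(z, xi):
    """Quadratic-in-z lower bound on sigmoid(z), with variational parameter xi."""
    return sigmoid(xi) * math.exp((z - xi) / 2.0 - lam(xi) * (z * z - xi * xi))

if __name__ == "__main__":
    xi = 1.5
    for z in (-3.0, -1.5, 0.0, 1.5, 3.0):
        print(z, sigmoid_lower_bound(z, xi), sigmoid(z))
```

Because the bound is the exponential of a quadratic in z, combining it with a Gaussian prior yields a Gaussian approximation of the posterior, which is why bounds of this kind are convenient for Bayesian logistic regression.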
12 Generative learning: case of finite mixture of pdfs
Problems: pdf, estimation, model selection, ...
Which pdf?
Gaussian, Gamma, ...; same or different pdfs for populations.
Mixture of different pdfs for SAR images: El Zaart and Ziou, Int. J. Remote Sensing, 2007
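As a reminder, a finite mixture with M components is conventionally written as below. The symbols (mixing proportions p_j, component parameters theta_j, full parameter set Theta) are assumed notation.

```latex
p(\mathbf{x} \mid \Theta) = \sum_{j=1}^{M} p_j \, p(\mathbf{x} \mid \theta_j),
\qquad p_j \ge 0, \quad \sum_{j=1}^{M} p_j = 1
```

The three problems listed above map onto this expression: choosing the component density p(x | theta_j), estimating Theta, and selecting M.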
13 The Generalized Dirichlet Distribution
Generalized Dirichlet distribution (GDD)
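The standard (Connor-Mosimann) form of the generalized Dirichlet density is given below for reference. The parameter names alpha_d, beta_d and the derived exponents gamma_d are the usual ones in the literature and are assumed here.

```latex
p(X_1, \dots, X_D) = \prod_{d=1}^{D}
  \frac{\Gamma(\alpha_d + \beta_d)}{\Gamma(\alpha_d)\,\Gamma(\beta_d)}
  \, X_d^{\alpha_d - 1} \Big(1 - \sum_{j=1}^{d} X_j\Big)^{\gamma_d},
\qquad X_d > 0, \quad \sum_{d=1}^{D} X_d < 1

\gamma_d = \beta_d - \alpha_{d+1} - \beta_{d+1} \quad (d = 1, \dots, D-1),
\qquad \gamma_D = \beta_D - 1
```

With 2D parameters instead of the D + 1 of a Dirichlet, the distribution allows a more general covariance structure.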
14 Multi-dimensionality is Omnipresent
- Multidimensional data
  - Image descriptors: 128 000 features (128 SIFT features x 1000 interest points)
  - Faces: 128x128 pixels, 16384 features/face
  - Text: terms in a corpus, 10 000
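15 High-Dimensional data
Bouguila and Ziou. IEEE Trans. on PAMI, 2007
Boutemedjet, Bouguila and Ziou. IEEE Trans. on PAMI, 2009
If $\vec{X} = (X_1, \dots, X_D)$ is GDD, define
$W_d = X_d$ if $d = 1$,
$W_d = X_d / (1 - X_1 - \dots - X_{d-1})$ for $d = 2, \dots, D$.
Each $W_d$ is a Beta.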
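The property above can be checked numerically. The sketch below draws a generalized Dirichlet vector by stick-breaking independent Beta variables, then applies the ratio transformation and recovers those Beta variables. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import random

def sample_gdd(alphas, betas, rng):
    """Draw one generalized Dirichlet vector by stick-breaking.

    Returns the vector x and the independent Beta draws w that generated it.
    """
    x, w_list, remaining = [], [], 1.0
    for a, b in zip(alphas, betas):
        w = rng.betavariate(a, b)      # W_d ~ Beta(alpha_d, beta_d)
        w_list.append(w)
        x.append(w * remaining)        # X_d = W_d * prod_{j<d} (1 - W_j)
        remaining *= (1.0 - w)
    return x, w_list

def to_independent_betas(x):
    """Map x to w: W_1 = X_1, W_d = X_d / (1 - X_1 - ... - X_{d-1})."""
    w, used = [], 0.0
    for d, xd in enumerate(x):
        w.append(xd if d == 0 else xd / (1.0 - used))
        used += xd
    return w

if __name__ == "__main__":
    rng = random.Random(0)
    x, w_true = sample_gdd([2.0, 3.0, 4.0], [5.0, 4.0, 3.0], rng)
    print("x         =", x)
    print("recovered =", to_independent_betas(x))
    print("original  =", w_true)
```

Working in the transformed space means a D-dimensional estimation problem splits into D one-dimensional Beta problems, which is what makes the high-dimensional case tractable.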
16 Feature Selection
- Mixture model before and after transformation
17 Feature Selection Model
Boutemedjet, Bouguila and Ziou. IEEE Trans. on PAMI, 2009
- Relevance criterion: marginal independence of $X_l$ from the class label $Z$
- Label $X_l$ with a hidden Bernoulli variable that equals 0 when $X_l$ follows a class-independent density (the feature is irrelevant)
- General definition: this class-independent density is a mixture of K components
  - e.g. distribution of background in object images
- Label $X_l$ in this mixture by a hidden multinomial variable
- Approximation
- New mixture model: Generalized Dirichlet (GD) with selection of independent features
18 Unsupervised Learning using the MML Principle
Bouguila and Ziou. IEEE Trans. on TKDE, 2007
Paradigm: encode, send, decode
What is the minimum message length?
- The number of parameters being estimated is equal to M(2D+1).
- The prior probability of the parameters.
- The Fisher information (determinant of the Hessian matrix).
- Problems: the prior and the Fisher information
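A commonly used form of the message length is the Wallace-Freeman approximation, sketched below. The symbols are assumed notation: h is the prior, F the Fisher information matrix, N_p the number of free parameters, and kappa the optimal quantization lattice constant.

```latex
\mathrm{MessLen} \approx -\log h(\Theta) - \log p(\mathcal{X} \mid \Theta)
  + \frac{1}{2} \log \lvert F(\Theta) \rvert
  + \frac{N_p}{2} \big(1 + \log \kappa_{N_p}\big),
\qquad N_p = M(2D + 1)
```

The count N_p = M(2D+1) follows from each of the M components carrying 2D Beta parameters plus one mixing proportion.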
19 Unsupervised Learning: MML
Boutemedjet, Bouguila and Ziou. IEEE Trans. on PAMI, 2009
- Fisher information
  - E.g.
- Prior distribution
  - E.g.
- Message length of the data set
20 Optimization of MML
- Expectation Maximization (EM) algorithm
  - E-step: expected posterior probabilities
  - M-step
2x2 matrix
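The E-step can be sketched as follows for a mixture whose components factorize into independent Beta densities, as is the case after the GDD transformation. This is a minimal illustration with hypothetical names; the weight update shown is the plain maximum-likelihood one, not the MML-penalized update of the cited papers, and the Beta parameter update (typically a Newton-type step per feature) is omitted.

```python
import math

def beta_logpdf(w, a, b):
    """Log-density of Beta(a, b) at w in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(w) + (b - 1.0) * math.log(1.0 - w)

def e_step(data, weights, params):
    """Posterior probability of each component for each data vector.

    data:    list of vectors with entries in (0, 1)
    weights: mixing proportions, one per component
    params:  params[j][d] = (a, b) for component j and feature d
    """
    resp = []
    for w_vec in data:
        log_joint = [
            math.log(p_j) + sum(beta_logpdf(w, a, b) for w, (a, b) in zip(w_vec, comp))
            for p_j, comp in zip(weights, params)
        ]
        m = max(log_joint)                      # log-sum-exp for numerical stability
        unnorm = [math.exp(v - m) for v in log_joint]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp

def update_weights(resp):
    """Maximum-likelihood update of the mixing proportions (assumed, not MML)."""
    n = len(resp)
    return [sum(r[j] for r in resp) / n for j in range(len(resp[0]))]

if __name__ == "__main__":
    params = [[(2.0, 8.0)], [(8.0, 2.0)]]       # two components, one feature
    resp = e_step([[0.1], [0.9]], [0.5, 0.5], params)
    print(resp, update_weights(resp))
```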
21 Object image categorization
- Goal: identify categories and irrelevant features
- Challenge: intra-class variability, inter-class similarity
- Existing: supervised, K-NN with Euclidean distance
- Collection: 2688 images, 8 classes
- Features
  - Scale Invariant Feature Transform (SIFT): 1.5 x 10^6 descriptors, 128-D (2 GB)
  - Visual vocabulary: 700 visual words
  - Probabilistic Latent Semantic Indexing (pLSI)
    - P(z|I): hidden aspects defined on a simplex
    - Non-Euclidean
Challenging problem in computer vision
22 Results
Feature selection improves the accuracy of image categorization
23 Image segmentation and object tracking
M. S. Allili and Ziou, Int. J. of Computer Mathematics, 2007.
M. S. Allili and Ziou, J. Neurocomputing, 2008.
24 Problem formulation of segmentation
Active contour based approach
Variational formulation
[Figure: initial contour and final contour]
25 Proposed approach
Statistical Model selection
Contrast estimation
Energy functional
Euler-Lagrange PDE
26 Topology change (Level sets)
Experimental results
27 Object tracking in video
28 CBIS as a Model Selection Problem
Boutemedjet and Ziou, IEEE Trans. on Multimedia, 2008.
29 Suggestion Criteria
- Data
  - Users: $U = \{u_1, u_2, \dots, u_{N_u}\}$
  - Contexts: $E = \{e_1, e_2, \dots, e_{N_e}\}$
  - Images: $X = \{x_1, x_2, \dots, x_{N_x}\}$
  - Ratings of users on images
  - $D = \{(u^{(i)}, e^{(i)}, x^{(i)}, r^{(i)}),\ i = 1, \dots, N\}$
- Data modeling principle
  - Similar users prefer visually and semantically similar products
- Suggestion: consumers need highly rated and less redundant products
30 Data model p(u,e,x,r)
- Rating model: each quadruplet (u,e,v,x) in the data is a random vector
- Discover user/image classes (z,c) and label (u,e,v,x) with 2 hidden variables: z (user class), c (image class)
- All variables except x are discrete (multinomial distributions); x ~ GD
- Parameters
- Diversity: penalize predicted ratings for consumed images $X_u^e$
- Consumed images become irrelevant: $N_{ue}$: $(u, e, x_t^{ue}, r^-)$, $t = 1, \dots, N_{ue}$
- Update T from $N_{ue}$
- New data are handled.
31 Algorithm
32 Results: Mean Absolute Error (MAE)
- PCC: Pearson Correlation Coefficients (P. Resnick et al., CSCW 1994)
- Aspect Model (T. Hofmann, ACM TOIS 2004)
- Flexible Mixture Model (L. Si and R. Jin, ICML 2003)
- User Rating Profile (B. Marlin, NIPS 2004)
- V-FMM: no contextual information, E = Singleton
- V-GD-FMM: no feature selection

                 PCC     Aspect  FMM     URP     V-FMM   V-GD-FMM  I-VCC   D-VCC
Avg. MAE         1.327   1.201   1.145   1.116   0.890   0.754     0.712   0.645
Std. deviation   0.040   0.051   0.036   0.042   0.038   0.027     0.022   0.014
Improvement (%)  0.00    9.49    13.71   15.90   32.94   43.18     51.62   55.84

Feature selection improves the rating prediction accuracy
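For readers unfamiliar with the metric, the mean absolute error between predicted and observed ratings can be computed as in the sketch below. The helper `relative_improvement` assumes that an improvement is measured as the relative MAE reduction with respect to a baseline; that reading, the function names, and the toy ratings are all illustrative assumptions.

```python
def mean_absolute_error(predicted, observed):
    """Average of |predicted - observed| over all rated items."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

def relative_improvement(baseline_mae, model_mae):
    """Percentage reduction in MAE relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae

if __name__ == "__main__":
    # Toy ratings, made up for illustration only.
    predicted = [4.0, 3.0, 5.0]
    observed = [5.0, 3.0, 3.0]
    print(mean_absolute_error(predicted, observed))
    print(relative_improvement(2.0, 1.5))
```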
34 References
- M. S. Allili, D. Ziou. Object tracking in videos using adaptive mixture models and active contours. Neurocomputing 7, pp. 2001-2011, 2008.
- M. S. Allili, D. Ziou. Automatic colour-texture image segmentation using active contours. Int. J. Comput. Math. 84(9), pp. 1325-1338, 2007.
- S. Boutemedjet, D. Ziou. A Graphical Model for Context-Aware Visual Content Recommendation. IEEE Trans. on Multimedia 10, pp. 52-62, 2008.
- S. Boutemedjet, N. Bouguila, D. Ziou. A Hybrid Feature Extraction Selection Approach for High-Dimensional Non-Gaussian Data Clustering. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009 (in press).
- N. Bouguila, D. Ziou. High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007.
- R. Ksantini, D. Ziou, B. Colin, F. Dubeau. Weighted Pseudometric Discriminatory Power Improvement Using a Bayesian Logistic Regression Model Based on a Variational Method. IEEE Trans. on Pattern Analysis and Machine Intelligence 30(2), pp. 253-266, 2008.
- D. Ziou, T. Hamri, S. Boutemedjet. A hybrid probabilistic framework for content-based image retrieval with feature weighting. Pattern Recognition 42(7), pp. 1511-1519, 2009.
- M. L. Kherfi, D. Ziou. Relevance feedback for CBIR: a new approach based on probabilistic feature weighting with positive and negative examples. IEEE Trans. on Image Processing 15(4), pp. 1017-1030, 2006.
- M.-F. Auclair-Fortier, D. Ziou. A Global Approach for Solving Evolutive Heat Transfer for Image Denoising and Inpainting. IEEE Trans. on Image Processing 15, pp. 2558-2574, 2006.
- A. F. El Ouafdi, D. Ziou, H. Krim. A smart stochastic approach for manifolds smoothing. Comput. Graphic Forum 27, pp. 1357-1364, 2008.