Transcript and Presenter's Notes

Title: Machine Learning for Data Mining


1
Machine Learning for Data Mining
  • Ata Kaban
  • The University of Birmingham

2
Overview
  • Branches of machine learning
  • Roadmap of unsupervised learning
  • Some example applications from my research work

3
Machine learning
  • Data y1, y2, … (e.g. sensory inputs)
  • Supervised learning
  • The machine is given desired outputs z1, z2, …
    attached to the input data
  • Goal: learn to produce the correct output for a
    new input
  • Reinforcement learning
  • The machine can also produce actions, which
    affect the state of the environment (the data),
    and receives rewards / punishments.
  • Goal: learn to act in a way that maximises
    rewards in the long term.
  • Unsupervised learning
  • Goal: build a model of the data, which can be
    used for reasoning, explanation, decision
    making, prediction, etc., or to make other tasks
    easier

4
Overview
  • Branches of machine learning
  • Roadmap of unsupervised learning
  • Some example applications from my research work

5
Unsupervised learning
  • Goals: finding useful representations
  • Finding clusters
  • Dimensionality reduction
  • Building topographic maps
  • Finding hidden causes that explain the data
  • Modelling the data density
  • Uses:
  • Data compression
  • Outlier detection
  • Make other learning tasks easier
  • A theory of human learning / perception

6
More examples of problems
"… a more appropriate model should consider some
conceptual dimensions instead of words."
(Gardenfors)
  • Finding topics, meanings, intentions
  • The two-choice questionnaire problem
  • Word saliency

7
Latent variable models: a useful formalism
  • Hidden (latent) variables
  • Topics
  • Intentions
  • Relative importance
  • Observed variables
  • Stream of words (with grammar / syntax)
  • Assuming that there is a systematic relationship
    between these two categories of variables,
  • We try to find out the hidden variables of
    interest from the observed variables.

8
  • Modelling and inference
  • Modelling: specify how the hidden variables of
    interest might have generated the observed
    variables
  • Inference: invert the stochastic mapping, i.e.
    infer the hidden variables of interest from the
    observed variables

[Diagram: hidden causes --(stochastic mapping)--> data;
inference runs in the opposite direction]
9
  • Probabilistic latent variable models

[Annotated model: observed data = f(latent variable; parameters) + noise]
  • Linear models with Gaussian latent prior: FA,
    PPCA, PCA
  • Finite mixture models
  • Linear models with non-Gaussian latent prior:
    IFA, ICA, PP
  • Non-linear models: locally linear, explicit
    non-linear (GTM)
(a generic sampling sketch of the linear-Gaussian case follows)
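To make this concrete, here is a minimal sketch (made-up dimensions and parameters, not taken from the slides) of sampling data from the linear-Gaussian family above, once with an FA-style diagonal noise covariance and once with a PPCA-style isotropic one:

```python
# Minimal sketch of the generic linear latent variable model
#   y = W x + mu + n,  with latent prior x ~ N(0, I) and Gaussian noise n.
# All sizes and parameters below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 10, 2, 500                 # observed dim, latent dim, sample size

W = rng.normal(size=(D, K))          # loading matrix (parameters)
mu = rng.normal(size=D)              # mean offset
x = rng.normal(size=(N, K))          # latent variables, p(x) = N(0, I)

# FA-style noise: diagonal covariance with unequal variances
noise_fa = rng.normal(size=(N, D)) * rng.uniform(0.1, 1.0, size=D)
# PPCA-style noise: isotropic, sigma^2 * I
noise_ppca = 0.5 * rng.normal(size=(N, D))

y_fa = x @ W.T + mu + noise_fa       # data under a Factor Analysis-style model
y_ppca = x @ W.T + mu + noise_ppca   # data under a PPCA-style model
```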
10
What are those acronyms?
  • These are some classical and quite useful
    techniques for data analysis
  • FA: Factor Analysis
  • PCA: Principal Component Analysis
  • PPCA: Probabilistic Principal Component Analysis
  • MoG: Mixture of Gaussians
  • IFA: Independent Factor Analysis
  • ICA: Independent Component Analysis
  • GTM: Generative Topographic Mapping

11
Applications of latent variable models
  • Tools for discovering latent structure in the
    observable data
  • Data analysis / visualisation
  • Application domains
  • Data mining
  • Telecommunications
  • Bio-informatics
  • Fraud detection
  • Information retrieval
  • Marketing analysis

12
3 basic types of latent variable models: the
intuition
13
Linear latent variable models
  • I. Models with Gaussian latent prior: p(x) = N(0, I)
  • Factor Analysis (FA): n ~ N(0, Σ), Σ diagonal
  • Probabilistic Principal Components Analysis
    (PPCA): n ~ N(0, σ²I) → PCA: n ~ δ(0)
  • II. Mixture models: p(x) = Σ_k δ(x − x_k) p(x_k)
  • Mixture of Gaussians (MoG): n ~ N(0, Σ) or N(0, σ²I)
  • III. Models with non-Gaussian (e.g. sparse)
    latent priors: p(x) non-Gaussian
  • Independent Factor Analysis (IFA): n ~ N(0, Σ),
    p(x) = Σ_k N_x(0, σ_k) p(k)
  • Independent Component Analysis (ICA): n ~ δ(0),
    p(x) non-Gaussian, e.g. Laplace

[Slide annotations, typical uses: compression,
visualisation, clustering; signal separation,
structure discovery, visualisation, clustering]

14
Compression with PCA: D ≈ U Uᵀ D
(U holds the top principal directions of the data matrix D)
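A minimal sketch of this, assuming a made-up data matrix D with features in rows and observations in columns; U_k holds the top-k principal directions, so the reconstruction is D ≈ U_k U_kᵀ D:

```python
# Minimal sketch of compression with PCA: keep only the top-k principal
# directions U_k of the (centred) data matrix D and reconstruct D ≈ U_k U_k^T D.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 1000))         # 20 features x 1000 observations (made up)
D = D - D.mean(axis=1, keepdims=True)   # centre each feature

k = 5
U, s, Vt = np.linalg.svd(D, full_matrices=False)
U_k = U[:, :k]                          # top-k principal directions

codes = U_k.T @ D                       # compressed representation (k x 1000)
D_hat = U_k @ codes                     # reconstruction, D ≈ U_k U_k^T D
print("relative reconstruction error:",
      np.linalg.norm(D - D_hat) / np.linalg.norm(D))
```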
15
Text Data Compression can capture synonymy
16
Term x Documents Matrix
17
This is called LSA: Latent Semantic Analysis
Example query: "theory application"
It is performed by SVD (Singular Value
Decomposition), which is closely related to PCA
Project documents, E⁻¹ Uᵀ D, and query words,
E⁻¹ Vᵀ Dᵀ, into the same space
A query is treated as a (small) document
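A rough sketch of LSA along these lines, on a made-up toy corpus and query; here the query is folded in as a small pseudo-document, one common variant of the projection above:

```python
# Rough sketch of LSA: truncated SVD of the term x document matrix, then
# documents and a query are compared in the same low-dimensional space.
# The tiny corpus and query are invented for illustration only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["theory of learning", "learning theory application",
        "application of data mining", "data mining software"]

vec = CountVectorizer()
D = vec.fit_transform(docs).T.toarray()          # terms x documents

U, s, Vt = np.linalg.svd(D, full_matrices=False)
k = 2
U_k, E_k = U[:, :k], np.diag(s[:k])

doc_coords = np.linalg.inv(E_k) @ U_k.T @ D      # project documents: E^-1 U^T D
query = vec.transform(["theory application"]).toarray().ravel()
query_coords = np.linalg.inv(E_k) @ U_k.T @ query   # query folded in as a small document

# rank documents by cosine similarity to the query in the latent space
sims = (doc_coords.T @ query_coords) / (
    np.linalg.norm(doc_coords, axis=0) * np.linalg.norm(query_coords) + 1e-12)
print(sims)
```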
18
Clustering
  • of users' web browsing behaviours, from Internet
    Information Server logs for msnbc.com over one
    day
  • 17 page categories: frontpage, news, tech, local,
    opinion, on-air, misc, weather, health, living,
    business, sports, summary, bbs, travel, msn-news,
    msn-sports
  • Example data (one sequence of category visits per
    user; a rough clustering sketch follows the
    examples)
  • 1 1
  • 2
  • 3 2 2 4 2 2 2 3 3
  • 5
  • 6 7 7 7 6 6 8 8 8 8
  • etc
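The clustering sketch mentioned above; as a crude stand-in for the original analysis, each user is represented simply by a vector of visit counts over the 17 categories, and a Gaussian mixture is fitted to those counts:

```python
# Crude sketch: represent each user by visit counts over the 17 page
# categories and cluster the count vectors with a Gaussian mixture.
# (The original work used a more principled model of the sequences.)
import numpy as np
from sklearn.mixture import GaussianMixture

sequences = [                         # the example sequences from the slide
    [1, 1],
    [2],
    [3, 2, 2, 4, 2, 2, 2, 3, 3],
    [5],
    [6, 7, 7, 7, 6, 6, 8, 8, 8, 8],
]

n_categories = 17
X = np.zeros((len(sequences), n_categories))
for i, seq in enumerate(sequences):
    for page in seq:
        X[i, page - 1] += 1           # categories are numbered 1..17

gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(X)
print(gmm.predict(X))                 # cluster label for each user
```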

19
(No Transcript)
20
Non-Gaussian Latent Variable Models: ICA for
solving inverse problems
  • Blind source separation (the cocktail party
    problem)
  • Image denoising
  • Medical signal processing: fMRI, ECG, EEG
  • Modelling of the visual cortex
  • Feature extraction for face recognition
  • Compression, redundancy reduction
  • Clustering
  • Time series analysis

21
The Cocktail Party Problem
Original (hidden) sources
22
Observations Linear mixtures of the sources
23
Recovered sources
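A small sketch of the same blind source separation idea on synthetic signals (not the slides' data), using FastICA from scikit-learn:

```python
# Sketch of blind source separation (the cocktail party problem) on two
# synthetic sources mixed by an unknown linear mixing matrix.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

s1 = np.sin(2 * t)                       # hidden source 1
s2 = np.sign(np.sin(3 * t))              # hidden source 2 (square wave)
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(2000, 2))

A = np.array([[1.0, 0.5],                # unknown mixing matrix
              [0.7, 1.0]])
X = S @ A.T                              # observed "microphone" mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)             # recovered sources (up to scale and order)
```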
24
Overview
  • Branches of machine learning
  • Roadmap of unsupervised learning
  • Further example applications

25
Galaxy spectra
  • Elliptical galaxies
  • oldest galactic systems
  • believed to consist of a single population of
    old stars
  • recent theories indicate the presence of younger
    populations of stars
  • what does the data tell us?

26
What does the data tell us?

  • A. Kaban, L. Nolan and S. Raychaudhury. Finding
    Young Stellar Populations in Elliptical Galaxies
    from Independent Components of Optical Spectra.
    Proc. SIAM International Conference on Data
    Mining (SDM'05), 2005.
  • L.A. Nolan, M. Harva, A. Kaban and S. Raychaudhury.
    A data-driven Bayesian approach to finding young
    stellar populations in early-type galaxies from
    their UV-optical spectra. Mon. Not. of the Royal
    Astron. Soc. (MNRAS), 366(1), pp. 321-338.
  • L.A. Nolan, S. Raychaudhury and A. Kaban. Young
    stellar populations in early-type galaxies in the
    Sloan Digital Sky Survey. Accepted to MNRAS.



27
(No Transcript)
28
Reference: A. Kaban and X. Wang, ECML'06
Finding communities from a dynamic network
  • An analogy: Deconvolutive Source Separation (aka
    the cocktail party problem)
  • Microphone(s) record a mixture of signals
  • Convolutive mixing due to echo
  • Task is to recover the individual signals
  • Studied in continuous signal processing

29
Computer-mediated discussion
  • Convolution occurs due to various time-delay
    factors: network transmission / bandwidth,
    differences in typing speed
  • The activity is logged as a sequence of discrete
    events
  • We try to model user participation dynamics based
    on this sequence

30
Dynamic Social Networks
Example of activity log from an IRC Chatroom
31
Results
Observed first-order connectivity
Analysis by our Deconvolutive State Clustering
model
32
The clusters (communities) developing over time
Note: bursts of activity are being discovered
[Plot axes: clusters vs. time]
33
Scaling of our algorithm
34
Reference: X. Wang and A. Kaban, Proc. Discovery
Science 2006.
Model-based inference of word saliency from text
Lowest-saliency words from the 20-Newsgroups
corpus
35
Interpretation of a piece of text from
talk.politics.mideast: underlined: saliency > 0.8;
normal font: saliency between 0.4 and 0.8;
grey: saliency < 0.4
36
Induced geometry of the topical and common word
distributions
[Figure: the induced manifold; colour shows the
estimated saliency]
37
Improved text classification
[Tables: data sets and classification results]
38
Modelling inhomogeneous documents
  • Multinomial model components
  • PLSA, LDA
  • Independent Bernoulli model components
  • Aspect Bernoulli

39
Reference: Blei et al., Latent Dirichlet Allocation (LDA). JMLR, 2003.
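As a side illustration (not the models compared on these slides), a minimal sketch of fitting multinomial topic components with LDA on a made-up mini-corpus, using scikit-learn's LatentDirichletAllocation:

```python
# Minimal sketch of LDA with multinomial components on an invented mini-corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["space shuttle launch orbit",
        "orbit satellite launch space",
        "hockey game team score",
        "team game players score hockey"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                      # document x term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):       # per-topic word weights
    top = comp.argsort()[::-1][:4]
    print("topic", k, [terms[i] for i in top])
```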
40
What are the Bernoulli components?
Ref: A. Kaban and E. Bingham, Proc. SDM'03;
E. Bingham and A. Kaban, submitted to JAIR
Word presences normally tend to have topical
causes. Word absences also have non-topical
causes (a noise factor). Note: in terrorist
messages the word presences might have
non-topical causes too :-)
41
What does the non-topical factor tell us?
  • Given a document, which are the word absences
    for which the posterior P(phantom | n, t, x_tn)
    is highest (amongst the posteriors of all other
    factors)? (see the toy sketch below)
  • Equivalently, we can think of it as removing the
    noise factor.
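The toy sketch referred to above: per-entry posterior responsibilities in a simple mixture of independent Bernoulli components. This is only a generic stand-in for the idea of a "phantom" noise factor explaining word absences, not a faithful implementation of the Aspect Bernoulli model:

```python
# Toy sketch (not the Aspect Bernoulli model): which component best explains
# each presence/absence entry of one document, under a mixture of independent
# Bernoulli components with made-up parameters.
import numpy as np

# P(word t present | component k), K x T
P = np.array([[0.9, 0.8, 0.7],     # "topical" component
              [0.1, 0.1, 0.1]])    # "phantom"/noise-like component
pi = np.array([0.7, 0.3])          # mixing proportions

x = np.array([1, 0, 0])            # one document as a presence/absence vector

lik = np.where(x == 1, P, 1.0 - P)             # per-entry likelihoods P(x_t | k)
post = pi[:, None] * lik
post = post / post.sum(axis=0, keepdims=True)  # per-entry posteriors P(k | x_t)
print(post)   # for the absences, the noise-like component gets the higher posterior
```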

42
(No Transcript)
43
A visual example: explaining each pixel
Example input data instances
Components identified from the data (Beta
posterior expectations)
Explanation of each pixel value of each data
instance in terms of how likely it is to be
explained by each of the components. Darker means
higher probability. Note that the white pixels in
the corners of a raster are explained by
content-bearing components, whereas the occluded
pixels come from a phantom component.
44
Predictive modelling of heterogeneous sequence
collections
Reference: A. Kaban. Predictive Modelling of
Heterogeneous Sequence Collections by Topographic
Ordering of Histories. Machine Learning (accepted
subject to minor revisions).
45
  • How to model heterogeneous behaviour?
  • Shared behavioural patterns (analogous to
    procedures of computer programs)
  • These are the basis of multiple relationships
    between users and groups of users
  • Existing models are either global or assume
    homogeneous prototypical behaviour within groups

46
Example
  • A set of first-order Markov chains combine to
    generate sequences by interleaving in various
    proportions of participation (a small generative
    sketch follows this slide)
  • Task:
  • Estimate the shared generator chains
  • Infer the proportions of participation
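The generative sketch referred to above, with made-up transition matrices and participation proportions:

```python
# Sketch: generate one sequence by interleaving two shared first-order
# Markov chains according to a user's proportions of participation.
# All parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

T = [np.array([[0.8, 0.1, 0.1],          # generator chain 1 (3 symbols)
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]]),
     np.array([[0.1, 0.45, 0.45],        # generator chain 2
               [0.45, 0.1, 0.45],
               [0.45, 0.45, 0.1]])]
theta = [0.7, 0.3]                       # this user's proportions of participation

state = [0, 0]                           # current state of each chain
sequence = []
for _ in range(30):
    k = rng.choice(2, p=theta)                   # which chain emits the next symbol
    state[k] = rng.choice(3, p=T[k][state[k]])   # advance that chain
    sequence.append(int(state[k]))
print(sequence)
```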

47
Prototypes vs. aspects in the model
It can be shown that the model estimation
algorithm minimises a weighted sum of entropies of
the parameters.
48
Robust to sample-size issues; outperforms the
state of the art
49
A summary overview of the large sequence
collection in terms of lists of most probable
sequences at equal locations of the map
50
Instead of conclusions
"All models are wrong but some are useful" (Cox)
  • Data generated by natural processes typically
    contain redundancy
  • One can throw away a lot of detail from the data
    and still keep essential features
  • Simple models can be successful
  • Complicated models may need infeasible amounts of
    data to estimate

51
old stuff
  • The Latent Trait Model family as a general
    framework for data visualisation
  • Local geometric properties of LTMs
  • Hierarchical LTMs
  • Topographic visualisation of state evolution
  • Conclusions

[1] Kabán, A. and Girolami, M., A Combined Latent
Class and Trait Model for the Analysis and
Visualisation of Discrete Data. IEEE Transactions
on Pattern Analysis and Machine Intelligence,
23(8), pp. 859-872, 2001.
52
The Generative Latent Trait Model Family
[Diagram: latent space (the space of the hidden
structure) mapped non-linearly to the observable
multi-dimensional data space]
Aim: infer and visualise the structure of the data
as much as possible
53
Text-based document representation / modelling
54
(No Transcript)
55
  • The Latent Trait Model family as a general
    framework for data visualisation
  • Local Magnification Factors of the LT Manifolds
  • Hierarchical LTMs
  • Topographic visualisation of state evolution
  • Conclusions

56
Magnification factors and curvatures of the
Bernoulli Trait manifold on handwritten digits
data
57
  • Brief introduction to generative and latent
    variable models
  • Text generation models
  • The Latent Trait Model family as a general
    framework for data visualisation
  • Local geometric properties of the LT manifolds
  • Hierarchical LTMs
  • Topographic visualisation of state evolution
  • Conclusions

[2] Kabán, A., Tino, P. and Girolami, M., A General
Framework for a Principled Hierarchic
Visualisation of Multivariate Data, Proc.
IDEAL'02, Manchester, August 2002, to appear.
58
Hierarchical Posterior Mean Mapping
59
Hierarchy of Local Magnification Factors
60
  • The Latent Trait Model family as a general
    framework for data visualisation
  • Local geometric properties of LTMs
  • Hierarchical LTMs
  • Topographic Visualisation of State Evolution in
    Temporally Coherent Data. Visualizing the topic
    evolution in coherent text streams
  • Conclusions

[3] Kabán, A. and Girolami, M., A Dynamic
Probabilistic Model to Visualize Topic Evolution
in Text Streams, Journal of Intelligent
Information Systems, special issue on Automated
Text Categorization, Vol. 18, No. 2 (March 2002).
61
(No Transcript)
62
Chat line discussions from an Internet relay chat
room
  • Posteriors in time during a discussion
    concentrated around a single topic (Susan Smith)
  • Posteriors in a time frame containing a change of
    topic, from general politics to gun control
63
2-D visualisation of a chat session -- summary
mapping of the posterior means
64
Future Challenges
  • "The most important goal for theoretical computer
    science in 1950-2000 was to understand the von
    Neumann computer. The most important goal for
    theoretical computer science from 2000 onwards is
    to understand the Internet."
  • Christos H. Papadimitriou