Classification Methods for Data Mining: Tasks, Issues

Classification Methods for Data Mining: Tasks, Issues and Challenges

Sankar K. Pal
Indian Statistical Institute
Calcutta
http://www.isical.ac.in/sankar

Contents
Pattern Recognition and Machine Intelligence
Relevance of FL, ANN and GAs
Different Integrations of Soft Computing Tools
Emergence of Data Mining
Need
KDD Process
Relevance of Soft Computing Tools with Examples
Rough Sets and Granular Computing
Information granules
Rough set rules

Rough-Fuzzy Case Generation
What is Case Based Reasoning?
Case Generation with Granules
Fuzzy Granulation
Mapping Rough Set Dependency Rules to Cases
Case Retrieval
Experimental Results and Merits
Other Applications
Rough Self-Organizing Map
Rough Cases with EM and MST for Multi-Spectral
Image Segmentation
Conclusions

Figure: Machine Intelligence
Hybrid Systems: neuro-fuzzy, genetic-neural, rough-fuzzy, fuzzy-neuro-genetic
Knowledge-based Systems: probabilistic reasoning, approximate reasoning, case-based reasoning
Data-Driven Systems: neural network systems, evolutionary computing, fuzzy logic, rough sets
Non-linear Dynamics: chaos theory, rescaled range analysis (wavelet), fractal analysis
Pattern recognition and learning

Machine Intelligence: a core concept for grouping various advanced technologies with Pattern Recognition and Learning
Pattern Recognition System (PRS)

Measurement Space → Feature Space → Decision Space
Uncertainties arise from deficiencies of the information available from a situation.
Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague or contradictory information at various stages of a PRS.

Example: measurement set M = {height, weight, complexion, diet}; D: classifier design
Clustering example (figure): Mother, Father, Daughter, Son
Tasks and Challenges

Classification: Sampled data are given about the pattern space, and the challenge is to estimate the unknown regions of the pattern space based on the sampled data (incomplete information).
Clustering: The entire data set is given, and the challenge is to partition it into meaningful regions. The number of regions may be known or unknown.
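As a minimal sketch of the clustering task when the number of regions is known, the following Python code implements Lloyd's k-means, a standard algorithm chosen here only for illustration (the slide does not name a method). The function names, the deterministic maximin seeding and the toy data are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(members):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic maximin seeding; returns (centroids, labels)."""
    # Maximin seeding: start from the first point, then repeatedly add the
    # point that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Here k is supplied by the caller; when the number of regions is unknown, k must be chosen by other means (e.g., a cluster validity index).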

Clustering: Some Points

More than 50% of the literature in PR research is related to clustering
Still unsolved, with open problems (e.g., cluster validity, indices)
Acts as a basic module in decision-making and machine learning problems, particularly for mining large data sets in unsupervised mode (e.g., prototype selection, feature selection, data condensation/compression)
Significance in bioinformatics and Web data

Relevance of Fuzzy Sets in PR

Representing linguistically phrased input features for processing

Representing multi-class membership of ambiguous patterns

Generating rules and inferences in linguistic form

Extracting ill-defined image regions, primitives,
properties and describing relations among them as
fuzzy subsets

ANNs Provide Natural Classifiers Having

Resistance to noise
Tolerance to distorted patterns/images (ability to generalize)
Superior ability to recognize overlapping pattern classes, classes with highly nonlinear boundaries, or partially occluded or degraded images

Potential for parallel processing
Non-parametric nature

Why GAs in PR?

Methods developed for pattern recognition and image processing are usually problem dependent.
Many tasks involved in analyzing/identifying a pattern need appropriate parameter selection and efficient search in complex spaces to obtain optimal solutions.

This makes the processes
- computationally intensive
- liable to lose the exact solution

GAs are efficient, adaptive and robust search processes that produce near-optimal solutions and have a large amount of implicit parallelism.
GAs are an appropriate and natural choice for problems that need optimized computational requirements and robust, fast, close approximate solutions.
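To make the idea of a GA as an adaptive, robust search process concrete, here is a minimal generational GA in Python on bit-string chromosomes. It is a generic textbook GA (tournament selection, one-point crossover, bit-flip mutation, elitism), not the VGA or VGACD classifier discussed later; the OneMax fitness (number of 1-bits), the function name and all parameter values are hypothetical choices for the example.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=30, generations=100,
                      p_crossover=0.9, p_mutation=None, seed=0):
    """Simple generational GA on bit strings with elitism; returns (best, history)."""
    rng = random.Random(seed)
    if p_mutation is None:
        p_mutation = 1.0 / n_bits  # common default: about one flipped bit per child
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament(k=2):
        # Pick k individuals at random and keep the fittest (selection pressure).
        return max(rng.sample(pop, k), key=fitness)

    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                a = a[:cut] + b[cut:]
            # Bit-flip mutation, applied gene by gene (builds a new list).
            children.append([1 - g if rng.random() < p_mutation else g for g in a])
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


# OneMax: fitness is simply the number of 1-bits in the chromosome.
best, history = genetic_algorithm(sum, n_bits=20)
print(history[0], history[-1])
```

Because of elitism, the best fitness recorded in `history` never decreases from one generation to the next.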

Role of GAs

Robust, parallel, adaptive search methods suitable when the search space is large.
Used more in Prediction (P) than Description (D):
D: finding human-interpretable patterns describing the data
P: using some variables or attributes in the database to predict unknown/future values of other variables of interest

Example of GA-based Classification

Automatic selection of the number of hyperplanes for approximating class boundaries with minimum misclassification (VGA classifier)
Chromosome (sexual) discrimination to reduce computation time (VGACD classifier)
Robust searching ability (suitable when the search space is large)

SPOT image of Calcutta in the near infrared band (spatial resolution 20 m x 20 m; wavelength 0.79-0.89 μm), labelled: Garden Reach Lake
IEEE Trans. Geosci. Remote Sensing, 39(2), 303-308, 2001
Intl. J. Remote Sensing, 22(13), 2545-2569, 2001
Scatter plot of the training set of SPOT image of
Calcutta, containing seven classes.
Classified SPOT image of Calcutta (zooming the race course R only) using (a) VGACD classifier, H_max = 15, final value of H = 13; (b) VGA classifier, H_max = 15, final value of H = 10; (c) Bayes maximum likelihood classifier; (d) k-NN rule, k = 1; (e) k-NN rule, k = 3; (f) k-NN rule, k = √n.
IEEE Trans. Geosci. Remote Sensing, 39(2), 303-308, 2001
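The k-NN rule used above as a baseline can be sketched in a few lines of Python: a test pattern takes the majority label among its k nearest training samples. Euclidean distance, the helper name `knn_predict` and the two toy classes are assumptions for illustration; the k = √n setting is rounded to the nearest integer here.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Classify pattern x by majority vote among its k nearest training samples.

    train is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train = [((0.0, 0.0), "water"), ((0.2, 0.1), "water"), ((0.1, 0.3), "water"),
         ((5.0, 5.0), "urban"), ((5.2, 4.9), "urban"), ((4.8, 5.1), "urban")]
k_sqrt = max(1, round(math.sqrt(len(train))))  # the k = sqrt(n) variant
for k in (1, 3, k_sqrt):
    print(k, knn_predict(train, (0.3, 0.2), k))
```

Larger k smooths the decision boundary at the cost of blurring small classes; k = √n ties the neighbourhood size to the training set size.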
IEEE Trans. Geosci. Remote Sensing, 39(2), 303-308, 2001
Variation of the number of points misclassified by the best chromosome over generations for the VGACD and VGA classifiers
Fuzzy logic, neuro-computing and genetic algorithms are the major components of what is called Soft Computing, where these tools work synergistically.
Role of Major Soft Computing Components
FL: algorithms for dealing with imprecision and uncertainty
NC: machinery for learning and curve fitting
GA: algorithms for search and optimization
RS: handling uncertainty arising from the granularity in the domain of discourse
Soft computing exploits the tolerance for
imprecision,
uncertainty,
approximate reasoning and
partial truth
to achieve tractability, robustness, low-cost solutions and close resemblance to human-like decision making.
It provides flexible information-processing capability for the representation and evaluation of real-life ambiguous/uncertain situations.

It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.

Relevance of FL, ANN, GAs Individually
to PR Problems is Established

Integration of Soft Computing Tools
In the late eighties, scientists asked: why NOT integrations?
Fuzzy Logic + ANN; ANN + GA; Fuzzy Logic + ANN + GA; Fuzzy Logic + ANN + GA + Rough Set
Neuro-fuzzy hybridization is the most visible integration realized so far.
Why Fusion
Fuzzy set theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW).
Neural network models attempt to emulate the architecture and information representation scheme of the human brain (HW).
→ NEURO-FUZZY computing (for more intelligent systems)
Fuzzy system + ANN used for learning and adaptation → NFS (neuro-fuzzy system)
ANN + fuzzy sets used to augment its application domain → FNN (fuzzy neural network)
Merits and Challenges

GENERIC
APPLICATION SPECIFIC

Rough-Fuzzy Hybridization

Fuzzy set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept.
The focus of rough set theory is on the ambiguity caused by limited discernibility of objects (lower and upper approximations of a concept).

Rough sets and fuzzy sets can be integrated to develop a model of uncertainty stronger than either.
Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999
Neuro-Rough Hybridization

Rough set models are used to generate network parameters (weights).
Roughness is incorporated in the inputs and outputs of networks for uncertainty handling, performance enhancement and an extended domain of application.
Networks consisting of rough neurons are used.

Neurocomputing, Spl. Issue on Rough-Neuro Computing, S. K. Pal, W. Pedrycz, A. Skowron and R. Swiniarsky (eds), vol. 36 (1-4), 2001.

Challenges (e.g., RN and RF)

Improve performance
Reduce network learning time
Reduce network size (compact network)
Preserve the identity of clusters irrespective of their sizes
Provide a stronger model of uncertainty handling
Reduce computation time

Example of Compact Network
Connectivity of the network obtained for
six-class vowel recognition using Modular Rough
Fuzzy MLP
Rough-Neural Computing Techniques for Computing with Words, S. K. Pal, L. Polkowski and A. Skowron (eds.), Springer, Heidelberg, 2003.

Neuro-Rough-Fuzzy-Genetic Hybridization
Rough sets are used to extract domain knowledge in the form of linguistic rules, which generate fuzzy knowledge-based networks that are evolved using genetic algorithms.
Integration offers several advantages, such as fast training, a compact network and performance enhancement.

IEEE TNN, 9, 1203-1216, 1998
Incorporate Domain Knowledge using Rough Sets
Data Mining
Today PR activity remains incomplete without mention of its significance to DM.
DM from Pattern Recognition and Machine Learning perspectives (as distinct from DBMS and statistical perspectives)

One of the applications of information technology that has drawn the attention of researchers is DATA MINING, to which pattern recognition, image processing and machine intelligence are directly related.
Why Data Mining?
IEEE Trans. Neural Networks, 13(1), 3-14, 2002

The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.

The data are heterogeneous (a mixture of text, symbolic, numeric, texture and image data), huge (both in dimension and size) and scattered.
The volume of such stored data is growing at a phenomenal rate.

As a result, traditional ad hoc mixtures of statistical techniques and data management tools are no longer adequate for analyzing this vast collection of data.

Pattern recognition and machine learning principles applied to a very large (both in size and dimension) heterogeneous database → Data Mining
Data Mining + Knowledge Interpretation → Knowledge Discovery
Knowledge Discovery: the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

Pattern Recognition, World Scientific, 2001
Figure: Knowledge Discovery in Database (KDD)
Huge raw data → data cleaning, data condensation, dimensionality reduction, data wrapping/description → preprocessed data → Data Mining (DM): machine learning (classification, clustering, rule generation) → mathematical model of data (patterns) → knowledge interpretation (knowledge extraction, knowledge evaluation) → useful knowledge
Data Mining Algorithm Components

Model: function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
Preference criterion: basis for preference of one model or set of parameters over another.
Search algorithm: specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, family of models, and preference criterion.

Why Growth of Interest?

Falling cost of large storage devices and increasing ease of collecting data over networks.
Availability of robust/efficient machine learning algorithms to process data.
Falling cost of computational power → enabling use of computationally intensive methods for data analysis.

Applications

Financial investment: dynamic, huge data of stock indices and prices, interest rates, credit card information, fraud detection
Health care: diverse diagnostic information stored by hospital management systems
WWW: vast collection of uncontrolled, diverse, dynamic documents
Bioinformatics: heterogeneous database of gene sequences, protein structures, microarrays and gene expressions with imprecise/partial information
Data are heterogeneous (a mixture of text, symbolic, numeric, texture and image data) and huge (both in dimension and size)

Example: Medical Data

Numeric and textual information may be
interspersed
Different symbols can be used with same meaning
Redundancy often exists
Erroneous/misspelled medical terms are common
Data is often sparsely distributed

Example: Web Mining
Discovery/analysis of useful information from the WWW
Characteristics of web data:

Unlabelled
Distributed
Heterogeneous (mixed media)
Semi-structured
Time varying
High dimensional
Web mining deals with large hyperlinked information having these characteristics, through an interactive medium (human interface).

Issues Arising out of Human Interface

Need for handling context-sensitive and imprecise queries
Need for summarization and deduction
Need for personalization and learning
Web mining, though considered an application of DM, warrants a separate field of research because of these characteristics and human-related issues

Example: Human Genome Data

Laboratory operations on DNA inherently involve errors
Heterogeneous database of gene sequences, protein structures, microarrays and gene expressions
Partial/incomplete information

A robust preprocessing system is required to extract any kind of knowledge.
The data must not only be cleaned of errors and redundancy, but also organized in a fashion that makes sense for the problem.

So, we NEED efficient, robust and flexible machine learning algorithms
→ NEED for the Soft Computing Paradigm

Role of Fuzzy Sets

Modeling of imprecise/qualitative knowledge
Transmission and handling of uncertainties at various stages
Supporting, to an extent, human-type reasoning in natural form

Classification/ Clustering
Discovering association rules (describing
interesting association relationship among
different attributes)
Inferencing
Data summarization/condensation (abstracting the
essence from a large amount of information).

Role of ANN

Adaptivity, robustness, parallelism, optimality
Machinery for learning and curve fitting (learns from examples)
Initially thought to be unsuitable because of their black-box nature: no information is available in symbolic form (suitable for human interpretation)
Recently, embedded knowledge has been extracted in the form of symbolic rules, making ANNs suitable for rule generation.

IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Example: Modular Rough-Fuzzy Evolutionary MLP

Enhances
Classification Performance
Training time
Network compactness
Generates Rules of
Higher accuracy
Smaller size
Less confusion

Knowledge Flow in Modular Rough Fuzzy MLP
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Figure: rough set rules in feature space (F1, F2): class C1 described by rule R1, class C2 by rules R2 and R3 → network mapping: R1 → Subnet 1, R2 → Subnet 2, R3 → Subnet 3 → partial training with an ordinary GA → partially refined subnetworks SN1, SN2, SN3
Figure: concatenation of subnetworks (with high and low mutation probabilities) → evolution of the population of concatenated networks with a GA having a variable mutation operator → final solution network separating C1 and C2 in feature space (F1, F2)
Vowel data (figure)
Speech data: 3 features, 6 classes
Figures: classification accuracy; training time (hrs) on a DEC Alpha workstation @ 400 MHz; network size (no. of links)
Methods: 1. MLP, 2. Fuzzy MLP, 3. Modular Fuzzy MLP, 4. Rough Fuzzy MLP, 5. Modular Rough Fuzzy MLP
Results for speech data
Connectivity of the network obtained using Modular Rough Fuzzy MLP
Without soft computing, machine intelligence and data mining research remains incomplete.
Rough Sets and Granular Computing

Rough Sets
Offer mathematical tools to discover hidden patterns in data
Offer learning systems to discover redundancies and dependencies between the given features of data
Approximate a given concept both from below and from above, using lower and upper approximations
Offer learning algorithms to obtain rules in IF-THEN form from a decision table w.r.t. objects and attributes
Extract knowledge from a database (decision table → remove undesirable attributes → analyze data dependency → minimum subset of attributes (reducts))

Z. Pawlak 1982, Int. J. Comp. Inf. Sci.
Rough Sets (figure): set X, its lower approximation $\underline{B}X$, its upper approximation $\overline{B}X$, and the granules $[x]_B$
$[x]_B$: the set of all points belonging to the same granule as the point x in feature space $\Omega_B$, i.e., the set of all points that are indiscernible from x in terms of feature subset B.
Approximations of the set X w.r.t. feature subset B
B-lower: $\underline{B}X = \{x : [x]_B \subseteq X\}$, the granules definitely belonging to X
B-upper: $\overline{B}X = \{x : [x]_B \cap X \neq \emptyset\}$, the granules definitely and possibly belonging to X
If $\underline{B}X = \overline{B}X$, X is B-exact or B-definable; otherwise it is roughly definable.
Rough sets are crisp sets, but with rough description.
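These definitions translate directly into code. The following Python sketch partitions objects into B-indiscernibility classes (granules) and collects the lower and upper approximations of a concept X; the toy decision table, its attribute names F1/F2 and the helper names are hypothetical.

```python
from collections import defaultdict


def granules(table, B):
    """Partition objects into B-indiscernibility classes [x]_B (information granules)."""
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical values on every attribute in B are indiscernible.
        classes[tuple(row[a] for a in B)].add(obj)
    return list(classes.values())


def approximations(table, B, X):
    """Return (B-lower, B-upper) approximations of the object set X."""
    lower, upper = set(), set()
    for g in granules(table, B):
        if g <= X:
            lower |= g  # granule lies wholly inside X: definitely in X
        if g & X:
            upper |= g  # granule overlaps X: possibly in X
    return lower, upper


table = {
    1: {"F1": "low", "F2": "high"},
    2: {"F1": "low", "F2": "high"},
    3: {"F1": "medium", "F2": "medium"},
    4: {"F1": "medium", "F2": "medium"},
    5: {"F1": "high", "F2": "low"},
    6: {"F1": "high", "F2": "low"},
}
X = {1, 2, 3}
lower, upper = approximations(table, ["F1", "F2"], X)
print(sorted(lower), sorted(upper), "B-exact" if lower == upper else "roughly definable")
# prints [1, 2] [1, 2, 3, 4] roughly definable
```

Objects 3 and 4 share a granule but only 3 is in X, so that granule enters the upper approximation and not the lower one, which is exactly what makes X roughly definable here.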
Rough Sets
Uncertainty handling (using lower and upper approximations)
Granular computing (using information granules)
Information Granules: a group of similar objects clubbed together by an indiscernibility relation
Granular Computing: computation is performed using information granules and not the data points (objects)
Information compression → computational gain
Information Granules and Rough Rules
Figure: granules in the F1-F2 feature plane, each feature partitioned into low, medium and high, with a rule covering a granule

A rule provides a crude description of the class using a granule.

Note
For non-convex clusters, there would be more than one granule or rough rule to represent them crudely.
Unsupervised: the number of granules is determined automatically.
Granules/rules may be viewed as cases.
All features may not occur in a rule, so cases may be represented by different, reduced numbers of features.

Case Selection: cases belong to the set of examples encountered.
Case Generation: constructed cases need not be any of the examples.

Granular Computing and Case Generation

Cases: informative patterns (prototypes) characterizing the problems.
In a rough set theoretic framework, cases correspond to information granules.
In a rough-fuzzy framework, cases correspond to fuzzy information granules.

Case Generation: Characteristics and Merits

Cases are cluster granules, not sample points
Involves only a reduced number of relevant features, with variable size
Lower storage requirements
Fast retrieval
Suitable for mining data with large dimension and size

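Fuzzy (F)-Granulation
Figure: π-function membership functions $\mu_{low}$, $\mu_{medium}$, $\mu_{high}$ along feature j, with centres $c_L$, $c_M$, $c_H$ and radii $\lambda_L$, $\lambda_M$ (membership value from 0 to 1; crossover at 0.5)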
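A minimal Python sketch of π-function granulation follows. It assumes the standard π-function with centre c and radius λ (value 1 at c, 0.5 at distance λ/2, 0 at distance λ and beyond), which matches the 0.5 crossover level in the figure; the centres and radii chosen for low, medium and high are hypothetical.

```python
def pi_membership(r, c, lam):
    """Standard pi-function with centre c and radius lam (assumed form).

    pi(r; c, lam) = 1 - 2(|r - c| / lam)^2   for 0 <= |r - c| <= lam / 2
                  = 2(1 - |r - c| / lam)^2   for lam / 2 < |r - c| <= lam
                  = 0                        otherwise
    """
    d = abs(r - c)
    if d <= lam / 2:
        return 1.0 - 2.0 * (d / lam) ** 2
    if d <= lam:
        return 2.0 * (1.0 - d / lam) ** 2
    return 0.0


def granulate(value, params):
    """Map one feature value to its memberships in each linguistic set."""
    return {name: pi_membership(value, c, lam) for name, (c, lam) in params.items()}


# Hypothetical (centre, radius) pairs for feature j.
params = {"low": (2.0, 3.0), "medium": (5.0, 3.0), "high": (8.0, 3.0)}
print(granulate(3.5, params))
# prints {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

A value midway between two centres sits at the crossover of both neighbouring sets, which is how the three overlapping linguistic sets tile each feature axis.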
Example (IEEE Trans. Knowledge Data Engg., 16(3), 292, 2004)
Figure: two cases, CASE 1 and CASE 2, in the F1-F2 plane, with the parameters of the fuzzy linguistic sets low, medium and high marked along each feature
Note: all features may not occur in a rule
Case Retrieval

The similarity sim(x, c) between a pattern x and a case c is defined in terms of n, the number of features present in case c.
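The following Python sketch assumes one plausible definition of sim(x, c), a root-mean-square of memberships, sim(x, c) = sqrt((1/n) Σ_j μ_j(x_j)²), where μ_j is the (assumed) π-membership of x_j in the fuzzy linguistic set that case c uses for feature j, and the sum runs over the n features present in c. Treat this formula, the case layout and the helper names as assumptions; retrieval simply returns the most similar case.

```python
import math


def pi_membership(r, c, lam):
    """Standard pi-function (assumed): 1 at centre c, 0.5 at lam / 2, 0 beyond lam."""
    d = abs(r - c)
    if d <= lam / 2:
        return 1.0 - 2.0 * (d / lam) ** 2
    if d <= lam:
        return 2.0 * (1.0 - d / lam) ** 2
    return 0.0


def similarity(x, case):
    """Assumed RMS similarity over the n features present in the case."""
    feats = case["features"]  # {feature index: (centre, radius)} of the case's fuzzy sets
    n = len(feats)
    total = sum(pi_membership(x[j], c, lam) ** 2 for j, (c, lam) in feats.items())
    return math.sqrt(total / n)


def retrieve(x, cases):
    """Return the stored case most similar to pattern x."""
    return max(cases, key=lambda case: similarity(x, case))


# Hypothetical cases: case 1 uses features 0 and 1, case 2 only feature 0.
cases = [
    {"label": "class 1", "features": {0: (2.0, 3.0), 1: (8.0, 3.0)}},
    {"label": "class 2", "features": {0: (8.0, 3.0)}},
]
print(retrieve((2.0, 8.0), cases)["label"])  # prints class 1
print(retrieve((8.0, 0.0), cases)["label"])  # prints class 2
```

Because each case stores only its own reduced feature set, retrieval touches just n features per case, which is the source of the storage and speed advantages listed earlier.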

Iris Flowers: 4 features, 3 classes, 150 samples
Number of cases: 3 (for all methods)
Forest Cover Types: 10 features, 7 classes, 5,86,012 samples
Number of cases: 545 (for all methods); GIS (cartographic and RS measurements)
Handwritten Numerals: 649 features, 10 classes, 2000 samples
Number of cases: 50 (for all methods); collection of Dutch utility maps
Applications of Rough Granules

Case-based reasoning (where evidence is sparse)
Prototype generation and class representation
Clustering and image segmentation (k selected automatically)
Case representation and indexing
Knowledge encoding
Dimensionality reduction
Data compression and storage
Granular information retrieval

Certain Issues

Selection of granules and sizes
Fuzzy granules
Granular fuzzy computing
Fuzzy granular computing

Rough Set Knowledge Encoding, EM and MST for Multi-spectral Image Segmentation
EM Algorithm

Handles uncertainty arising from overlapping classes
Number of clusters (k) needs to be known
Solution depends strongly on initial conditions
Models only convex clusters
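The properties listed for EM (k fixed in advance, dependence on the initial parameters) are visible in a minimal Python sketch of EM for a one-dimensional Gaussian mixture. This is a generic textbook EM, not the multi-spectral implementation of the slides; the toy data, the initial parameters and the small variance floor are assumptions.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iters=50):
    """EM for a 1-D Gaussian mixture; k = len(means) is fixed by the caller."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iters):
        # E-step: responsibility of component j for each point x_i.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against a collapsing component
    return means, variances, weights


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(data, means=[0.0, 6.0], variances=[1.0, 1.0],
                                      weights=[0.5, 0.5])
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The result depends on the initial means, variances and weights passed in; supplying good initial values is the role the slides assign to rough set theoretic knowledge encoding.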

Minimal Spanning Tree (MST) Clustering
Can model non-convex clusters, but is time consuming
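MST clustering can be sketched with Kruskal's algorithm: adding edges in order of increasing length and stopping when k components remain is equivalent (up to ties in edge length) to building the MST and deleting its k-1 longest edges, i.e., single-linkage clustering. The Python code below is such a sketch; the function name and the ring-plus-centre toy data are hypothetical, chosen to show a non-convex cluster (the ring) that this approach separates. Building all O(n²) pairwise edges also illustrates why the method is time consuming on large images.

```python
import math
from itertools import combinations


def mst_clusters(points, k):
    """Single-linkage clustering: Kruskal's MST without its k-1 longest edges."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    components = len(points)
    for i, j in edges:
        if components == k:
            break  # stopping here is the same as cutting the k-1 longest MST edges
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # this edge joins two trees, so it belongs to the MST
            components -= 1
    # Relabel component roots as 0..k-1 in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


centre = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
ring = [(3 * math.cos(t * math.pi / 4), 3 * math.sin(t * math.pi / 4)) for t in range(8)]
print(mst_clusters(centre + ring, k=2))  # prints [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

The ring's convex hull contains the centre blob, so a convex-cluster model such as a Gaussian mixture would struggle here, while the MST links neighbouring ring points into one cluster.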

Rough Set Theoretic Knowledge Encoding

Automatically determines the number of clusters k
Provides good initialization (avoidance of local minima, fast convergence)
Granular computing

RS knowledge encoding + EM + MST
Figure: block diagram of the segmentation method. Input multi-spectral image bands (Band 1, Band 2, Band 3, ..., Band n) → gray-level thresholding of individual bands → granulated n-dimensional image space → rule generation → mapping rules to distribution parameters (initial mixture model parameters) → EM (refined mixture model parameters) → MST → final clusters → segmented multi-spectral image
Multi-spectral IRS image of Calcutta (spatial resolution 36.25 m x 36.25 m; wavelengths 0.77-0.86 μm)
Panels: Band 1, Band 2, Band 3, Band 4
Quantitative index β measuring segmentation quality (IRS-1A image of Calcutta, no. of bands: 4)
Final no. of clusters (land cover types): 5
EM/KM: random initialization + EM/K-means; REM/RKM: rough set theoretic initialization + EM/K-means; KMEM: K-means initialization + EM; EMMST: random initialization + EM + MST; FKM: fuzzy K-means; REMMST: rough set initialization + EM + MST
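As a sketch, assuming the definition commonly used for this index in segmentation work: β is the ratio of the total variation of pixel values to the within-class variation, β = Σ_i Σ_j (x_ij - x̄)² / Σ_i Σ_j (x_ij - x̄_i)², where x_ij is the j-th pixel of class i, x̄_i is the class mean and x̄ the overall mean, so higher β means more homogeneous segments. The single-band toy pixels and labels are hypothetical; a multi-band version would sum the squared distances over bands.

```python
def beta_index(pixels, labels):
    """Assumed beta: total sum of squares / within-class sum of squares.

    Higher values mean more homogeneous regions; a zero-variance
    segmentation would make the denominator zero, so that case is rejected.
    """
    overall = sum(pixels) / len(pixels)
    total = sum((x - overall) ** 2 for x in pixels)
    within = 0.0
    for c in set(labels):
        members = [x for x, lab in zip(pixels, labels) if lab == c]
        mean_c = sum(members) / len(members)
        within += sum((x - mean_c) ** 2 for x in members)
    if within == 0:
        raise ValueError("within-class variation is zero; beta is undefined")
    return total / within


pixels = [10, 12, 11, 50, 52, 51]  # hypothetical gray levels
good = beta_index(pixels, [0, 0, 0, 1, 1, 1])
poor = beta_index(pixels, [0, 1, 0, 1, 0, 1])
print(good, poor)
```

Under this definition, the larger β reported for the rough-set-initialized method corresponds to tighter, more homogeneous land cover regions.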
Computation Time (seconds)
Segmented image of Calcutta using the EM algorithm with random initialization (EM): β = 5.91, no. of clusters: 5
Segmented image of Calcutta using the EM algorithm with rough set theoretic initialization and MST clustering (REMST): β = 7.37, no. of clusters: 5

Related Subsequent Work
Unsupervised case generation Rough-SOM
(Applied Intelligence, 21(3), 289-299,
2004)
Application to multi-spectral image segmentation
(IEEE Trans. Geoscience and Remote Sensing,
40(11), 2495-2501, 2002)
Rough case-based reasoner for text categorization
(Int. J. Approx. Reasoning, 41, 229-255,
2006)
Building CBR classifiers combining both feature
reduction and case selection
(IEEE Trans. Knowledge and Data Engg.,
18(3), 415-429, 2006)
Bioinformatics in Neurocomputing Framework
(IEE Proc. Circuits, Devices and Systems,
152 (5), 556-564, 2005)
Evolutionary computation in bioinformatics: a review
(IEEE Trans. Syst., Man and Cyberns. Part C,
36(5), 601-615, 2006)
Rough-fuzzy c-medoids algorithms and selection of
biobasis for amino acid sequence analysis
(IEEE Trans. Knowledge Data Engg., 19(6),
859-872, 2007)

Some Challenges of Data Mining

Multimedia mining and retrieval, which involves simultaneous manipulation of heterogeneous data like text, image, audio, video, etc.
Data stream mining, for handling a sequence of digitally encoded coherent signals that are in transmission. This has implications for Internet service providers.
Biological data mining, encompassing sequence, structure and high-dimensional data.
Scalability issues
Real-time processing of time-dependent data streams
Ensembling for distributed data mining (modular approach), including classification and clustering.
Quantitative indices
CTP: Computational Theory of Perception

S. K. Pal and S. C. K. Shiu, Foundations of Soft Case-Based Reasoning, Wiley, N.Y., 2004
S. K. Pal and P. Mitra, Pattern Recognition Algorithms for Data Mining, CRC/Chapman & Hall, Florida, 2004
S. Bandyopadhyay and S. K. Pal, Classification and Learning Using Genetic Algorithms: Applications in Bioinformatics and Web Intelligence, Springer, Heidelberg, 2007

About the Soft Computing Center at ISI
http://www.isical.ac.in/scc
Objectives

The center will focus mainly on basic research and, to some extent, on manpower development, keeping in mind that research excellence is the main objective. The activities of the center will include:
(a) conducting basic research in pattern recognition, image processing, computer vision, neural networks, genetic algorithms, wavelets, support vector machines, data mining, hybrid techniques, rough sets, video image processing, fractals, etc.,
(b) demonstrating applications to some focused areas like web mining (e.g., page ranking, personalization), bioinformatics (e.g., protein structure analysis), medical image (e.g., ultrasonographic and MRI) analysis, and VLSI layout design, to be decided from time to time,

(c) developing manpower by (i) imparting training to researchers/students from industry and academia, including R&D labs, (ii) disseminating teaching and training material for distance education using multimedia and video facilities, and (iii) offering regular short-term advanced courses on upcoming research areas,
(d) organizing seminars/workshops/schools by eminent faculty from abroad and India,
(e) providing a forum for exchanging ideas or establishing a linkage among scientists of leading institutions and industry working in similar areas by inviting interested faculty/research personnel,
(f) providing fellowships to help faculty and scholars from less endowed institutions, especially from neighboring regions.

Collaboration with CIMPA, France

International Center for Pure and Applied
Mathematics (CIMPA), France has recently
partially supported an International Workshop on
Soft Computing Approaches to Pattern Recognition
and Image Processing organized by the Machine
Intelligence Unit of ISI. They have promised to
support similar endeavors of the center in future
by providing travel support to foreign delegates.

Mechanism for Collaborative Projects

Since research excellence is the main objective of the center, the collaborative projects would focus mainly on research.
At least one investigator of the center or a faculty member of ISI deputed to the center would be involved in such a project.
The merits of project proposals, routed through the proper channel, will be evaluated.
Infrastructural facilities will be provided by the center.
Travel expenses and local hospitality (in the form of fellowship) of the visitors will be borne by the center.
Less endowed institutes will be given due preference.