Title: Classification Methods for Data Mining: Tasks, Issues and Challenges
1Classification Methods for Data Mining: Tasks, Issues and Challenges
- Sankar K. Pal
- Indian Statistical Institute
- Calcutta
- http://www.isical.ac.in/sankar
2- Contents
- Pattern Recognition and Machine Intelligence
- Relevance of FL, ANN and GAs
- Different Integrations of Soft Computing Tools
- Emergence of Data Mining
- Need
- KDD Process
- Relevance of Soft Computing Tools with Examples
- Rough Sets and Granular Computing
- Information granules
- Rough set rules
3- Rough-Fuzzy Case Generation
- What is Case Based Reasoning?
- Case Generation with Granules
- Fuzzy Granulation
- Mapping Rough Set Dependency Rules to Cases
- Case Retrieval
- Experimental Results and Merits
- Other Applications
- Rough Self-Organizing Map
- Rough Cases with EM and MST for Multi-Spectral Image Segmentation
- Conclusions
4Hybrid Systems
- Neuro-fuzzy
- Genetic neural
- Rough fuzzy
- Fuzzy neuro genetic
Knowledge-based Systems
- Probabilistic reasoning
- Approximate reasoning
- Case based reasoning
Data Driven Systems
- Neural network system
- Evolutionary computing
Non-linear Dynamics
- Chaos theory
- Rescaled range analysis (wavelet)
- Fractal analysis
Pattern recognition and learning
Machine Intelligence: A core concept for grouping various advanced technologies with Pattern Recognition and Learning
5Pattern Recognition System (PRS)
- Measurement Space → Feature Space → Decision Space
- Uncertainties arise from deficiencies of information available from a situation
- Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague, or contradictory information in various stages of a PRS
6M: Height, Weight, Complexion, Diet.
D → Classifier Design
8Clustering
Mother
Father
Daughter
Son
9Tasks and Challenges
- Classification: Sampled data are given about the pattern space, and the challenge is to estimate the unknown regions of the pattern space based on the sampled data (incomplete information)
- Clustering: The entire data set is given, and the challenge is to partition it into meaningful regions. The number of regions may be known or unknown
10Clustering: Some Points
- More than 50% of the literature in PR research is related to Clustering
- Still unsolved and provides open problems (e.g., cluster validity, indexes)
- Acts like a basic module in decision-making and machine learning problems, particularly for mining large data sets in unsupervised mode (e.g., prototype selection, feature selection, data condensation/compression)
- Significance in Bioinformatics and Web data
11Relevance of Fuzzy Sets in PR
- Representing linguistically phrased input features for processing
- Representing multi-class membership of ambiguous patterns
- Generating rules and inferences in linguistic form
- Extracting ill-defined image regions, primitives, properties and describing relations among them as fuzzy subsets
12ANNs provide Natural Classifiers having
- Resistance to Noise
- Tolerance to Distorted Patterns/Images (Ability to Generalize)
- Superior Ability to Recognize Overlapping Pattern Classes or Classes with Highly Nonlinear Boundaries or Partially Occluded or Degraded Images
- Potential for Parallel Processing
- Non parametric
13Why GAs in PR?
- Methods developed for Pattern Recognition and Image Processing are usually problem dependent.
- Many tasks involved in analyzing/identifying a pattern need Appropriate Parameter Selection and Efficient Search in complex spaces to obtain Optimal Solutions
- This makes the processes
- - Computationally Intensive
- - Subject to a Possibility of Losing the Exact Solution
14- GAs: Efficient, Adaptive and Robust Search Processes, producing near-optimal solutions and having a large amount of Implicit Parallelism
- GAs are an Appropriate and Natural Choice for problems which need Optimizing Computation Requirements, and Robust, Fast and Close Approximate Solutions
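A minimal sketch of the kind of search process described above, assuming a toy "count the ones" fitness function, tournament selection, one-point crossover and elitism. None of these choices, nor the parameter values, come from the slides; they are hypothetical and only illustrate how a GA keeps a population and moves it toward near-optimal solutions.

```python
import random

def fitness(bits):
    """Toy fitness (an assumption): number of 1-bits in the chromosome."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Evolve bit-string chromosomes; returns the best one and the best-fitness history."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        nxt = [pop[0][:]]  # elitism: the current best always survives
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(history[0], history[-1])
```

Because the best chromosome is copied unchanged into each new generation, the recorded best fitness can never decrease, which is the "robust" behaviour the slide refers to.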
15Role of GAs
- Robust, parallel, adaptive search methods suitable when the search space is large.
- Used more in Prediction (P) than Description (D)
- D: Finding human interpretable patterns describing the data
- P: Using some variables or attributes in the database to predict unknown/future values of other variables of interest.
16Example of GA based Classification
- Automatic selection of the number of hyperplanes for approximating class boundaries for minimum misclassification (VGA classifier)
- Chromosome (sexual) discrimination to reduce computation time (VGACD classifier)
- Robust Searching Ability (suitable when the search space is large)
17SPOT Image of Calcutta in the Near Infra Red Band (spatial resolution 20 m x 20 m, wavelength 0.79 µm-0.89 µm)
Garden Reach Lake
IEEE Trans. Geosci. Remote Sensing, 39(2), 303-308, 2001
Intl. J. Remote Sensing, 22(13), 2545-2569, 2001
18Scatter plot of the training set of SPOT image of
Calcutta, containing seven classes.
19Classified SPOT image of Calcutta (zooming the race course R only) using (a) VGACD-Classifier, Hmax = 15, final value of H = 13, (b) VGA classifier, Hmax = 15, final value of H = 10, (c) Bayes maximum likelihood Classifier, (d) k-NN rule, k = 1, (e) k-NN rule, k = 3, (f) k-NN rule, k = sqrt(n).
IEEE Trans. Geosci. Remote Sensing 39(2), 303-308, 2001
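For reference, the k-NN rule used as a baseline in panels (d) to (f) can be sketched as follows. The two-band pixel values and land-cover labels are made up for illustration and are not taken from the SPOT data; rounding sqrt(n) to the nearest integer is also an assumption.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-band pixel values with made-up class labels.
train_X = [(10, 12), (11, 14), (12, 11), (40, 42), (41, 39), (43, 44)]
train_y = ["water", "water", "water", "vegetation", "vegetation", "vegetation"]

n = len(train_X)
for k in (1, 3, round(math.sqrt(n))):  # the three settings of k named in the caption
    print(k, knn_predict(train_X, train_y, (13, 13), k))
```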
20IEEE Trans. Geosci. Remote Sensing 39(2),
303-308, 2001
Variation of the number of points misclassified
by the best Chromosome with generations for
VGACD classifier and VGA classifier
21Fuzzy Logic, Neuro-computing and Genetic Algorithms are the major components of what is called Soft Computing, where these tools work synergistically
22Role of Major Soft Computing Components
- FL: algorithms for dealing with imprecision and uncertainty
- NC: machinery for learning and curve fitting
- GA: algorithms for search and optimization
- Rough sets: handling uncertainty arising from the granularity in the domain of discourse
23- Exploit the tolerance for
- imprecision
- uncertainty
- approximate reasoning
- partial truth
- to achieve Tractability, Robustness, low cost solution and close resemblance with human like decision making
- Provides Flexible Information Processing Capability for representation and evaluation of real life ambiguous/uncertain situations.
24- It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.
25- Relevance of FL, ANN and GAs individually to PR problems is established
26Integration of Soft Computing Tools
27In the late eighties scientists thought: Why NOT Integrations?
- Fuzzy Logic + ANN
- ANN + GA
- Fuzzy Logic + ANN + GA
- Fuzzy Logic + ANN + GA + Rough Set
Neuro-fuzzy hybridization is the most visible integration realized so far.
28Why Fusion
- Fuzzy Set theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW)
- Neural Network models attempt to emulate architecture and information representation scheme of human brain (HW)
NEURO-FUZZY Computing (for More Intelligent System)
29- FUZZY SYSTEM: ANN used for learning and adaptation → NFS
- ANN: Fuzzy Sets used to augment its application domain → FNN
30Merits and Challenges
- GENERIC
- APPLICATION SPECIFIC
31Rough-Fuzzy Hybridization
- Fuzzy Set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept.
- The focus of rough set theory is on the ambiguity caused by limited discernibility of objects (lower and upper approximation of concept).
Rough sets and Fuzzy sets can be integrated to develop a model of uncertainty stronger than either.
32Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999
33Neuro-Rough Hybridization
- Rough set models are used to generate network parameters (weights).
- Roughness is incorporated in inputs and output of networks for uncertainty handling, performance enhancement and extended domain of application.
- Networks consisting of rough neurons are used.
- Neurocomputing, Spl. Issue on Rough-Neuro Computing, S. K. Pal, W. Pedrycz, A. Skowron and R. Swiniarski (eds), vol. 36 (1-4), 2001.
34Challenges (e.g., RN and RF)
- Improve performance
- Reduce network learning time
- Reduce network size (Compact Network)
- Preserving identity of clusters irrespective of their sizes
- Stronger model of uncertainty handling
- Reduce computation time
35Example of Compact Network
Connectivity of the network obtained for
six-class vowel recognition using Modular Rough
Fuzzy MLP
36- Rough-Neural Computing: Techniques for Computing with Words, S. K. Pal, L. Polkowski and A. Skowron (eds.), Springer, Heidelberg, 2003.
37- Neuro-Rough-Fuzzy-Genetic Hybridization
- Rough sets are used to extract domain knowledge in the form of linguistic rules → these generate fuzzy knowledge-based networks → the networks are evolved using genetic algorithms.
- Integration offers several advantages like fast training, compact network and performance enhancement.
38IEEE TNN, 9, 1203-1216, 1998
Incorporate Domain Knowledge using Rough Sets
39- Data Mining
- Today PR activity remains incomplete without the mention of its significance to DM
- DM from Pattern Recognition and Machine Learning Perspectives (DBMS, Statistical)
40One of the applications of Information Technology
that has drawn the attention of researchers is
DATA MINING where Pattern Recognition/Image
Processing/Machine Intelligence are directly
related.
41Why Data Mining?
IEEE Trans. Neural Networks, 13(1), 3-14, 2002
- The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
- With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
- Data is heterogeneous (mixture of text, symbolic, numeric, texture, image), huge (both in dimension and size) and scattered.
- The rate at which such data is stored is growing phenomenally.
42- As a result, traditional ad hoc mixtures of
statistical techniques and data management tools
are no longer adequate for analyzing this vast
collection of data.
43- Pattern Recognition and Machine Learning principles applied to a very large (both in size and dimension) heterogeneous database → Data Mining
- Data Mining + Knowledge Interpretation → Knowledge Discovery
- Knowledge Discovery: Process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
44Knowledge Discovery in Database (KDD)
Pattern Recognition, World Scientific, 2001
[Figure: KDD block diagram. Huge Raw Data → Data Wrapping/Description → Preprocessed Data → Machine Learning (Classification, Clustering, Rule Generation) → Mathematical Model of Data (Patterns) → Knowledge Interpretation (Knowledge Extraction, Knowledge Evaluation) → Useful Knowledge. Stage labels: Data Mining (DM), Knowledge Discovery in Database (KDD).]
45Data Mining Algorithm Components
- Model: Function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
- Preference criterion: Basis for preference of one model or set of parameters over another.
- Search algorithm: Specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, family of models, and preference criterion.
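The three components can be made concrete with a deliberately tiny sketch. Everything here is an illustrative assumption rather than something taken from the slides: the model is a one-feature linear discriminant, the preference criterion is the misclassification count, and the search algorithm is an exhaustive scan over candidate thresholds on toy data.

```python
def model(threshold, x):
    """Model: a one-feature linear discriminant (class 1 if x > threshold)."""
    return 1 if x > threshold else 0

def preference(threshold, data):
    """Preference criterion: number of misclassified samples (lower is better)."""
    return sum(model(threshold, x) != y for x, y in data)

def search(data):
    """Search algorithm: exhaustive scan over midpoints between sorted samples."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: preference(t, data))

# Toy labelled data: (feature value, class label).
data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
best = search(data)
print(best, preference(best, data))
```

Swapping any one component (a neural network as the model, a GA as the search algorithm) leaves the other two roles unchanged, which is the point of the decomposition.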
46Why Growth of Interest?
- Falling cost of large storage devices and increasing ease of collecting data over networks.
- Availability of Robust/Efficient machine learning algorithms to process data.
- Falling cost of computational power → enabling use of computationally intensive methods for data analysis.
47Applications
- Financial Investment: Dynamic huge data of stock indices and prices, interest rates, credit card information, fraud detection
- Health Care: Diverse diagnostic information stored by hospital management systems
- WWW: Vast collection of uncontrolled, diverse, dynamic documents
- Bio-informatics: Heterogeneous database of gene sequences, protein structures, microarrays, gene expressions with imprecise/partial information
- Data is heterogeneous (mixture of text, symbolic, numeric, texture, image) and huge (both in dimension and size)
48Example: Medical Data
- Numeric and textual information may be interspersed
- Different symbols can be used with the same meaning
- Redundancy often exists
- Erroneous/misspelled medical terms are common
- Data is often sparsely distributed
49Example: Web Mining
Discovery/analysis of useful information from WWW
Characteristics of web data
- Unlabelled
- Distributed
- Heterogeneous (mixed media)
- Semi-structured
- Time varying
- High dimensional
- Web mining deals with large hyper-linked
information having these characteristics with
Interactive Medium (Human Interface)
50Issues arising out of Human Interface
- Need for handling context sensitive and imprecise queries
- Need for summarization and deduction
- Need for personalization and learning
- Web mining, though considered an application of
DM, warrants a separate field of research because
of these characteristics and human related issues
51Example: Human Genome Data
- Laboratory operations on DNA inherently involve errors
- Heterogeneous database of gene sequences, protein structures, microarrays, gene expressions
- Partial/incomplete information
52- Robust preprocessing system is required to extract any kind of knowledge
- The data must not only be cleaned of errors and redundancy, but organized in a fashion that makes sense for the problem
53- So, We NEED
- Efficient
- Robust
- Flexible
- Machine Learning Algorithms
- → NEED for Soft Computing Paradigm
54Role of Fuzzy Sets
- Modeling of imprecise/qualitative knowledge
- Transmission and handling uncertainties at various stages
- Supporting, to an extent, human type reasoning in natural form
55- Classification/Clustering
- Discovering association rules (describing interesting association relationship among different attributes)
- Inferencing
- Data summarization/condensation (abstracting the essence from a large amount of information).
56Role of ANN
- Adaptivity, robustness, parallelism, optimality
- Machinery for learning and curve fitting (Learns from examples)
- Initially, thought to be unsuitable because of its black box nature: no information available in symbolic form (suitable for human interpretation)
- Recently, embedded knowledge is extracted in the form of symbolic rules, making it suitable for Rule generation.
57IEEE Trans. Knowledge Data Engg., 15(1), 14-25,
2003
Example: Modular Rough-Fuzzy Evolutionary MLP
- Enhances
- Classification Performance
- Training time
- Network compactness
- Generates Rules of
- Higher accuracy
- Smaller size
- Less confusion
58Knowledge Flow in Modular Rough Fuzzy MLP
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
[Figure: knowledge flow. Rough set rules R1 (class C1), R2 and R3 (class C2) in the feature space (F1, F2) → network mapping to R1 (Subnet 1), R2 (Subnet 2), R3 (Subnet 3) → partial training with ordinary GA → partially refined subnetworks SN1, SN2, SN3.]
59[Figure: Concatenation of Subnetworks (link labels: high mutation prob., low mutation prob.) → Evolution of the Population of Concatenated networks with GA having variable mutation operator → Final Solution Network, shown with classes C1 and C2 in the feature space (F1, F2).]
60Vowel Data
61Speech Data: 3 Features, 6 Classes
Classification Accuracy
62Training Time (hrs), DEC Alpha Workstation @ 400 MHz
63Network Size (No. of Links)
64Results for Speech data
1. MLP
2. Fuzzy MLP
3. Modular Fuzzy MLP
4. Rough Fuzzy MLP
5. Modular Rough Fuzzy MLP
65Connectivity of the network obtained using
Modular Rough Fuzzy MLP
66Without Soft Computing, Machine Intelligence and Data Mining Research Remains Incomplete.
67Rough Sets and Granular Computing
68- Rough Sets
- Offer mathematical tools to discover hidden patterns in data
- Offer learning systems to discover redundancies and dependencies between the given features of data
- Approximate a given concept both from below and from above, using lower and upper approximations
- Offer learning algorithms to obtain rules in IF-THEN form from a decision table w.r.t. objects and attributes
- Extract knowledge from a database (decision table → remove undesirable attributes → analyze data dependency → minimum subset of attributes (reducts))
69Rough Sets
Z. Pawlak 1982, Int. J. Comp. Inf. Sci.
[Figure: a set X in the feature space $\Omega_B$, covered by granules $[x]_B$, with its lower approximation $\underline{B}X$ and upper approximation $\overline{B}X$.]
$[x]_B$: set of all points belonging to the same granule as the point x in feature space $\Omega_B$.
$[x]_B$ is the set of all points which are indiscernible from point x in terms of feature subset B.
70Approximations of the set X w.r.t. feature subset B
- B-lower: $\underline{B}X = \{x : [x]_B \subseteq X\}$ (granules definitely belonging to X)
- B-upper: $\overline{B}X = \{x : [x]_B \cap X \neq \emptyset\}$ (granules definitely and possibly belonging to X)
- If $\underline{B}X = \overline{B}X$, X is B-exact or B-definable; otherwise it is roughly definable
Rough Sets are Crisp Sets, but with rough description
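A minimal sketch of how the granules and the two approximations can be computed for a small decision table. The table, the object names o1 to o5 and the attribute values are hypothetical and serve only to illustrate the definitions.

```python
from collections import defaultdict

def granules(table, B):
    """Partition the objects into equivalence classes [x]_B of the indiscernibility relation."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in B)].add(obj)
    return list(classes.values())

def approximations(table, B, X):
    """Return the (B-lower, B-upper) approximations of the set X."""
    lower, upper = set(), set()
    for g in granules(table, B):
        if g <= X:   # granule lies entirely inside X
            lower |= g
        if g & X:    # granule overlaps X
            upper |= g
    return lower, upper

# Hypothetical table: objects described by two linguistic features.
table = {
    "o1": {"F1": "low", "F2": "high"},
    "o2": {"F1": "low", "F2": "high"},
    "o3": {"F1": "medium", "F2": "high"},
    "o4": {"F1": "medium", "F2": "high"},
    "o5": {"F1": "high", "F2": "low"},
}
X = {"o1", "o2", "o3"}
lower, upper = approximations(table, ["F1", "F2"], X)
print(sorted(lower), sorted(upper))
```

Here o3 and o4 are indiscernible but only o3 is in X, so the lower and upper approximations differ and X is roughly definable with respect to these features.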
71Rough Sets
- Uncertainty Handling (using lower and upper approximations)
- Granular Computing (using information granules)
72- Information Granules: A group of similar objects clubbed together by an indiscernibility relation
- Granular Computing: Computation is performed using information granules and not the data points (objects)
- Information compression, Computational gain
73Information Granules and Rough Rules
[Figure: feature space with axes F1 and F2, each divided into low, medium and high; a rule is shown as a granule.]
- Rule provides crude description of the class using granule
74- Note
- For non-convex clusters, there would be more than one granule or rough rule to represent it crudely
- Unsupervised: No. of granules is determined automatically
- Granules/rules may be viewed as Cases
- All features may not occur in a rule
- Cases may be represented by a different, reduced number of features.
75- Case Selection → Cases belong to the set of examples encountered.
- Case Generation → Constructed Cases need not be any of the examples.
76Granular Computing and Case Generation
- Cases: Informative patterns (prototypes) characterizing the problems.
- In rough set theoretic framework
- Cases ≡ Information Granules
- In rough-fuzzy framework
- Cases ≡ Fuzzy Information Granules
77Case Generation Characteristics and Merits
- Cases are cluster granules, not sample points
- Involves only reduced number of relevant features with variable size
- Less storage requirements
- Fast retrieval
- Suitable for mining data with large dimension
and size
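78Fuzzy (F)-Granulation
[Figure: π-function membership curves μ_low, μ_medium and μ_high plotted against Feature j (vertical axis: membership value, marked at 0.5 and 1), with centers c_L, c_M, c_H and radii λ_L, λ_M.]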
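A minimal sketch of fuzzy granulation with a π-function, assuming the commonly used piecewise-quadratic form with center c and radius λ (membership 1 at the center, 0.5 at distance λ/2, 0 beyond λ). The centers and radii below are hypothetical values, not parameters from the slide.

```python
def pi_membership(f, c, lam):
    """Pi-function: 1 at the center c, 0.5 at distance lam/2, 0 at distance lam or more."""
    d = abs(f - c) / lam
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (1.0 - d) ** 2
    return 0.0

def fuzzy_granulate(f, centers, radii):
    """Map one feature value to its memberships in the linguistic sets low, medium, high."""
    return {name: pi_membership(f, centers[name], radii[name]) for name in centers}

# Hypothetical centers (c_L, c_M, c_H) and radii for one feature.
centers = {"low": 2.0, "medium": 5.0, "high": 8.0}
radii = {"low": 3.0, "medium": 3.0, "high": 3.0}

print(fuzzy_granulate(5.0, centers, radii))
print(fuzzy_granulate(3.5, centers, radii))
```

A value halfway between two centers (3.5 here) belongs to both neighbouring sets with membership 0.5, which is how overlapping linguistic granules are obtained.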
79Example
IEEE Trans. Knowledge Data Engg., 16(3), 292, 2004
[Figure: CASE 1 and CASE 2 shown as regions in the F1-F2 feature space among the sample points (X); tick values 0.1, 0.2, 0.4, 0.5, 0.7, 0.9.]
Parameters of fuzzy linguistic sets: low, medium, high
Note: All features may not occur in a rule
80Case Retrieval
- Similarity sim(x, c) between a pattern x and a case c is defined as
- n: number of features present in case c
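The defining expression for sim(x, c) is not reproduced in the text above, so the sketch below assumes one plausible form: the root mean square of the fuzzy memberships of x over the n features present in the case. That formula, the π-function memberships, and the two cases are all assumptions made for illustration.

```python
import math

def pi_membership(f, c, lam):
    """Pi-function membership with center c and radius lam (assumed form)."""
    d = abs(f - c) / lam
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (1.0 - d) ** 2
    return 0.0

def similarity(x, case):
    """Assumed sim(x, c): RMS membership over the n features present in the case.

    case maps a feature index to the (center, radius) of its fuzzy set;
    features absent from the case are simply ignored.
    """
    n = len(case)
    total = sum(pi_membership(x[j], c, lam) ** 2 for j, (c, lam) in case.items())
    return math.sqrt(total / n)

def retrieve(x, cases):
    """Return the name of the case most similar to pattern x."""
    return max(cases, key=lambda name: similarity(x, cases[name]))

# Hypothetical cases: case2 uses only feature 0 (not every feature occurs in a rule).
cases = {
    "case1": {0: (2.0, 3.0), 1: (8.0, 3.0)},
    "case2": {0: (8.0, 3.0)},
}
print(retrieve((2.0, 8.0), cases), retrieve((8.0, 0.0), cases))
```

Dividing by n rather than by the total number of features keeps cases with different numbers of features comparable.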
81Iris Flowers: 4 features, 3 classes, 150 samples
Number of cases: 3 (for all methods)
82Forest Cover Types: 10 features, 7 classes, 586,012 samples
Number of cases: 545 (for all methods), GIS (cartographic RS measurements)
83Hand Written Numerals: 649 features, 10 classes, 2000 samples
Number of cases: 50 (for all methods), Collection of Dutch Utility Maps
84Applications of Rough Granules
- Case Based Reasoning (evidence is sparse)
- Prototype generation and class representation
- Clustering and Image segmentation (k selected automatically)
- Case representation and indexing
- Knowledge encoding
- Dimensionality reduction
- Data compression and storing
- Granular information retrieval
85Certain Issues
- Selection of granules and sizes
- Fuzzy granules
- Granular fuzzy computing
- Fuzzy granular computing
86Rough Set Knowledge Encoding, EM and MST for Multi-spectral Image Segmentation
87EM Algorithm
- Handles uncertainty out of overlapping classes
- Number of clusters (k) needs to be known
- Solution depends strongly on initial conditions
- Models only convex clusters
Minimal Spanning Tree (MST) Clustering
- Can model non-convex clusters, but is time consuming
Rough Set Theoretic Knowledge Encoding
- Automatically determines the number of clusters k
- Provides good initialization (avoidance of local minima, fast convergence)
- Granular computing
RS Knowledge Encoding + EM + MST
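To illustrate why MST clustering can follow non-convex (for example elongated) clusters, here is a minimal sketch that builds the tree with Prim's algorithm and cuts its k-1 longest edges. It operates directly on hypothetical 2-D points; it is not the combined rough set, EM and MST pipeline, and the point set is made up.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    # best[v] = (distance to the tree, nearest tree vertex) for vertices not yet in the tree
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])
        w, i = best.pop(j)
        edges.append((w, i, j))
        for v in best:
            d = math.dist(points[j], points[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges

def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and label the resulting connected components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

# Two elongated chains of points (hypothetical data).
points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 5), (1, 5), (2, 5), (3, 5)]
labels = mst_clusters(points, 2)
print(labels)
```

Computing all pairwise distances is what makes this approach time consuming on raw pixels; applying it to a small number of granules or mixture components instead reduces that cost.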
88[Figure: block diagram. Input Multi-spectral Image Bands (Band 1, Band 2, Band 3, ..., Band n) → Gray-level thresholding of individual bands → Granulated n-dimensional image space → Rule Generation → Mapping Rules to Distribution Parameters → Initial mixture model parameters → EM → Refined mixture model parameters → MST → Final Clusters → Segmented Multi-spectral Image.]
89Multi-Spectral IRS Image of Calcutta (spatial resolution 36.25 m x 36.25 m, wavelengths 0.77-0.86 µm)
Band 2
Band 1
Band 3
Band 4
90Quantitative Index β Measuring Segmentation Quality (IRS-1A image of Calcutta, No. of bands = 4)
Final no. of clusters (land cover types) = 5
- EM/KM: Random initialization + EM/K-means
- REM/RKM: Rough set theoretic initialization + EM/K-means
- KMEM: K-means initialization + EM
- EMMST: Random init. + EM + MST
- FKM: Fuzzy K-means
- REMMST: Rough set init. + EM + MST
91Computation Time (seconds)
92Segmented image of Calcutta using EM algorithm with random initialization (EM), β = 5.91, No. of Clusters = 5
93Segmented image of Calcutta using EM algorithm with Rough set theoretic initialization and MST clustering (REMMST), β = 7.37, No. of Clusters = 5
94- Related Subsequent Work
- Unsupervised case generation: Rough-SOM (Applied Intelligence, 21(3), 289-299, 2004)
- Application to multi-spectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002)
- Rough case-based reasoner for text categorization (Int. J. Approx. Reasoning, 41, 229-255, 2006)
- Building CBR classifiers combining both feature reduction and case selection (IEEE Trans. Knowledge and Data Engg., 18(3), 415-429, 2006)
- Bioinformatics in Neurocomputing Framework (IEE Proc. Circuits, Devices and Systems, 152(5), 556-564, 2005)
- Evolutionary computation in Bioinformatics: A Review (IEEE Trans. Syst., Man and Cybern. Part C, 36(5), 601-615, 2006)
- Rough-fuzzy c-medoids algorithms and selection of biobasis for amino acid sequence analysis (IEEE Trans. Knowledge Data Engg., 19(6), 859-872, 2007)
95Some Challenges of Data Mining
- Multimedia mining and retrieval, which involves simultaneous manipulation of heterogeneous data like text, image, audio, video, etc.
- Data stream mining, for handling a sequence of digitally encoded coherent signals that is in transmission. This has implications for Internet service providers.
- Biological data mining, encompassing sequence, structure and high-dimensional data.
- Scalability issues
- Real time processing of time dependent data streams
- Ensembling for distributed data mining (modular approach), including classification and clustering.
- Quantitative Indices
- CTP: Computational Theory of Perception
96S. K. Pal and S. C. K. Shiu, Foundations of Soft Case-Based Reasoning, Wiley, N.Y., 2004
97S. K. Pal and P. Mitra, Pattern Recognition Algorithms for Data Mining, CRC/Chapman & Hall, Florida, 2004
98S. Bandyopadhyay and S. K. Pal, Classification and Learning Using Genetic Algorithms: Applications in Bioinformatics and Web Intelligence, Springer, Heidelberg, 2007
99About the Soft Computing Center at ISI
http://www.isical.ac.in/scc
100Objectives
- The center will focus mainly on basic research and, to some extent, on manpower development, keeping in mind that research excellence is the main objective. The activities of the center will include
- (a) conducting basic research in pattern recognition, image processing, computer vision, neural networks, genetic algorithms, wavelets, support vector machines, data mining, hybrid techniques, rough sets, video image processing, fractals etc.
- (b) demonstrating applications to some focused areas like web mining (e.g., page ranking, personalization etc.), bioinformatics (e.g., protein structure analysis), medical image (e.g., ultrasonographic and MRI) analysis, and VLSI layout design, to be decided from time to time,
101- (c) developing manpower: (i) imparting training to researchers/students from industry and academia including R&D labs, (ii) disseminating teaching and training material for distance education using multimedia and video facilities, and (iii) offering regular short term advanced courses on upcoming research areas,
- (d) organizing seminars/workshops/schools by eminent faculty from abroad and India
- (e) providing a forum for exchanging ideas or establishing a linkage among scientists of leading institutions and industry working in similar areas by inviting interested faculty/research personnel,
- (f) providing fellowships for helping faculty and scholars from less endowed institutions, especially from neighboring regions.
102Collaboration with CIMPA, France
- International Center for Pure and Applied
Mathematics (CIMPA), France has recently
partially supported an International Workshop on
Soft Computing Approaches to Pattern Recognition
and Image Processing organized by the Machine
Intelligence Unit of ISI. They have promised to
support similar endeavors of the center in future
by providing travel support to foreign delegates.
103Mechanism for collaborative projects
- Since research excellence is the main objective of the center, the collaborative projects would focus mainly on research.
- At least one investigator of the center or a faculty member of ISI deputed to the center would be involved in such a project.
- Merits of the project proposals, routed through proper channel, will be evaluated.
- Infrastructural facilities will be provided by the center.
- Travel expense and local hospitality (in the form of fellowship) of the visitors will be borne by the center.
- Less endowed Institutes will be given due preference.
104Thank You!!