Title: Soft Computing, Machine Intelligence and Data Mining
1 Soft Computing, Machine Intelligence and Data Mining
- Sankar K. Pal
- Machine Intelligence Unit
- Indian Statistical Institute, Calcutta
- http://www.isical.ac.in/sankar
2 ISI, founded 1931, Mahalanobis
Divisions: CS, BS, MS, AS, SS, PES, SQC
Units: ACMU, ECSU, MIU, CVPRU
3 - Director
- Prof-in-charge
- Heads
- Distinguished Scientist
- Professor
- Associate Professor
- Lecturer
Faculty: 250
Courses: B. Stat, M. Stat, M. Tech (CS), M. Tech (SQC & OR), Ph.D.
Locations: Calcutta (HQ), Delhi, Bangalore, Hyderabad, Madras, Giridih, Bombay
4MIU Activities
(Formed in March 1993)
- Pattern Recognition and Image Processing
- Color Image Processing
- Data Mining
- Data Condensation, Feature Selection
- Support Vector Machine
- Case Generation
- Soft Computing
- Fuzzy Logic, Neural Networks, Genetic Algorithms, Rough Sets
- Hybridization
- Case Based Reasoning
- Fractals/Wavelets
- Image Compression
- Digital Watermarking
- Wavelet ANN
- Bioinformatics
5- Externally Funded Projects
- INTEL
- CSIR
- Silicogene
- Center for Excellence in Soft Computing Research
- Foreign Collaborations
- (Japan, France, Poland, Hong Kong, Australia)
- Editorial Activities
- Journals, Special Issues
- Books
- Achievements/Recognitions
Faculty: 10; Research Scholars/Associates: 8
6Contents
- What is Soft Computing?
- Computational Theory of Perception
- Pattern Recognition and Machine Intelligence
- Relevance of Soft Computing Tools
- Different Integrations
7- Emergence of Data Mining
- Need
- KDD Process
- Relevance of Soft Computing Tools
- Rule Generation/Evaluation
- Modular Evolutionary Rough Fuzzy MLP
- Modular Network
- Rough Sets, Granules, Rule Generation
- Variable Mutation Operations
- Knowledge Flow
- Example and Merits
8- Rough-fuzzy Case Generation
- Granular Computing
- Fuzzy Granulation
- Mapping Dependency Rules to Cases
- Case Retrieval
- Examples and Merits
- Conclusions
9 SOFT COMPUTING (L. A. Zadeh)
- Aim: To exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and close resemblance with human-like decision making.
- To find an approximate solution to an imprecisely/precisely formulated problem.
10 Parking a Car
- Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem.
- ⇒ High precision carries a high cost.
11 - ⇒ The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.
12 - Soft Computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: Exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve Tractability, Robustness, and close resemblance with human-like decision making.
- Foundation for the conception and design of high-MIQ (Machine IQ) systems.
13 - Provides Flexible Information Processing Capability for representation and evaluation of various real-life ambiguous and uncertain situations.
- ⇒ Real World Computing
- It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.
14 - At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS).
- Within Soft Computing, FL, NC, GA, RS are Complementary rather than Competitive
15 FL: the algorithms for dealing with imprecision and uncertainty
NC: the machinery for learning and curve fitting
GA: the algorithms for search and optimization
RS: handling uncertainty arising from the granularity in the domain of discourse
16 Referring back to the example: Parking a Car
- Do we use any measurement and computation while performing such tasks?
- We use the Computational Theory of Perceptions (CTP)
17Computational Theory of Perceptions (CTP)
AI Magazine, 22(1), 73-84, 2001
Provides capability to compute and reason with
perception based information
- Examples: car parking, driving in a city, cooking a meal, summarizing a story
- Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations
18 - They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects
- Reflecting the finite ability of the sensory organs (and, finally, the brain) to resolve details, perceptions are inherently imprecise
19 - Perceptions are fuzzy (F-) granular (both fuzzy and granular)
- Boundaries of perceived classes are unsharp
- Values of attributes are granulated (a clump of indistinguishable points/objects)
- Examples:
- Granules in age: very young, young, not so old, …
- Granules in direction: slightly left, sharp right, …
20 - F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis (based on predicate logic and probability theory)
- Main distinguishing feature: the assumption that perceptions are described by propositions drawn from a natural language.
21 Machine Intelligence: a core concept for grouping various advanced technologies with Pattern Recognition and Learning. Its constituents include:
- Hybrid Systems: neuro-fuzzy, genetic-neural, fuzzy-genetic, fuzzy-neuro-genetic
- Knowledge-based Systems: probabilistic reasoning, approximate reasoning, case-based reasoning
- Data Driven Systems: neural network systems, evolutionary computing
- Non-linear Dynamics: chaos theory, rescaled range analysis (wavelet), fractal analysis
- Pattern recognition and learning
22 Pattern Recognition System (PRS)
- Measurement Space → Feature Space → Decision Space
- Uncertainties arise from deficiencies of information available from a situation
- Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague, or contradictory information in various stages of a PRS
23 Relevance of Fuzzy Sets in PR
- Representing linguistically phrased input features for processing
- Representing multi-class membership of ambiguous patterns
- Generating rules and inferences in linguistic form
- Extracting ill-defined image regions, primitives, properties and describing relations among them as fuzzy subsets
24 ANNs provide Natural Classifiers having
- Resistance to Noise
- Tolerance to Distorted Patterns/Images (Ability to Generalize)
- Superior Ability to Recognize Overlapping Pattern Classes or Classes with Highly Nonlinear Boundaries, or Partially Occluded or Degraded Images
- Potential for Parallel Processing
- Non-parametric nature
25 Why GAs in PR?
- Methods developed for Pattern Recognition and Image Processing are usually problem dependent.
- Many tasks involved in analyzing/identifying a pattern need Appropriate Parameter Selection and Efficient Search in complex spaces to obtain Optimal Solutions
26 - This makes the processes
- Computationally Intensive, with the
- Possibility of Losing the Exact Solution
- GAs: Efficient, Adaptive and Robust Search Processes, producing near-optimal solutions, with a large amount of Implicit Parallelism
- GAs are an Appropriate and Natural Choice for problems which need Optimized Computation Requirements, and Robust, Fast and Close Approximate Solutions
27Relevance of FL, ANN, GAs Individually
to PR Problems is Established
28 In the late eighties scientists thought: Why NOT Integrations?
- Fuzzy Logic + ANN
- ANN + GA
- Fuzzy Logic + ANN + GA
- Fuzzy Logic + ANN + GA + Rough Set
Neuro-fuzzy hybridization is the most visible integration realized so far.
29 Why Fusion?
Fuzzy set theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW)
Neural network models attempt to emulate the architecture and information representation scheme of the human brain (HW)
⇒ NEURO-FUZZY Computing (for a More Intelligent System)
30 NFS (Neuro-Fuzzy System): a FUZZY SYSTEM in which an ANN is used for learning and adaptation
FNN (Fuzzy Neural Network): an ANN in which Fuzzy Sets are used to augment its application domain
31 MERITS
- GENERIC
- APPLICATION SPECIFIC
32 Rough-Fuzzy Hybridization
- Fuzzy set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept.
- The focus of rough set theory is on the ambiguity caused by limited discernibility of objects (lower and upper approximation of a concept).
- Rough sets and fuzzy sets can be integrated to develop a model of uncertainty stronger than either.
33Rough Fuzzy Hybridization A New Trend in
Decision Making, S. K. Pal and A. Skowron (eds),
Springer-Verlag, Singapore, 1999
34 Neuro-Rough Hybridization
- Rough set models are used to generate network parameters (weights).
- Roughness is incorporated in the inputs and outputs of networks for uncertainty handling, performance enhancement and an extended domain of application.
- Networks consisting of rough neurons are used.
Neurocomputing, Spl. Issue on Rough-Neuro Computing, S. K. Pal, W. Pedrycz, A. Skowron and R. Swiniarski (eds), vol. 36 (1-4), 2001.
35 Neuro-Rough-Fuzzy-Genetic Hybridization
- Rough sets are used to extract domain knowledge in the form of linguistic rules, which generate fuzzy knowledge-based networks that are then evolved using Genetic Algorithms.
- The integration offers several advantages like fast training, compact networks and performance enhancement.
36 IEEE TNN, 9, 1203-1216, 1998
Incorporate Domain Knowledge using Rough Sets
37 Before we describe
- Modular Evolutionary Rough-fuzzy MLP
- Rough-fuzzy Case Generation System
we explain Data Mining and the significance of Pattern Recognition, Image Processing and Machine Intelligence.
38 One of the applications of Information Technology that has drawn the attention of researchers is DATA MINING, where Pattern Recognition/Image Processing/Machine Intelligence are directly related.
39 Why Data Mining?
- The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
- With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
- Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image), huge (both in dimension and size) and scattered.
- Such data is being accumulated at a phenomenal rate.
40- As a result, traditional ad hoc mixtures of
statistical techniques and data management tools
are no longer adequate for analyzing this vast
collection of data.
41 - Pattern Recognition and Machine Learning principles applied to a very large (both in size and dimension) heterogeneous database
⇒ Data Mining
- Data Mining + Knowledge Interpretation ⇒ Knowledge Discovery
- Process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
42 Pattern Recognition, World Scientific, 2001
[Figure: Knowledge Discovery in Database (KDD) process — Huge Raw Data → (Data Wrapping/Description) → Preprocessed Data → Machine Learning (Classification, Clustering, Rule Generation) → Mathematical Model of Data (Patterns) → Knowledge Extraction/Interpretation → Useful Knowledge → Knowledge Evaluation. Data Mining (DM) spans the Machine Learning and model-building stages; KDD denotes the whole process.]
43 Data Mining Algorithm Components
- Model: Function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
- Preference criterion: Basis for preference of one model or set of parameters over another.
- Search algorithm: Specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, family of models, and preference criterion.
44 Why Growth of Interest?
- Falling cost of large storage devices and increasing ease of collecting data over networks.
- Availability of robust/efficient machine learning algorithms to process data.
- Falling cost of computational power ⇒ enabling use of computationally intensive methods for data analysis.
45 Examples
- Financial Investment: Stock indices and prices, interest rates, credit card data, fraud detection
- Health Care: Various diagnostic information stored by hospital management systems.
- Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image) and huge (both in dimension and size).
46 Role of Fuzzy Sets
- Modeling of imprecise/qualitative knowledge
- Transmission and handling of uncertainties at various stages
- Supporting, to an extent, human-type reasoning in natural form
47 - Classification/Clustering
- Discovering association rules (describing interesting association relationships among different attributes)
- Inferencing
- Data summarization/condensation (abstracting the essence from a large amount of information)
48 Role of ANN
- Adaptivity, robustness, parallelism, optimality
- Machinery for learning and curve fitting (learns from examples)
- Initially thought to be unsuitable because of its black-box nature: no information available in symbolic form (suitable for human interpretation)
- Recently, embedded knowledge is extracted in the form of symbolic rules, making it suitable for Rule Generation.
49 Role of GAs
- Robust, parallel, adaptive search methods suitable when the search space is large.
- Used more in Prediction (P) than Description (D)
- D: Finding human-interpretable patterns describing the data
- P: Using some variables or attributes in the database to predict unknown/future values of other variables of interest.
50 Example: Medical Data
- Numeric and textual information may be interspersed
- Different symbols can be used with the same meaning
- Redundancy often exists
- Erroneous/misspelled medical terms are common
- Data is often sparsely distributed
51 - A robust preprocessing system is required to extract any kind of knowledge from even medium-sized medical data sets
- The data must not only be cleaned of errors and redundancy, but organized in a fashion that makes sense for the problem
52 So, we NEED
- Efficient
- Robust
- Flexible
Machine Learning Algorithms
⇒ NEED for the Soft Computing Paradigm
53 Without Soft Computing, Machine Intelligence Research Remains Incomplete.
54 Modular Neural Networks
Task: Split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution.
Strategy: Divide and Conquer
55 The approach involves
- Effective decomposition of the problem s.t. the subproblems can be solved with compact networks.
- Effective combination and training of the subnetworks s.t. there is gain in terms of total training time, network size and accuracy of the solution.
56 Advantages
- Accelerated training
- The final solution network has more structured components
- Representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network
- The catastrophic interference problem of neural network learning (in case of overlapped regions) is reduced
57 Classification Problem
- Split a k-class problem into k 2-class problems.
- Train one (or multiple) subnetwork modules for each 2-class problem.
- Concatenate the subnetworks s.t. intra-module links that have already evolved are unchanged, while inter-module links are initialized to a low value.
- Train the concatenated network s.t. the intra-module links (already evolved) are less perturbed, while the inter-module links are more perturbed.
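The concatenation step above can be sketched in pure Python. This is illustrative only: the block-diagonal layout and the `eps` scale for initializing inter-module links are assumptions, not values from the slides.

```python
import random

def concatenate_modules(modules, eps=0.01, seed=0):
    """Block-diagonal concatenation of trained subnetwork weight matrices
    (lists of lists). Intra-module blocks keep their evolved values;
    inter-module entries get small random values (eps scale, assumed)."""
    rng = random.Random(seed)
    rows = sum(len(m) for m in modules)
    cols = sum(len(m[0]) for m in modules)
    # Start with small random inter-module links everywhere ...
    big = [[eps * rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    r0 = c0 = 0
    for m in modules:
        for i, row in enumerate(m):
            for j, w in enumerate(row):
                big[r0 + i][c0 + j] = w  # ... then preserve intra-module links
        r0 += len(m)
        c0 += len(m[0])
    return big

# Two toy "trained" modules: a 2x3 matrix of 1.0s and a 1x2 matrix of 2.0s.
W = concatenate_modules([[[1.0] * 3, [1.0] * 3], [[2.0] * 2]])
```

The off-diagonal blocks of `W` are the low-valued inter-module links that the final training phase is free to grow.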
58 3-class problem → 3 (2-class problems)
[Figure: Class 1, Class 2 and Class 3 subnetworks are integrated into one network (intra-module links with values preserved, inter-module links to be grown); in the final training phase the inter-module links are grown, yielding the final network.]
59 Modular Rough Fuzzy MLP
A modular network designed using four different Soft Computing tools.
Basic network model: Fuzzy MLP.
Rough set theory is used to generate crude decision rules representing each of the classes from the Discernibility Matrix. (There may be multiple rules for each class ⇒ multiple subnetworks per class.)
60 The knowledge-based subnetworks are concatenated to form a population of initial solution networks. The final solution network is evolved using a GA with a variable mutation operator: the bits corresponding to the intra-module links (already evolved) have a low mutation probability, while the inter-module links have a high mutation probability.
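The variable mutation operator can be sketched as follows. The probabilities here are illustrative, not the values used in the paper; only the idea (position-dependent mutation rates) comes from the slides.

```python
import random

def variable_mutation(bits, intra_mask, p_intra=0.01, p_inter=0.3, rng=None):
    """Flip each bit with a position-dependent probability: bits encoding
    already-evolved intra-module links mutate rarely, bits encoding
    inter-module links mutate often."""
    rng = rng or random.Random()
    out = []
    for b, intra in zip(bits, intra_mask):
        p = p_intra if intra else p_inter
        out.append(1 - b if rng.random() < p else b)
    return out

# 6-bit chromosome: first 4 bits are intra-module links, last 2 inter-module.
child = variable_mutation([1, 1, 0, 1, 0, 0],
                          [True, True, True, True, False, False],
                          rng=random.Random(0))
```

With the extreme settings `p_intra=0` and `p_inter=1`, the operator leaves intra-module bits untouched and flips every inter-module bit, which makes the restriction easy to verify.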
61 Rough Sets
Z. Pawlak, 1982, Int. J. Comp. Inf. Sci.
- Offer mathematical tools to discover hidden patterns in data.
- The fundamental principle of a rough set-based learning system is to discover redundancies and dependencies between the given features of the data to be classified.
62 - Approximate a given concept both from below and from above, using lower and upper approximations.
- Rough set learning algorithms can be used to obtain rules in IF-THEN form from a decision table.
- Extract knowledge from a database (decision table w.r.t. objects and attributes → remove undesirable attributes (knowledge discovery) → analyze data dependency → minimum subset of attributes (reducts))
63 Rough Sets
[Figure: a set X in the feature space, shown with its upper approximation B̄X and lower approximation B̲X formed from the granules [x]_B.]
[x]_B: the set of all points belonging to the same granule as the point x in the feature space, i.e., the set of all points which are indiscernible from x in terms of the feature subset B.
64 Approximations of the set X w.r.t. the feature subset B
- B-lower approximation, B̲X: granules definitely belonging to X
- B-upper approximation, B̄X: granules definitely and possibly belonging to X
- If B̲X = B̄X, X is B-exact or B-definable; otherwise it is roughly definable
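A minimal Python sketch of these approximations, using a toy universe of attribute-valued objects (the data is made up, not from the slides):

```python
def granules(universe, B):
    """Partition the universe into indiscernibility granules [x]_B:
    objects with identical values on the attribute subset B share a granule."""
    groups = {}
    for obj, attrs in universe.items():
        key = tuple(attrs[a] for a in B)
        groups.setdefault(key, set()).add(obj)
    return list(groups.values())

def lower_upper(universe, B, X):
    """B-lower: union of granules fully inside X;
    B-upper: union of granules that intersect X."""
    lower, upper = set(), set()
    for g in granules(universe, B):
        if g <= X:
            lower |= g
        if g & X:
            upper |= g
    return lower, upper

# Toy universe: four objects described by two binary attributes.
U = {"x1": {"F1": 1, "F2": 0}, "x2": {"F1": 1, "F2": 0},
     "x3": {"F1": 0, "F2": 1}, "x4": {"F1": 0, "F2": 0}}
X = {"x1", "x3"}                       # concept to approximate
lo, up = lower_upper(U, ["F1", "F2"], X)
```

Here `lo != up`, so X is roughly definable: x1 is indiscernible from x2, which lies outside X.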
65 Rough Sets provide
- Uncertainty Handling (using lower & upper approximations)
- Granular Computing (using information granules)
66 Granular Computing: Computation is performed using information granules and not the data points (objects)
⇒ Information compression, Computational gain
67 Information Granules and Rough Set Theoretic Rules
[Figure: feature space F1 × F2, each axis granulated into low, medium and high regions; a rule covers one granule.]
- A rule provides a crude description of the class using granules
68 Rough Set Rule Generation
Decision Table:
Object  F1 F2 F3 F4 F5  Decision
x1      1  0  1  0  1   Class 1
x2      0  0  0  0  1   Class 1
x3      1  1  1  1  1   Class 1
x4      0  1  0  1  0   Class 2
x5      1  1  1  0  0   Class 2
Discernibility Matrix (c) for Class 1:
Objects  x1  x2        x3
x1       φ   {F1, F3}  {F2, F4}
x2           φ         {F1, F2, F3, F4}
x3                     φ
69 Discernibility function
Discernibility function considering the object x1 belonging to Class 1 = (Discernibility of x1 w.r.t. x2) ∧ (Discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4)
Similarly, discernibility functions are obtained considering the other objects.
⇒ Dependency Rules (AND-OR form)
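The discernibility computation on the decision table above can be sketched in Python. The table values are taken from the slide; the code itself is only an illustration of the construction.

```python
from itertools import combinations

# Decision table from the slide: five objects, features F1..F5, two classes.
table = {
    "x1": ([1, 0, 1, 0, 1], 1),
    "x2": ([0, 0, 0, 0, 1], 1),
    "x3": ([1, 1, 1, 1, 1], 1),
    "x4": ([0, 1, 0, 1, 0], 2),
    "x5": ([1, 1, 1, 0, 0], 2),
}

def discernibility_matrix(table, cls):
    """Entry (xi, xj): the features on which two same-class objects differ."""
    objs = [o for o, (_, c) in table.items() if c == cls]
    mat = {}
    for a, b in combinations(objs, 2):
        fa, fb = table[a][0], table[b][0]
        mat[(a, b)] = {f"F{k+1}" for k in range(len(fa)) if fa[k] != fb[k]}
    return mat

mat = discernibility_matrix(table, cls=1)
# Discernibility function for x1: AND over its matrix entries,
# OR within each entry -- here (F1 v F3) ^ (F2 v F4).
f_x1 = [sorted(mat[p]) for p in mat if "x1" in p]
```

The conjunction-of-disjunctions `f_x1` is what gets simplified into the AND-OR dependency rules.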
70 Rules
No. of Classes = 2, No. of Features = 2
[Figure: each rule gives a crude network (layers L1 … H2); the crude networks seed Populations 1, 2 and 3, which are partially trained by GA1, GA2 and GA3 (Phase I).]
71 [Figure: the partially trained subnetworks are concatenated (links between modules having small random values) to form the final population; GA (Phase II) with restricted mutation probability — low for intra-module links, high for inter-module links — yields the final trained network.]
72 Knowledge Flow in Modular Rough Fuzzy MLP
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
[Figure: feature space (F1 × F2) → rough set rules C1(R1), C2(R2), C2(R3) → network mapping into subnetworks R1 (Subnet 1), R2 (Subnet 2), R3 (Subnet 3) → partial training with ordinary GA → partially refined subnetworks SN1, SN2, SN3.]
73 Concatenation of Subnetworks
[Figure: the population of concatenated networks is evolved with a GA having a variable mutation operator (low mutation probability for intra-module links, high for inter-module links), giving the final solution network for classes C1 and C2 in the feature space.]
75 Speech Data: 3 Features, 6 Classes — Classification Accuracy
76 Network Size (No. of Links)
77 Training Time (hrs), DEC Alpha Workstation @400MHz
78 Legend: 1. MLP; 2. Fuzzy MLP; 3. Modular Fuzzy MLP; 4. Rough Fuzzy MLP; 5. Modular Rough Fuzzy MLP
79 Network Structure — IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Modular Rough Fuzzy MLP: structured (no. of links: few)
Fuzzy MLP: unstructured (no. of links: more)
[Figure: histogram of weight values]
80 [Figure: connectivity of the network obtained using the Modular Rough Fuzzy MLP]
81 [Figure: sample rules extracted for the Modular Rough Fuzzy MLP]
82 Rule Evaluation
- Accuracy
- Fidelity (number of times the network and rule-base outputs agree)
- Confusion (should be restricted within a minimum number of classes)
- Coverage (a rule base with a smaller uncovered region, i.e., test set for which no rules are fired, is better)
- Rule base size (the smaller the number of rules, the more compact the rule base)
- Certainty (confidence of rules)
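Three of these criteria can be made concrete in a short sketch. The exact definitions used in the paper are not in the transcript, so the formulas below are assumed, straightforward readings of the bullet descriptions.

```python
def evaluate_rules(y_true, y_net, y_rules, covered):
    """Assumed definitions: accuracy of the rule base on covered patterns,
    fidelity = how often the rule base agrees with the network, and
    coverage = fraction of test patterns for which some rule fires."""
    n = len(y_true)
    fired = [i for i in range(n) if covered[i]]
    accuracy = sum(y_rules[i] == y_true[i] for i in fired) / max(len(fired), 1)
    fidelity = sum(y_rules[i] == y_net[i] for i in fired) / max(len(fired), 1)
    coverage = len(fired) / n
    return accuracy, fidelity, coverage

# Toy test set of 4 patterns; the last one is not covered by any rule.
acc, fid, cov = evaluate_rules(
    y_true=[1, 1, 2, 2], y_net=[1, 2, 2, 2],
    y_rules=[1, 2, 2, 1], covered=[True, True, True, False])
```

In the toy run the rule base always matches the network (fidelity 1.0) even though it misclassifies one covered pattern, which is exactly why fidelity and accuracy are reported separately.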
83 Existing Rule Extraction Algorithms
- Subset: Searches over all possible combinations of input weights to a node of the trained network. Rules are generated from those subsets of links for which the sum of the weights exceeds the bias of that node.
- MofN: Instead of AND-OR rules, the method extracts rules of the form IF M out of N inputs are high THEN Class I.
- X2R: Unlike the previous two methods, which consider the links of a network, X2R generates rules from the input-output mapping implemented by the network.
- C4.5: Rule generation algorithm based on decision trees.
84 IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Comparison of rules obtained for the speech data
85 [Tables: number of rules, confusion, CPU time]
86 Case Based Reasoning (CBR)
- Cases: some typical situations already experienced by the system.
- A case is a conceptualized piece of knowledge representing an experience that teaches a lesson for achieving the goals of the system.
- CBR involves
- adapting old solutions to meet new demands
- using old cases to explain new situations or to justify new solutions
- reasoning from precedents to interpret new situations.
87 - ⇒ A CBR system learns and becomes more efficient as a byproduct of its reasoning activity.
- Examples: medical diagnosis and law interpretation, where the knowledge available is incomplete and/or evidence is sparse.
88 Unlike a traditional knowledge-based system, a case-based system operates through a process of
- remembering one or a small set of concrete instances or cases, and
- basing decisions on comparisons between the new situation and the old ones.
89 - Case Selection ⇒ cases belong to the set of examples encountered.
- Case Generation ⇒ constructed cases need not be any of the examples.
90 Rough Sets provide
- Uncertainty Handling (using lower & upper approximations)
- Granular Computing (using information granules)
91 IEEE Trans. Knowledge Data Engg., to appear
Granular Computing and Case Generation
- Information Granules: A group of similar objects clubbed together by an indiscernibility relation.
- Granular Computing: Computation is performed using information granules and not the data points (objects)
⇒ Information compression, Computational gain
92 - Cases: Informative patterns (prototypes) characterizing the problems.
- In the rough set theoretic framework: Cases ≡ Information Granules
- In the rough-fuzzy framework: Cases ≡ Fuzzy Information Granules
93 Characteristics and Merits
- Cases are cluster granules, not sample points
- Involve only a reduced number of relevant features, with variable size
- Less storage requirement
- Fast retrieval
- Suitable for mining data with large dimension and size
94 How to Achieve This?
- Fuzzy sets help in the linguistic representation of patterns, providing a fuzzy granulation of the feature space
- Rough sets help in generating dependency rules to model informative/representative regions in the granulated feature space
- Fuzzy membership functions corresponding to the representative regions are stored as Cases
95 Fuzzy (F-) Granulation
[Figure: π-type membership functions μ_low, μ_medium, μ_high along feature j, with centers c_L, c_M, c_H, radii λ_L, λ_M, and crossover membership value 0.5.]
96 c_low(F_j) = m_jl;  c_medium(F_j) = m_j;  c_high(F_j) = m_jh
λ_low(F_j) = c_medium(F_j) − c_low(F_j)
λ_high(F_j) = c_high(F_j) − c_medium(F_j)
λ_medium(F_j) = 0.5 (c_high(F_j) − c_low(F_j))
m_j: mean of the pattern points along the jth axis
m_jl: mean of the points in the range [F_j min, m_j)
m_jh: mean of the points in the range (m_j, F_j max]
F_j max, F_j min: maximum and minimum values of feature F_j
97 - An n-dimensional pattern F_i = [F_i1, F_i2, …, F_in] is represented as a 3n-dimensional fuzzy linguistic pattern [Pal & Mitra 1992]:
F_i → [μ_low(F_i1)(F_i), …, μ_high(F_in)(F_i)]
- Set the μ value to 1 or 0 if it is higher or lower than 0.5
⇒ binary 3n-dimensional patterns are obtained
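The 3n-dimensional representation and the 0.5-thresholding can be sketched as below. The π-function follows the standard form (value 1 at the center, 0.5 at half the radius, 0 beyond the radius); the centers and radii in the example are made-up values, not the data-derived ones from the previous slide.

```python
def pi_membership(x, c, lam):
    """pi-type membership: 1 at center c, 0.5 at |x - c| = lam/2, 0 beyond lam."""
    d = abs(x - c) / lam
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (1.0 - d) ** 2
    return 0.0

def linguistic_pattern(F, centers, radii):
    """Map an n-dimensional pattern to its 3n-dimensional fuzzy representation
    [mu_low(F1), mu_medium(F1), mu_high(F1), ..., mu_high(Fn)]."""
    out = []
    for j, x in enumerate(F):
        for term in ("low", "medium", "high"):
            out.append(pi_membership(x, centers[j][term], radii[j][term]))
    return out

# One feature with illustrative centers/radii; pattern value 0.3.
mu = linguistic_pattern(
    [0.3], centers=[{"low": 0.2, "medium": 0.5, "high": 0.8}],
    radii=[{"low": 0.3, "medium": 0.3, "high": 0.3}])
binary = [1 if m > 0.5 else 0 for m in mu]   # 0.5-thresholding (slide 97)
```

The binarized vector is what enters the decision table for rough set rule generation on the next slide.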
98 - Compute the frequency n_ki of occurrence of the binary patterns; select those patterns having frequency above a threshold Tr (for noise removal).
- Generate a decision table consisting of the binary patterns.
- Extract dependency rules corresponding to informative regions (blocks), e.g., class ← L1 ∧ M2
99 Rough Set Rule Generation
Decision Table:
Object  F1 F2 F3 F4 F5  Decision
x1      1  0  1  0  1   Class 1
x2      0  0  0  0  1   Class 1
x3      1  1  1  1  1   Class 1
x4      0  1  0  1  0   Class 2
x5      1  1  1  0  0   Class 2
100 Discernibility Matrix (c) for Class 1:
Objects  x1  x2        x3
x1       φ   {F1, F3}  {F2, F4}
x2           φ         {F1, F2, F3, F4}
x3                     φ
101 Discernibility function considering the object x1 belonging to Class 1 = (Discernibility of x1 w.r.t. x2) ∧ (Discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4)
Similarly, discernibility functions are obtained considering the other objects.
⇒ Dependency Rules (AND-OR form)
102 Mapping Dependency Rules to Cases
- Each conjunction, e.g., L1 ∧ M2, represents a region (block)
- For each conjunction, store as a case the parameters of the fuzzy membership functions corresponding to the linguistic variables that occur in the conjunction
- (thus, multiple cases may be generated from a rule)
103 - Note: All features may not occur in a rule.
- Cases may be represented by different, reduced numbers of features.
- Structure of a Case: parameters of the membership functions (center, radius), class information
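The case structure just described can be sketched as a small data type. The field names are illustrative; only the content (per-feature fuzzy-set parameters plus the class label) comes from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """A case stores only the fuzzy-set parameters for the features that
    actually occur in its dependency-rule conjunction (a reduced, variable
    feature subset), plus the class label."""
    klass: int
    # feature number -> (linguistic term, center c, radius lam)
    params: dict = field(default_factory=dict)

# Case 1 from slide 105: F1 low (c=0.1, lam=0.5), F2 high (c=0.9, lam=0.5).
case1 = Case(klass=1, params={1: ("L", 0.1, 0.5), 2: ("H", 0.9, 0.5)})
```

Because `params` holds only the features present in the rule, a case for a 649-feature dataset may occupy just a handful of entries, which is the storage saving claimed on slide 93.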
104 Example — IEEE Trans. Knowledge Data Engg., to appear
[Figure: feature space F1 × F2 with two clusters of points; CASE 1 is centered near (0.1, 0.9) and CASE 2 near (0.7, 0.2); the parameters of the fuzzy linguistic sets low, medium and high define the cases.]
105 Dependency Rules and Cases Obtained
Case 1: Feature No. 1, fuzzset (L): c = 0.1, λ = 0.5; Feature No. 2, fuzzset (H): c = 0.9, λ = 0.5; Class 1
Case 2: Feature No. 1, fuzzset (H): c = 0.7, λ = 0.4; Feature No. 2, fuzzset (L): c = 0.2, λ = 0.5; Class 2
106 Case Retrieval
- The similarity sim(x, c) between a pattern x and a case c is defined over the n features present in case c [formula not transcribed]
107 - μ: the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j.
- For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern.
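A sketch of this retrieval step follows. Since the exact sim(x, c) expression did not survive the transcript, the similarity below is an assumed form: the average membership of the pattern in the case's fuzzy linguistic sets, over the features present in the case.

```python
def pi_membership(x, c, lam):
    """pi-type membership: 1 at center c, 0.5 at |x - c| = lam/2, 0 beyond lam."""
    d = abs(x - c) / lam
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (1.0 - d) ** 2
    return 0.0

def similarity(x, case):
    """Assumed sim(x, c): mean membership over the n features in the case."""
    n = len(case["params"])
    return sum(pi_membership(x[j], c, lam)
               for j, (c, lam) in case["params"].items()) / n

def classify(x, case_base):
    """Retrieve the most similar case and assign its class to the pattern."""
    best = max(case_base, key=lambda cb: similarity(x, cb))
    return best["klass"]

# Case base mirroring slide 105 (feature index -> (center, radius)).
cases = [{"klass": 1, "params": {0: (0.1, 0.5), 1: (0.9, 0.5)}},
         {"klass": 2, "params": {0: (0.7, 0.4), 1: (0.2, 0.5)}}]
label = classify([0.15, 0.85], cases)
```

A pattern near (0.1, 0.9) has high membership in Case 1's fuzzy sets and near-zero membership in Case 2's, so it retrieves Case 1.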
108 Experimental Results and Comparisons
- Forest Covertype: Contains 10 dimensions, 7 classes and 586,012 samples. It is a Geographical Information System dataset representing forest cover types (pine/fir, etc.) of the USA. The variables are cartographic and remote sensing measurements. All variables are numeric.
109 - Multiple Features: This dataset consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. There are in total 2000 patterns, 649 features (all numeric) and 10 classes.
- Iris: The dataset contains 150 instances, 4 features and 3 classes of Iris flowers. The features are numeric.
110 Some Existing Case Selection Methods
- k-NN based:
- Condensed nearest neighbor (CNN)
- Instance based learning (e.g., IB3)
- Instance based learning with feature weighting (e.g., IB4)
- Fuzzy logic based
- Neuro-fuzzy based
111 Algorithms Compared
- Instance based learning algorithm IB3 [Aha 1991]
- Instance based learning algorithm IB4 [Aha 1992] (reduced features). The feature weighting is learned by random hill climbing in IB4; a specified number of features having high weights is selected.
- Random case selection.
112 Evaluation in terms of
- 1-NN classification accuracy using the cases. Training set: 10% for case generation; Test set: 90%
- Number of cases stored in the case base
- Average number of features required to store a case (n_avg)
- CPU time required for case generation (t_gen)
- Average CPU time required to retrieve a case (t_ret) (on a Sun UltraSparc Workstation @350 MHz)
113 Iris Flowers: 4 features, 3 classes, 150 samples. Number of cases: 3 (for all methods)
114 Forest Cover Types: 10 features, 7 classes, 586,012 samples. Number of cases: 545 (for all methods)
115 Hand-written Numerals: 649 features, 10 classes, 2000 samples. Number of cases: 50 (for all methods)
116 For the same number of cases:
Accuracy: The proposed method is much superior to random selection and IB4, and close to IB3.
Average Number of Features Stored: The proposed method stores far fewer than the original data dimension.
Case Generation Time: The proposed method requires much less compared to IB3 and IB4.
Case Retrieval Time: Several orders less for the proposed method compared to IB3 and random selection; also less than IB4.
117 Conclusions
- The relation between Soft Computing, Machine Intelligence and Pattern Recognition is explained.
- The emergence of Data Mining and Knowledge Discovery from the PR point of view is explained.
- The significance of hybridization in the Soft Computing paradigm is illustrated.
118 - The modular concept enhances performance, accelerates training and makes the network structured with fewer links.
- The rules generated are superior to other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.
119 - Rough sets are used for generating information granules.
- Fuzzy sets provide efficient granulation of the feature space (F-granulation).
- Reduced and variable feature subset representation of cases is a unique feature of the scheme.
- The rough-fuzzy case generation method is suitable for CBR systems involving datasets large both in dimension and size.
120 - Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear)
- Application to multi-spectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002)
- Significance in the Computational Theory of Perception (CTP)
121Thank You!!