Title: Soft Computing, Machine Intelligence and Data Mining
1 Soft Computing, Machine Intelligence and Data Mining
- Sankar K. Pal
- Machine Intelligence Unit
- Indian Statistical Institute, Calcutta
- http://www.isical.ac.in/sankar
2 ISI, founded 1931, Mahalanobis
Divisions: CS, BS, MS, AS, SS, PES, SQC
Units: ACMU, ECSU, MIU, CVPRU
3 - Director
- Prof-in-charge
- Heads
- Distinguished Scientist
- Professor
- Associate Professor
- Lecturer
Faculty: 250
Courses: B. Stat, M. Stat, M. Tech (CS), M. Tech (SQC & OR), Ph.D.
Locations: Calcutta (HQ), Delhi, Bangalore, Hyderabad, Madras, Giridih, Bombay
4MIU Activities
(Formed in March 1993)
- Pattern Recognition and Image Processing
- Color Image Processing
- Data Mining
- Data Condensation, Feature Selection
- Support Vector Machine
- Case Generation
- Soft Computing
- Fuzzy Logic, Neural Networks, Genetic Algorithms, Rough Sets
- Hybridization
- Case Based Reasoning
- Fractals/Wavelets
- Image Compression
- Digital Watermarking
- Wavelet ANN
- Bioinformatics
5- Externally Funded Projects
- INTEL
- CSIR
- Silicogene
- Center for Excellence in Soft Computing Research
- Foreign Collaborations
- (Japan, France, Poland, Hong Kong, Australia)
- Editorial Activities
- Journals, Special Issues
- Books
- Achievements/Recognitions
Faculty: 10; Research Scholars/Associates: 8
6Contents
- What is Soft Computing?
- Computational Theory of Perception
- Pattern Recognition and Machine Intelligence
- Relevance of Soft Computing Tools
- Different Integrations
7- Emergence of Data Mining
- Need
- KDD Process
- Relevance of Soft Computing Tools
- Rule Generation/Evaluation
- Modular Evolutionary Rough Fuzzy MLP
- Modular Network
- Rough Sets, Granules, Rule Generation
- Variable Mutation Operations
- Knowledge Flow
- Example and Merits
8- Rough-fuzzy Case Generation
- Granular Computing
- Fuzzy Granulation
- Mapping Dependency Rules to Cases
- Case Retrieval
- Examples and Merits
- Conclusions
9 SOFT COMPUTING (L. A. Zadeh)
- Aim: To exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and close resemblance with human-like decision making.
- To find an approximate solution to an imprecisely/precisely formulated problem.
10 Parking a Car
- Generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem.
- ⇒ High precision carries a high cost.
11 - ⇒ The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.
12 - Soft Computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: Exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve Tractability, Robustness, and close resemblance with human-like decision making.
- Foundation for the conception and design of high-MIQ (Machine IQ) systems.
13 - Provides Flexible Information Processing Capability for representation and evaluation of various real-life ambiguous and uncertain situations.
- ⇒ Real World Computing
- It may be argued that it is soft computing rather than hard computing that should be viewed as the foundation for Artificial Intelligence.
14 - At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS).
- Within Soft Computing, FL, NC, GA, RS are Complementary rather than Competitive
15 FL: the algorithms for dealing with imprecision and uncertainty
NC: the machinery for learning and curve fitting
GA: the algorithms for search and optimization
RS: handling uncertainty arising from the granularity in the domain of discourse
16 Referring back to the example: Parking a Car
- Do we use any measurement and computation while performing such tasks?
- We use the Computational Theory of Perceptions (CTP)
17Computational Theory of Perceptions (CTP)
AI Magazine, 22(1), 73-84, 2001
Provides capability to compute and reason with
perception based information
- Examples: car parking, driving in a city, cooking a meal, summarizing a story
- Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations
18 - They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects
- Reflecting the finite ability of the sensory organs (and, finally, the brain) to resolve details, perceptions are inherently imprecise
19 - Perceptions are fuzzy (F-) granular (both fuzzy and granular)
- Boundaries of perceived classes are unsharp
- Values of attributes are granulated (a clump of indistinguishable points/objects)
- Examples:
- Granules in age: very young, young, not so old, …
- Granules in direction: slightly left, sharp right, …
20 - F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis (based on predicate logic and probability theory)
- Main distinguishing feature: the assumption that perceptions are described by propositions drawn from a natural language.
21 Machine Intelligence: a core concept for grouping various advanced technologies with Pattern Recognition and Learning. Its constituents include:
- Hybrid Systems: neuro-fuzzy, genetic-neural, fuzzy-genetic, fuzzy-neuro-genetic
- Knowledge-based Systems: probabilistic reasoning, approximate reasoning, case-based reasoning
- Data Driven Systems: neural network systems, evolutionary computing
- Non-linear Dynamics: chaos theory, rescaled range analysis (wavelet), fractal analysis
- Pattern recognition and learning
22 Pattern Recognition System (PRS)
- Measurement Space → Feature Space → Decision Space
- Uncertainties arise from deficiencies of information available from a situation
- Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague, or contradictory information in various stages of a PRS
23 Relevance of Fuzzy Sets in PR
- Representing linguistically phrased input features for processing
- Representing multi-class membership of ambiguous patterns
- Generating rules and inferences in linguistic form
- Extracting ill-defined image regions, primitives, properties and describing relations among them as fuzzy subsets
24 ANNs provide Natural Classifiers having
- Resistance to Noise
- Tolerance to Distorted Patterns/Images (Ability to Generalize)
- Superior Ability to Recognize Overlapping Pattern Classes or Classes with Highly Nonlinear Boundaries, or Partially Occluded or Degraded Images
- Potential for Parallel Processing
- Non-parametric nature
25 Why GAs in PR?
- Methods developed for Pattern Recognition and Image Processing are usually problem dependent.
- Many tasks involved in analyzing/identifying a pattern need Appropriate Parameter Selection and Efficient Search in complex spaces to obtain Optimal Solutions
26 - This makes the processes
- Computationally Intensive, with the
- Possibility of Losing the Exact Solution
- GAs: Efficient, Adaptive and Robust Search Processes, producing near-optimal solutions, with a large amount of Implicit Parallelism
- GAs are an Appropriate and Natural Choice for problems which need Optimized Computation Requirements, and Robust, Fast and Close Approximate Solutions
27Relevance of FL, ANN, GAs Individually
to PR Problems is Established
28 In the late eighties scientists thought: Why NOT Integrations?
- Fuzzy Logic + ANN
- ANN + GA
- Fuzzy Logic + ANN + GA
- Fuzzy Logic + ANN + GA + Rough Set
Neuro-fuzzy hybridization is the most visible integration realized so far.
29 Why Fusion?
Fuzzy set theoretic models try to mimic human reasoning and the capability of handling uncertainty (SW)
Neural network models attempt to emulate the architecture and information representation scheme of the human brain (HW)
⇒ NEURO-FUZZY Computing (for a More Intelligent System)
30 NFS (Neuro-Fuzzy System): a FUZZY SYSTEM in which an ANN is used for learning and adaptation
FNN (Fuzzy Neural Network): an ANN in which Fuzzy Sets are used to augment its application domain
31 MERITS
- GENERIC
- APPLICATION SPECIFIC
32 Rough-Fuzzy Hybridization
- Fuzzy set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept.
- The focus of rough set theory is on the ambiguity caused by limited discernibility of objects (lower and upper approximation of a concept).
- Rough sets and fuzzy sets can be integrated to develop a model of uncertainty stronger than either.
33Rough Fuzzy Hybridization A New Trend in
Decision Making, S. K. Pal and A. Skowron (eds),
Springer-Verlag, Singapore, 1999
34 Neuro-Rough Hybridization
- Rough set models are used to generate network parameters (weights).
- Roughness is incorporated in the inputs and outputs of networks for uncertainty handling, performance enhancement and an extended domain of application.
- Networks consisting of rough neurons are used.
Neurocomputing, Spl. Issue on Rough-Neuro Computing, S. K. Pal, W. Pedrycz, A. Skowron and R. Swiniarski (eds), vol. 36 (1-4), 2001.
35 Neuro-Rough-Fuzzy-Genetic Hybridization
- Rough sets are used to extract domain knowledge in the form of linguistic rules, which generate fuzzy knowledge-based networks that are then evolved using Genetic Algorithms.
- The integration offers several advantages like fast training, compact networks and performance enhancement.
36 IEEE TNN, 9, 1203-1216, 1998
Incorporate Domain Knowledge using Rough Sets
37 Before we describe
- Modular Evolutionary Rough-fuzzy MLP
- Rough-fuzzy Case Generation System
we explain Data Mining and the significance of Pattern Recognition, Image Processing and Machine Intelligence.
38 One of the applications of Information Technology that has drawn the attention of researchers is DATA MINING, where Pattern Recognition/Image Processing/Machine Intelligence are directly related.
39 Why Data Mining?
- The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
- With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
- Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image), huge (both in dimension and size) and scattered.
- Such data is being accumulated at a phenomenal rate.
40- As a result, traditional ad hoc mixtures of
statistical techniques and data management tools
are no longer adequate for analyzing this vast
collection of data.
41 - Pattern Recognition and Machine Learning principles applied to a very large (both in size and dimension) heterogeneous database
⇒ Data Mining
- Data Mining + Knowledge Interpretation ⇒ Knowledge Discovery
- Process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
42 Pattern Recognition, World Scientific, 2001
[Figure: Knowledge Discovery in Database (KDD) process — Huge Raw Data → (Data Wrapping/Description) → Preprocessed Data → Machine Learning (Classification, Clustering, Rule Generation) → Mathematical Model of Data (Patterns) → Knowledge Extraction/Interpretation → Useful Knowledge → Knowledge Evaluation. Data Mining (DM) spans the Machine Learning and model-building stages; KDD denotes the whole process.]
43 Data Mining Algorithm Components
- Model: Function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
- Preference criterion: Basis for preference of one model or set of parameters over another.
- Search algorithm: Specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, family of models, and preference criterion.
44 Why Growth of Interest?
- Falling cost of large storage devices and increasing ease of collecting data over networks.
- Availability of robust/efficient machine learning algorithms to process data.
- Falling cost of computational power ⇒ enabling use of computationally intensive methods for data analysis.
45 Examples
- Financial Investment: Stock indices and prices, interest rates, credit card data, fraud detection
- Health Care: Various diagnostic information stored by hospital management systems.
- Data is heterogeneous (a mixture of text, symbolic, numeric, texture, image) and huge (both in dimension and size).
46 Role of Fuzzy Sets
- Modeling of imprecise/qualitative knowledge
- Transmission and handling of uncertainties at various stages
- Supporting, to an extent, human-type reasoning in natural form
47 - Classification/Clustering
- Discovering association rules (describing interesting association relationships among different attributes)
- Inferencing
- Data summarization/condensation (abstracting the essence from a large amount of information)
48 Role of ANN
- Adaptivity, robustness, parallelism, optimality
- Machinery for learning and curve fitting (learns from examples)
- Initially thought to be unsuitable because of its black-box nature: no information available in symbolic form (suitable for human interpretation)
- Recently, embedded knowledge is extracted in the form of symbolic rules, making it suitable for Rule Generation.
49 Role of GAs
- Robust, parallel, adaptive search methods suitable when the search space is large.
- Used more in Prediction (P) than Description (D)
- D: Finding human-interpretable patterns describing the data
- P: Using some variables or attributes in the database to predict unknown/future values of other variables of interest.
50 Example: Medical Data
- Numeric and textual information may be interspersed
- Different symbols can be used with the same meaning
- Redundancy often exists
- Erroneous/misspelled medical terms are common
- Data is often sparsely distributed
51 - A robust preprocessing system is required to extract any kind of knowledge from even medium-sized medical data sets
- The data must not only be cleaned of errors and redundancy, but organized in a fashion that makes sense for the problem
52 So, we NEED
- Efficient
- Robust
- Flexible
Machine Learning Algorithms
⇒ NEED for the Soft Computing Paradigm
53 Without Soft Computing, Machine Intelligence Research Remains Incomplete.
54 Modular Neural Networks
Task: Split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution.
Strategy: Divide and Conquer
55 The approach involves
- Effective decomposition of the problem s.t. the subproblems can be solved with compact networks.
- Effective combination and training of the subnetworks s.t. there is gain in terms of total training time, network size and accuracy of the solution.
56 Advantages
- Accelerated training
- The final solution network has more structured components
- Representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network
- The catastrophic interference problem of neural network learning (in case of overlapped regions) is reduced
57 Classification Problem
- Split a k-class problem into k 2-class problems.
- Train one (or multiple) subnetwork modules for each 2-class problem.
- Concatenate the subnetworks s.t. intra-module links that have already evolved are unchanged, while inter-module links are initialized to a low value.
- Train the concatenated network s.t. the intra-module links (already evolved) are less perturbed, while the inter-module links are more perturbed.
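The concatenation step above can be sketched in pure Python. This is illustrative only: the block-diagonal layout and the `eps` scale for initializing inter-module links are assumptions, not values from the slides.

```python
import random

def concatenate_modules(modules, eps=0.01, seed=0):
    """Block-diagonal concatenation of trained subnetwork weight matrices
    (lists of lists). Intra-module blocks keep their evolved values;
    inter-module entries get small random values (eps scale, assumed)."""
    rng = random.Random(seed)
    rows = sum(len(m) for m in modules)
    cols = sum(len(m[0]) for m in modules)
    # Start with small random inter-module links everywhere ...
    big = [[eps * rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    r0 = c0 = 0
    for m in modules:
        for i, row in enumerate(m):
            for j, w in enumerate(row):
                big[r0 + i][c0 + j] = w  # ... then preserve intra-module links
        r0 += len(m)
        c0 += len(m[0])
    return big

# Two toy "trained" modules: a 2x3 matrix of 1.0s and a 1x2 matrix of 2.0s.
W = concatenate_modules([[[1.0] * 3, [1.0] * 3], [[2.0] * 2]])
```

The off-diagonal blocks of `W` are the low-valued inter-module links that the final training phase is free to grow.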
58 3-class problem → 3 (2-class problems)
[Figure: Class 1, Class 2 and Class 3 subnetworks are integrated into one network (intra-module links with values preserved, inter-module links to be grown); in the final training phase the inter-module links are grown, yielding the final network.]
59 Modular Rough Fuzzy MLP
A modular network designed using four different Soft Computing tools.
Basic network model: Fuzzy MLP.
Rough set theory is used to generate crude decision rules representing each of the classes from the Discernibility Matrix. (There may be multiple rules for each class ⇒ multiple subnetworks per class.)
60 The knowledge-based subnetworks are concatenated to form a population of initial solution networks. The final solution network is evolved using a GA with a variable mutation operator: the bits corresponding to the intra-module links (already evolved) have a low mutation probability, while the inter-module links have a high mutation probability.
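The variable mutation operator can be sketched as follows. The probabilities here are illustrative, not the values used in the paper; only the idea (position-dependent mutation rates) comes from the slides.

```python
import random

def variable_mutation(bits, intra_mask, p_intra=0.01, p_inter=0.3, rng=None):
    """Flip each bit with a position-dependent probability: bits encoding
    already-evolved intra-module links mutate rarely, bits encoding
    inter-module links mutate often."""
    rng = rng or random.Random()
    out = []
    for b, intra in zip(bits, intra_mask):
        p = p_intra if intra else p_inter
        out.append(1 - b if rng.random() < p else b)
    return out

# 6-bit chromosome: first 4 bits are intra-module links, last 2 inter-module.
child = variable_mutation([1, 1, 0, 1, 0, 0],
                          [True, True, True, True, False, False],
                          rng=random.Random(0))
```

With the extreme settings `p_intra=0` and `p_inter=1`, the operator leaves intra-module bits untouched and flips every inter-module bit, which makes the restriction easy to verify.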
61 Rough Sets
Z. Pawlak, 1982, Int. J. Comp. Inf. Sci.
- Offer mathematical tools to discover hidden patterns in data.
- The fundamental principle of a rough set-based learning system is to discover redundancies and dependencies between the given features of the data to be classified.
62 - Approximate a given concept both from below and from above, using lower and upper approximations.
- Rough set learning algorithms can be used to obtain rules in IF-THEN form from a decision table.
- Extract knowledge from a database (decision table w.r.t. objects and attributes → remove undesirable attributes (knowledge discovery) → analyze data dependency → minimum subset of attributes (reducts))
63 Rough Sets
[Figure: a set X in the feature space, shown with its upper approximation B̄X and lower approximation B̲X formed from the granules [x]_B.]
[x]_B: the set of all points belonging to the same granule as the point x in the feature space, i.e., the set of all points which are indiscernible from x in terms of the feature subset B.
64 Approximations of the set X w.r.t. the feature subset B
- B-lower approximation, B̲X: granules definitely belonging to X
- B-upper approximation, B̄X: granules definitely and possibly belonging to X
- If B̲X = B̄X, X is B-exact or B-definable; otherwise it is roughly definable
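A minimal Python sketch of these approximations, using a toy universe of attribute-valued objects (the data is made up, not from the slides):

```python
def granules(universe, B):
    """Partition the universe into indiscernibility granules [x]_B:
    objects with identical values on the attribute subset B share a granule."""
    groups = {}
    for obj, attrs in universe.items():
        key = tuple(attrs[a] for a in B)
        groups.setdefault(key, set()).add(obj)
    return list(groups.values())

def lower_upper(universe, B, X):
    """B-lower: union of granules fully inside X;
    B-upper: union of granules that intersect X."""
    lower, upper = set(), set()
    for g in granules(universe, B):
        if g <= X:
            lower |= g
        if g & X:
            upper |= g
    return lower, upper

# Toy universe: four objects described by two binary attributes.
U = {"x1": {"F1": 1, "F2": 0}, "x2": {"F1": 1, "F2": 0},
     "x3": {"F1": 0, "F2": 1}, "x4": {"F1": 0, "F2": 0}}
X = {"x1", "x3"}                       # concept to approximate
lo, up = lower_upper(U, ["F1", "F2"], X)
```

Here `lo != up`, so X is roughly definable: x1 is indiscernible from x2, which lies outside X.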
65 Rough Sets provide
- Uncertainty Handling (using lower & upper approximations)
- Granular Computing (using information granules)
66 Granular Computing: Computation is performed using information granules and not the data points (objects)
⇒ Information compression, Computational gain
67 Information Granules and Rough Set Theoretic Rules
[Figure: feature space F1 × F2, each axis granulated into low, medium and high regions; a rule covers one granule.]
- A rule provides a crude description of the class using granules
68 Rough Set Rule Generation
Decision Table:
Object  F1 F2 F3 F4 F5  Decision
x1      1  0  1  0  1   Class 1
x2      0  0  0  0  1   Class 1
x3      1  1  1  1  1   Class 1
x4      0  1  0  1  0   Class 2
x5      1  1  1  0  0   Class 2
Discernibility Matrix (c) for Class 1:
Objects  x1  x2        x3
x1       φ   {F1, F3}  {F2, F4}
x2           φ         {F1, F2, F3, F4}
x3                     φ
69 Discernibility function
Discernibility function considering the object x1 belonging to Class 1 = (Discernibility of x1 w.r.t. x2) ∧ (Discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4)
Similarly, discernibility functions are obtained considering the other objects.
⇒ Dependency Rules (AND-OR form)
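The discernibility computation on the decision table above can be sketched in Python. The table values are taken from the slide; the code itself is only an illustration of the construction.

```python
from itertools import combinations

# Decision table from the slide: five objects, features F1..F5, two classes.
table = {
    "x1": ([1, 0, 1, 0, 1], 1),
    "x2": ([0, 0, 0, 0, 1], 1),
    "x3": ([1, 1, 1, 1, 1], 1),
    "x4": ([0, 1, 0, 1, 0], 2),
    "x5": ([1, 1, 1, 0, 0], 2),
}

def discernibility_matrix(table, cls):
    """Entry (xi, xj): the features on which two same-class objects differ."""
    objs = [o for o, (_, c) in table.items() if c == cls]
    mat = {}
    for a, b in combinations(objs, 2):
        fa, fb = table[a][0], table[b][0]
        mat[(a, b)] = {f"F{k+1}" for k in range(len(fa)) if fa[k] != fb[k]}
    return mat

mat = discernibility_matrix(table, cls=1)
# Discernibility function for x1: AND over its matrix entries,
# OR within each entry -- here (F1 v F3) ^ (F2 v F4).
f_x1 = [sorted(mat[p]) for p in mat if "x1" in p]
```

The conjunction-of-disjunctions `f_x1` is what gets simplified into the AND-OR dependency rules.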
70 Rules
No. of Classes = 2, No. of Features = 2
[Figure: each rule gives a crude network (layers L1 … H2); the crude networks seed Populations 1, 2 and 3, which are partially trained by GA1, GA2 and GA3 (Phase I).]
71 [Figure: the partially trained subnetworks are concatenated (links between modules having small random values) to form the final population; GA (Phase II) with restricted mutation probability — low for intra-module links, high for inter-module links — yields the final trained network.]
72 Knowledge Flow in Modular Rough Fuzzy MLP
IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
[Figure: feature space (F1 × F2) → rough set rules C1(R1), C2(R2), C2(R3) → network mapping into subnetworks R1 (Subnet 1), R2 (Subnet 2), R3 (Subnet 3) → partial training with ordinary GA → partially refined subnetworks SN1, SN2, SN3.]
73 Concatenation of Subnetworks
[Figure: the population of concatenated networks is evolved with a GA having a variable mutation operator (low mutation probability for intra-module links, high for inter-module links), giving the final solution network for classes C1 and C2 in the feature space.]
75 Speech Data: 3 Features, 6 Classes — Classification Accuracy
76 Network Size (No. of Links)
77 Training Time (hrs), DEC Alpha Workstation @400MHz
78 Legend: 1. MLP; 2. Fuzzy MLP; 3. Modular Fuzzy MLP; 4. Rough Fuzzy MLP; 5. Modular Rough Fuzzy MLP
79 Network Structure — IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Modular Rough Fuzzy MLP: structured (no. of links: few)
Fuzzy MLP: unstructured (no. of links: more)
[Figure: histogram of weight values]
80 [Figure: connectivity of the network obtained using the Modular Rough Fuzzy MLP]
81 [Figure: sample rules extracted for the Modular Rough Fuzzy MLP]
82 Rule Evaluation
- Accuracy
- Fidelity (number of times the network and rule-base outputs agree)
- Confusion (should be restricted within a minimum number of classes)
- Coverage (a rule base with a smaller uncovered region, i.e., test set for which no rules are fired, is better)
- Rule base size (the smaller the number of rules, the more compact the rule base)
- Certainty (confidence of rules)
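Three of these criteria can be made concrete in a short sketch. The exact definitions used in the paper are not in the transcript, so the formulas below are assumed, straightforward readings of the bullet descriptions.

```python
def evaluate_rules(y_true, y_net, y_rules, covered):
    """Assumed definitions: accuracy of the rule base on covered patterns,
    fidelity = how often the rule base agrees with the network, and
    coverage = fraction of test patterns for which some rule fires."""
    n = len(y_true)
    fired = [i for i in range(n) if covered[i]]
    accuracy = sum(y_rules[i] == y_true[i] for i in fired) / max(len(fired), 1)
    fidelity = sum(y_rules[i] == y_net[i] for i in fired) / max(len(fired), 1)
    coverage = len(fired) / n
    return accuracy, fidelity, coverage

# Toy test set of 4 patterns; the last one is not covered by any rule.
acc, fid, cov = evaluate_rules(
    y_true=[1, 1, 2, 2], y_net=[1, 2, 2, 2],
    y_rules=[1, 2, 2, 1], covered=[True, True, True, False])
```

In the toy run the rule base always matches the network (fidelity 1.0) even though it misclassifies one covered pattern, which is exactly why fidelity and accuracy are reported separately.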
83 Existing Rule Extraction Algorithms
- Subset: Searches over all possible combinations of input weights to a node of the trained network. Rules are generated from those subsets of links for which the sum of the weights exceeds the bias of that node.
- MofN: Instead of AND-OR rules, the method extracts rules of the form IF M out of N inputs are high THEN Class I.
- X2R: Unlike the previous two methods, which consider the links of a network, X2R generates rules from the input-output mapping implemented by the network.
- C4.5: Rule generation algorithm based on decision trees.
84 IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003
Comparison of rules obtained for the speech data
85 [Tables: number of rules, confusion, CPU time]
86 Case Based Reasoning (CBR)
- Cases: some typical situations already experienced by the system.
- A case is a conceptualized piece of knowledge representing an experience that teaches a lesson for achieving the goals of the system.
- CBR involves
- adapting old solutions to meet new demands
- using old cases to explain new situations or to justify new solutions
- reasoning from precedents to interpret new situations.
87 - ⇒ A CBR system learns and becomes more efficient as a byproduct of its reasoning activity.
- Examples: medical diagnosis and law interpretation, where the knowledge available is incomplete and/or evidence is sparse.
88 Unlike a traditional knowledge-based system, a case-based system operates through a process of
- remembering one or a small set of concrete instances or cases, and
- basing decisions on comparisons between the new situation and the old ones.
89 - Case Selection ⇒ cases belong to the set of examples encountered.
- Case Generation ⇒ constructed cases need not be any of the examples.
90 Rough Sets provide
- Uncertainty Handling (using lower & upper approximations)
- Granular Computing (using information granules)
91 IEEE Trans. Knowledge Data Engg., to appear
Granular Computing and Case Generation
- Information Granules: A group of similar objects clubbed together by an indiscernibility relation.
- Granular Computing: Computation is performed using information granules and not the data points (objects)
⇒ Information compression, Computational gain
92 - Cases: Informative patterns (prototypes) characterizing the problems.
- In the rough set theoretic framework: Cases ≡ Information Granules
- In the rough-fuzzy framework: Cases ≡ Fuzzy Information Granules
93 Characteristics and Merits
- Cases are cluster granules, not sample points
- Involve only a reduced number of relevant features, with variable size
- Less storage requirement
- Fast retrieval
- Suitable for mining data with large dimension and size
94 How to Achieve This?
- Fuzzy sets help in the linguistic representation of patterns, providing a fuzzy granulation of the feature space
- Rough sets help in generating dependency rules to model informative/representative regions in the granulated feature space
- Fuzzy membership functions corresponding to the representative regions are stored as Cases
95 Fuzzy (F-) Granulation
[Figure: π-type membership functions μ_low, μ_medium, μ_high along feature j, with centers c_L, c_M, c_H, radii λ_L, λ_M, and crossover membership value 0.5.]
96 c_low(F_j) = m_jl;  c_medium(F_j) = m_j;  c_high(F_j) = m_jh
λ_low(F_j) = c_medium(F_j) − c_low(F_j)
λ_high(F_j) = c_high(F_j) − c_medium(F_j)
λ_medium(F_j) = 0.5 (c_high(F_j) − c_low(F_j))
m_j: mean of the pattern points along the jth axis
m_jl: mean of the points in the range [F_j min, m_j)
m_jh: mean of the points in the range (m_j, F_j max]
F_j max, F_j min: maximum and minimum values of feature F_j
97 - An n-dimensional pattern F_i = [F_i1, F_i2, …, F_in] is represented as a 3n-dimensional fuzzy linguistic pattern [Pal & Mitra 1992]:
F_i → [μ_low(F_i1)(F_i), …, μ_high(F_in)(F_i)]
- Set the μ value to 1 or 0 if it is higher or lower than 0.5
⇒ binary 3n-dimensional patterns are obtained
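The 3n-dimensional representation and the 0.5-thresholding can be sketched as below. The π-function follows the standard form (value 1 at the center, 0.5 at half the radius, 0 beyond the radius); the centers and radii in the example are made-up values, not the data-derived ones from the previous slide.

```python
def pi_membership(x, c, lam):
    """pi-type membership: 1 at center c, 0.5 at |x - c| = lam/2, 0 beyond lam."""
    d = abs(x - c) / lam
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (1.0 - d) ** 2
    return 0.0

def linguistic_pattern(F, centers, radii):
    """Map an n-dimensional pattern to its 3n-dimensional fuzzy representation
    [mu_low(F1), mu_medium(F1), mu_high(F1), ..., mu_high(Fn)]."""
    out = []
    for j, x in enumerate(F):
        for term in ("low", "medium", "high"):
            out.append(pi_membership(x, centers[j][term], radii[j][term]))
    return out

# One feature with illustrative centers/radii; pattern value 0.3.
mu = linguistic_pattern(
    [0.3], centers=[{"low": 0.2, "medium": 0.5, "high": 0.8}],
    radii=[{"low": 0.3, "medium": 0.3, "high": 0.3}])
binary = [1 if m > 0.5 else 0 for m in mu]   # 0.5-thresholding (slide 97)
```

The binarized vector is what enters the decision table for rough set rule generation on the next slide.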
98 - Compute the frequency n_ki of occurrence of the binary patterns; select those patterns having frequency above a threshold Tr (for noise removal).
- Generate a decision table consisting of the binary patterns.
- Extract dependency rules corresponding to informative regions (blocks), e.g., class ← L1 ∧ M2
99 Rough Set Rule Generation
Decision Table:
Object  F1 F2 F3 F4 F5  Decision
x1      1  0  1  0  1   Class 1
x2      0  0  0  0  1   Class 1
x3      1  1  1  1  1   Class 1
x4      0  1  0  1  0   Class 2
x5      1  1  1  0  0   Class 2
100 Discernibility Matrix (c) for Class 1:
Objects  x1  x2        x3
x1       φ   {F1, F3}  {F2, F4}
x2           φ         {F1, F2, F3, F4}
x3                     φ
101 Discernibility function considering the object x1 belonging to Class 1 = (Discernibility of x1 w.r.t. x2) ∧ (Discernibility of x1 w.r.t. x3) = (F1 ∨ F3) ∧ (F2 ∨ F4)
Similarly, discernibility functions are obtained considering the other objects.
⇒ Dependency Rules (AND-OR form)
102 Mapping Dependency Rules to Cases
- Each conjunction, e.g., L1 ∧ M2, represents a region (block)
- For each conjunction, store as a case the parameters of the fuzzy membership functions corresponding to the linguistic variables that occur in the conjunction
- (thus, multiple cases may be generated from a rule)
103 - Note: All features may not occur in a rule.
- Cases may be represented by different, reduced numbers of features.
- Structure of a Case: parameters of the membership functions (center, radius), class information
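The case structure just described can be sketched as a small data type. The field names are illustrative; only the content (per-feature fuzzy-set parameters plus the class label) comes from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """A case stores only the fuzzy-set parameters for the features that
    actually occur in its dependency-rule conjunction (a reduced, variable
    feature subset), plus the class label."""
    klass: int
    # feature number -> (linguistic term, center c, radius lam)
    params: dict = field(default_factory=dict)

# Case 1 from slide 105: F1 low (c=0.1, lam=0.5), F2 high (c=0.9, lam=0.5).
case1 = Case(klass=1, params={1: ("L", 0.1, 0.5), 2: ("H", 0.9, 0.5)})
```

Because `params` holds only the features present in the rule, a case for a 649-feature dataset may occupy just a handful of entries, which is the storage saving claimed on slide 93.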
104 Example — IEEE Trans. Knowledge Data Engg., to appear
[Figure: feature space F1 × F2 with two clusters of points; CASE 1 is centered near (0.1, 0.9) and CASE 2 near (0.7, 0.2); the parameters of the fuzzy linguistic sets low, medium and high define the cases.]
105 Dependency Rules and Cases Obtained
Case 1: Feature No. 1, fuzzset (L): c = 0.1, λ = 0.5; Feature No. 2, fuzzset (H): c = 0.9, λ = 0.5; Class 1
Case 2: Feature No. 1, fuzzset (H): c = 0.7, λ = 0.4; Feature No. 2, fuzzset (L): c = 0.2, λ = 0.5; Class 2
106 Case Retrieval
- The similarity sim(x, c) between a pattern x and a case c is defined over the n features present in case c [formula not transcribed]
107 - μ: the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j.
- For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern.
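A sketch of this retrieval step follows. Since the exact sim(x, c) expression did not survive the transcript, the similarity below is an assumed form: the average membership of the pattern in the case's fuzzy linguistic sets, over the features present in the case.

```python
def pi_membership(x, c, lam):
    """pi-type membership: 1 at center c, 0.5 at |x - c| = lam/2, 0 beyond lam."""
    d = abs(x - c) / lam
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (1.0 - d) ** 2
    return 0.0

def similarity(x, case):
    """Assumed sim(x, c): mean membership over the n features in the case."""
    n = len(case["params"])
    return sum(pi_membership(x[j], c, lam)
               for j, (c, lam) in case["params"].items()) / n

def classify(x, case_base):
    """Retrieve the most similar case and assign its class to the pattern."""
    best = max(case_base, key=lambda cb: similarity(x, cb))
    return best["klass"]

# Case base mirroring slide 105 (feature index -> (center, radius)).
cases = [{"klass": 1, "params": {0: (0.1, 0.5), 1: (0.9, 0.5)}},
         {"klass": 2, "params": {0: (0.7, 0.4), 1: (0.2, 0.5)}}]
label = classify([0.15, 0.85], cases)
```

A pattern near (0.1, 0.9) has high membership in Case 1's fuzzy sets and near-zero membership in Case 2's, so it retrieves Case 1.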
108 Experimental Results and Comparisons
- Forest Covertype: Contains 10 dimensions, 7 classes and 586,012 samples. It is a Geographical Information System dataset representing forest cover types (pine/fir, etc.) of the USA. The variables are cartographic and remote sensing measurements. All variables are numeric.
109 - Multiple Features: This dataset consists of features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. There are in total 2000 patterns, 649 features (all numeric) and 10 classes.
- Iris: The dataset contains 150 instances, 4 features and 3 classes of Iris flowers. The features are numeric.
110 Some Existing Case Selection Methods
- k-NN based:
- Condensed nearest neighbor (CNN)
- Instance based learning (e.g., IB3)
- Instance based learning with feature weighting (e.g., IB4)
- Fuzzy logic based
- Neuro-fuzzy based
111 Algorithms Compared
- Instance based learning algorithm IB3 [Aha 1991]
- Instance based learning algorithm IB4 [Aha 1992] (reduced features). The feature weighting is learned by random hill climbing in IB4; a specified number of features having high weights is selected.
- Random case selection.
112 Evaluation in terms of
- 1-NN classification accuracy using the cases. Training set: 10% for case generation; Test set: 90%
- Number of cases stored in the case base
- Average number of features required to store a case (n_avg)
- CPU time required for case generation (t_gen)
- Average CPU time required to retrieve a case (t_ret) (on a Sun UltraSparc Workstation @350 MHz)
113 Iris Flowers: 4 features, 3 classes, 150 samples. Number of cases: 3 (for all methods)
114 Forest Cover Types: 10 features, 7 classes, 586,012 samples. Number of cases: 545 (for all methods)
115 Hand-written Numerals: 649 features, 10 classes, 2000 samples. Number of cases: 50 (for all methods)
116 For the same number of cases:
Accuracy: The proposed method is much superior to random selection and IB4, and close to IB3.
Average Number of Features Stored: The proposed method stores far fewer than the original data dimension.
Case Generation Time: The proposed method requires much less compared to IB3 and IB4.
Case Retrieval Time: Several orders less for the proposed method compared to IB3 and random selection; also less than IB4.
117 Conclusions
- The relation between Soft Computing, Machine Intelligence and Pattern Recognition is explained.
- The emergence of Data Mining and Knowledge Discovery from the PR point of view is explained.
- The significance of hybridization in the Soft Computing paradigm is illustrated.
118 - The modular concept enhances performance, accelerates training and makes the network structured with fewer links.
- The rules generated are superior to other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.
119 - Rough sets are used for generating information granules.
- Fuzzy sets provide efficient granulation of the feature space (F-granulation).
- Reduced and variable feature subset representation of cases is a unique feature of the scheme.
- The rough-fuzzy case generation method is suitable for CBR systems involving datasets large both in dimension and size.
120 - Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear)
- Application to multi-spectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002)
- Significance in the Computational Theory of Perception (CTP)
121Thank You!!