Title: Classification and Prediction
1Classification and Prediction
2Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (for example, as read off a fuzzy membership graph)
- Attribute values are converted to fuzzy values
- e.g., income is mapped into the discrete categories low, medium, high, with a fuzzy value calculated for each
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed
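The voting scheme above can be sketched in a few lines of Python. This is only an illustration: the income breakpoints, the class labels "approve" and "deny", and the three rules are hypothetical and do not come from the slides.

```python
def falling(x, start, end):
    """Left-shoulder membership: 1 below start, 0 above end, linear between."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x, start, end):
    """Right-shoulder membership: 0 below start, 1 above end, linear between."""
    return 1.0 - falling(x, start, end)


def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy categories for income (breakpoints are made up).
INCOME_SETS = {
    "low": lambda x: falling(x, 20_000, 40_000),
    "medium": lambda x: triangular(x, 20_000, 50_000, 80_000),
    "high": lambda x: rising(x, 60_000, 90_000),
}

# Hypothetical rules of the form "if income is <category> then class is <label>".
RULES = [("low", "deny"), ("medium", "approve"), ("high", "approve")]


def fuzzify(income):
    """Convert a crisp income into one truth value per fuzzy category."""
    return {name: mf(income) for name, mf in INCOME_SETS.items()}


def classify(income):
    """Sum the truth values voted for each class and return the winner."""
    memberships = fuzzify(income)
    votes = {}
    for category, label in RULES:
        truth = memberships[category]
        if truth > 0.0:  # only applicable rules vote
            votes[label] = votes.get(label, 0.0) + truth
    return max(votes, key=votes.get), votes
```

For an income of 30,000 both the "low" and "medium" categories apply, so two rules vote; the class with the larger summed truth value wins.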
3Fuzzy Sets
- Sets with fuzzy boundaries
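[Figure: the set of tall people drawn as fuzzy set A; its membership function is plotted against height, with membership values .5, .9, and 1.0 and the heights 5'10" and 6'2" marked]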
4Membership Functions (MFs)
- Characteristics of MFs
- Subjective measures
- Not probability functions
[Figure: membership functions (MFs) for "tall in Asia" and "tall in the US" plotted against height; membership values .8, .5, and .1 are marked at 5'10"]
5Fuzzy Sets
- Formal definition
- A fuzzy set A in X is expressed as a set of ordered pairs: $A = \{(x, \mu_A(x)) \mid x \in X\}$
- A is the fuzzy set, $\mu_A$ is the membership function (MF), and X is the universe, or universe of discourse
- A fuzzy set is totally characterized by a membership function (MF).
6Fuzzy Sets with Discrete Universes
- Fuzzy set A = "sensible number of children"
- X = {0, 1, 2, 3, 4, 5, 6} (discrete universe)
- A = {(0, .1), (1, .3), (2, .7), (3, 1), (4, .6), (5, .2), (6, .1)}
7Fuzzy Sets with Cont. Universes
- Fuzzy set B = "about 50 years old"
- X = set of positive real numbers (continuous)
- $B = \{(x, \mu_B(x)) \mid x \in X\}$
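As a rough sketch, a fuzzy set over a discrete universe can be stored as a mapping from elements to membership grades, while a fuzzy set over a continuous universe is more naturally a function. The grades for "sensible number of children" are the ones listed on the previous slide; the bell-shaped formula for "about 50 years old" and its `center` and `width` parameters are assumptions made for illustration, since the transcript does not preserve the slide's membership function.

```python
# Discrete universe: one membership grade per element of X.
sensible_children = {0: 0.1, 1: 0.3, 2: 0.7, 3: 1.0, 4: 0.6, 5: 0.2, 6: 0.1}


def membership(fuzzy_set, x):
    """Grade of x in a discrete fuzzy set; elements outside X get grade 0."""
    return fuzzy_set.get(x, 0.0)


def about_50(age, center=50.0, width=10.0):
    """Assumed bell-shaped MF: grade 1 at `center`, 0.5 at center +/- width."""
    return 1.0 / (1.0 + ((age - center) / width) ** 4)
```

With these assumed parameters, `about_50(50)` is 1.0 and the grade falls to 0.5 at ages 40 and 60.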
8Fuzzy Partition
- Fuzzy partitions formed by the linguistic values
young, middle aged, and old
lingmf.m
9Set-Theoretic Operations
- Subset
- Complement
- Union
- Intersection
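The transcript does not preserve the formulas for these four operations. A minimal sketch, assuming the classical definitions (A is a subset of B when its grade never exceeds B's, the complement is 1 minus the grade, union is the pointwise maximum, intersection is the pointwise minimum), could look like this for discrete fuzzy sets that share one universe. The function names are illustrative.

```python
def is_subset(a, b):
    """A is contained in B if mu_A(x) <= mu_B(x) for every x."""
    return all(a[x] <= b[x] for x in a)


def complement(a):
    """mu_notA(x) = 1 - mu_A(x)."""
    return {x: 1.0 - grade for x, grade in a.items()}


def union(a, b):
    """mu_(A or B)(x) = max(mu_A(x), mu_B(x))."""
    return {x: max(a[x], b[x]) for x in a}


def intersection(a, b):
    """mu_(A and B)(x) = min(mu_A(x), mu_B(x))."""
    return {x: min(a[x], b[x]) for x in a}


# Two made-up fuzzy sets over the universe {1, 2, 3}.
A = {1: 0.2, 2: 0.7, 3: 1.0}
B = {1: 0.5, 2: 0.7, 3: 1.0}
```

Here A is a subset of B, so their union equals B and their intersection equals A. Other union and intersection operators exist; max and min are only the most common choice.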
10Set-Theoretic Operations
subset.m
fuzsetop.m
11MF Formulation
disp_mf.m
12Fuzzy If-Then Rules
- General format
- If x is A then y is B
- Examples
- If pressure is high, then volume is small.
- If the road is slippery, then driving is dangerous.
- If a tomato is red, then it is ripe.
- If the speed is high, then apply the brake a little.
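One common way to evaluate a single rule of the form "If x is A then y is B" is to clip the consequent set B at the degree to which x belongs to A (min implication, as in Mamdani-style inference). The slides do not say which inference method they use, so the sketch below is only an assumption. The membership functions for "speed is high" (in km/h) and "brake a little" (as a fraction of full braking) are made up.

```python
def rising(x, start, end):
    """Right-shoulder membership: 0 below start, 1 above end, linear between."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def high_speed(speed_kmh):
    """Hypothetical MF for 'speed is high'."""
    return rising(speed_kmh, 60.0, 120.0)


def little_brake(brake_fraction):
    """Hypothetical MF for 'apply the brake a little'."""
    return triangular(brake_fraction, 0.0, 0.2, 0.4)


def fire_rule(speed_kmh):
    """Return the consequent MF clipped at the rule's firing strength."""
    strength = high_speed(speed_kmh)
    return lambda brake_fraction: min(strength, little_brake(brake_fraction))
```

At 90 km/h the rule fires with strength 0.5, so the resulting output set never exceeds 0.5 for any braking level. A full controller would combine several such clipped sets and defuzzify the result.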
21Classification and Prediction
- Fuzzy
- Support Vector Machine
22Support Vector Machine
- Search for the optimal separating hyperplane, i.e., the one that maximizes the margin
23Support Vector Machine
- Training an SVM amounts to solving a quadratic programming problem
- Test phase: $f(x) = \mathrm{sign}\left(\sum_i a_i y_i K(s_i, x) + b\right)$
- $s_i$: support vectors, $y_i$: class of $s_i$
- K(): kernel function, $a_i$, b: parameters
24Support Vector Machine
- Kernel Function
- $K(x, y) = \phi(x) \cdot \phi(y)$
- x, y are vectors in input space
- $\phi(x)$, $\phi(y)$ are vectors in feature space
- d (feature space) >> d (input space)
- No need to compute $\phi(x)$ explicitly
- $Tr(x, y) = \mathrm{sub}(x) \cdot \mathrm{sub}(y)$, where sub(x) is a vector representing all the sub-trees of x.
- www.csie.ntu.edu.tw/cjlin
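A small worked example shows why the feature map never has to be computed. For 2-dimensional inputs, the degree-2 polynomial kernel $K(x, y) = (x \cdot y)^2$ equals the dot product of the explicit 3-dimensional features $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$. The kernel and the function names are chosen for illustration and are not taken from the slides.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed entirely in input space."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit feature map for the kernel above (2-D input, 3-D features)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
y = (3.0, 4.0)
kernel_value = poly_kernel(x, y)        # input-space computation
feature_value = dot(phi(x), phi(y))     # same quantity via feature space
```

Both values come out as 121 (up to rounding), but the kernel needs only one 2-dimensional dot product. For higher degrees or inputs of larger dimension the feature space grows quickly while the kernel cost stays the same.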
25Classification and Prediction
- Fuzzy
- Support Vector Machine
- Prediction
26What Is Prediction?
- Prediction is similar to classification
- First, construct a model
- Second, use the model to predict unknown values
- The major method for prediction is regression
- Linear and multiple regression
- Non-linear regression
- Prediction is different from classification
- Classification refers to predicting a categorical class label
- Prediction models continuous-valued functions
27Regression Analysis and Log-Linear Models in Prediction
- Linear regression: $Y = \alpha + \beta X$
- Two parameters, $\alpha$ and $\beta$, specify the line and are to be estimated from the data at hand.
- They are estimated by applying the least squares criterion to the known values of $Y_1, Y_2, \ldots$ and $X_1, X_2, \ldots$
- Multiple regression: $Y = b_0 + b_1 X_1 + b_2 X_2$
- Many nonlinear functions can be transformed into the above.
- Log-linear models
- The multi-way table of joint probabilities is approximated by a product of lower-order tables.
- Probability: $p(a, b, c, d) = \alpha_{ab} \beta_{ac} \chi_{ad} \delta_{bcd}$
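For the simple linear case $Y = \alpha + \beta X$, the least squares estimates have a closed form: $\beta$ is the covariance of X and Y divided by the variance of X, and $\alpha$ follows from the means. A minimal sketch (the function name and the sample data are illustrative):

```python
def fit_line(xs, ys):
    """Least squares estimates (alpha, beta) for Y = alpha + beta * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations about the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up data lying exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_line(xs, ys)
```

The sketch assumes at least two distinct X values, since otherwise the denominator is zero. Multiple regression generalizes this to several predictors and is usually solved with a linear algebra routine.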
28Locally Weighted Regression
- Construct an explicit approximation to f over a local region surrounding query instance xq.
- Locally weighted linear regression
- The target function f is approximated near xq using a linear function
- minimize the squared error, weighted by a distance-decreasing weight K
- the gradient descent training rule
- In most cases, the target function is approximated by a constant, linear, or quadratic function.
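A minimal sketch of locally weighted linear regression for one input variable follows. It differs from the slide in one respect: instead of the gradient descent training rule, it solves the weighted least squares problem in closed form, which is feasible for a single variable. The Gaussian weighting and the `bandwidth` parameter are assumptions.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Distance-decreasing weight K: close points count more."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def lwr_predict(xq, xs, ys, bandwidth=1.0):
    """Fit a weighted line around the query xq and evaluate it at xq."""
    weights = [gaussian_weight(x - xq, bandwidth) for x in xs]
    total = sum(weights)
    # Weighted means of x and y.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted covariance and variance give the local slope.
    s_xy = sum(w * (x - mean_x) * (y - mean_y)
               for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept + slope * xq


# Made-up training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
prediction = lwr_predict(2.5, xs, ys)
```

A new local model is fitted for every query point, so nothing is learned in advance. A smaller bandwidth makes the fit more local and more sensitive to noise.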
29Classification and Prediction
- Fuzzy
- Support Vector Machine
- Prediction
- Classification accuracy
30Classification Accuracy: Estimating Error Rates
- Partition: training-and-testing
- use two independent data sets, e.g., training set (2/3), test set (1/3)
- used for data sets with a large number of samples
- Cross-validation
- divide the data set into k subsamples
- use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation)
- for data sets of moderate size
- Bootstrapping or leave-one-out
- for small data sets
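The k-fold procedure can be sketched as an index generator. Each of the k subsamples serves once as the test set while the remaining k-1 form the training set. The function name is illustrative, and the sketch does not shuffle the data, which real experiments normally do first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Deal the indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(10, 5))
```

Setting k equal to the number of samples gives leave-one-out, where every test set holds exactly one sample. The error rate is then estimated by averaging the test error over all folds.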
31Boosting and Bagging
- Boosting increases classification accuracy
- Applicable to decision trees or Bayesian classifiers
- Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor
- Boosting requires only linear time and constant space
32Boosting Technique (II) Algorithm
- Assign every example an equal weight 1/N
- For t = 1, 2, ..., T do
- Obtain a hypothesis (classifier) h(t) under w(t)
- Calculate the error of h(t) and re-weight the examples based on the error
- Normalize w(t+1) to sum to 1
- Output a weighted sum of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set
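The loop above can be sketched with AdaBoost-style updates and one-dimensional decision stumps as the base classifier. The slide does not name a specific variant, so the hypothesis weight `0.5 * log((1 - err) / err)`, the exponential re-weighting, and the stump learner are assumptions. Labels are +1 and -1.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` at or above the threshold."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def boost(xs, ys, rounds):
    """Train `rounds` stumps, re-weighting the examples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n                      # equal initial weights 1/N
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]   # normalize to sum to 1
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted sum of all hypotheses."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Made-up data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = boost(xs, ys, rounds=3)
```

On this toy data the best single stump gets only four of the six examples right, while the weighted vote of three stumps classifies all six correctly.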
33Is Accuracy Enough to Judge?
- Sensitivity = t_pos/pos
- Specificity = t_neg/neg
- Precision = t_pos/(t_pos + f_pos)
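These three measures follow directly from the confusion counts, where pos = t_pos + f_neg is the number of actual positives and neg = t_neg + f_pos the number of actual negatives. A short sketch with made-up labels (1 for positive, 0 for negative):

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (t_pos, f_pos, t_neg, f_neg) for binary labels."""
    t_pos = f_pos = t_neg = f_neg = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                t_pos += 1
            else:
                f_pos += 1
        else:
            if a == positive:
                f_neg += 1
            else:
                t_neg += 1
    return t_pos, f_pos, t_neg, f_neg


def sensitivity(t_pos, f_neg):
    """Fraction of actual positives that were found: t_pos / pos."""
    return t_pos / (t_pos + f_neg)


def specificity(t_neg, f_pos):
    """Fraction of actual negatives that were rejected: t_neg / neg."""
    return t_neg / (t_neg + f_pos)


def precision(t_pos, f_pos):
    """Fraction of predicted positives that are correct."""
    return t_pos / (t_pos + f_pos)


# Made-up example: 3 actual positives, 5 actual negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
t_pos, f_pos, t_neg, f_neg = confusion_counts(actual, predicted)
```

In this example accuracy is 6/8, yet sensitivity is only 2/3, which illustrates why accuracy alone can hide poor performance on the positive class.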
34Classification and Prediction
- Decision tree
- Bayesian Classification
- ANN
- KNN
- GA
- Fuzzy
- SVM
- Prediction
- Some issues
35Summary
- Classification is an extensively studied problem (mainly in statistics, machine learning, and neural networks)
- Classification is probably one of the most widely used data mining techniques, with a lot of extensions
- Scalability is still an important issue for database applications; thus combining classification with database techniques should be a promising topic
- Research directions: classification of non-relational data, e.g., text, spatial, multimedia, etc.