Chapter 5 Statistical Methods

Outline
- 5.1 STATISTICAL INFERENCE
- 5.2 ASSESSING DIFFERENCES IN DATA SETS
- 5.3 BAYESIAN INFERENCE
- 5.4 PREDICTIVE REGRESSION
- 5.5 ANALYSIS OF VARIANCE
- 5.6 LOGISTIC REGRESSION
- 5.7 LOG-LINEAR MODELS
- 5.8 LINEAR DISCRIMINANT ANALYSIS
5.1 STATISTICAL INFERENCE
- Descriptive statistics vs. statistical inference
- Population, sample, data set
- Parameter vs. statistic
- Inference methods: estimation and tests of hypotheses
5.1 STATISTICAL INFERENCE (cont.)
- Estimation: the goal is to gain information from a data set T in order to estimate one or more parameters w belonging to the model of the real-world system f(X, w)
5.1 STATISTICAL INFERENCE (cont.)
- Statistical testing: deciding whether a hypothesis concerning the value of the population characteristic should be accepted or rejected
- Null hypothesis vs. alternative hypothesis
5.2 ASSESSING DIFFERENCES IN DATA SETS
5.2 ASSESSING DIFFERENCES IN DATA SETS (cont.)
- Data dispersion
5.2 ASSESSING DIFFERENCES IN DATA SETS (cont.)
- Boxplot
- A visualization of descriptive statistical measures of central tendency and dispersion, popular in many statistical software tools
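What a boxplot draws can be computed directly as quartiles, interquartile range, and outlier fences. A minimal sketch, using made-up sample values (none of the numbers come from the text):

```python
import numpy as np

# Hypothetical sample; a boxplot summarizes exactly these quantities.
data = np.array([3, 5, 7, 8, 9, 11, 13, 14, 15, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                      # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr      # whisker limits; points beyond them
upper_fence = q3 + 1.5 * iqr      # are drawn as individual outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]
```

Here the extreme value 40 falls above the upper fence and would be plotted as an outlier point.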
5.3 BAYESIAN INFERENCE
- Naïve Bayesian Classification Process (Simple Bayesian Classifier)
- P(H|X): the posterior probability of hypothesis H given data sample X
- P(H): the prior probability of hypothesis H
5.3 BAYESIAN INFERENCE (cont.)
- Given an additional data sample X (its class is unknown), it is possible to predict the class for X using the highest conditional probability P(Ci|X)
- P(X) is constant for all classes, so only the product P(X|Ci) · P(Ci) needs to be maximized
- P(Ci) = |Ci| / m, where |Ci| is the number of training samples in class Ci and m is the total number of training samples
5.3 BAYESIAN INFERENCE - example
Table 5.1 Training data set for a classification
using Naïve Bayesian Classifier
5.3 BAYESIAN INFERENCE example (cont.)
- Goal: predict the classification of the new sample X = {1, 2, 2, class = ?}
- Maximize the product P(X|Ci) · P(Ci) for i = 1, 2
- Step 1: compute the prior probabilities P(Ci)
5.3 BAYESIAN INFERENCE example (cont.)
- Step 2: compute the conditional probabilities P(xt|Ci) for every attribute value given in the new sample X = {1, 2, 2, C = ?}
5.3 BAYESIAN INFERENCE example (cont.)
- Step 3: under the assumption of conditional independence of attributes, compute the conditional probabilities P(X|Ci)
5.3 BAYESIAN INFERENCE example (cont.)
- Finally, multiply these conditional probabilities with the corresponding prior probabilities
- This yields values proportional to P(Ci|X); find the maximum
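The three steps above can be sketched in a few lines of Python. The training tuples below are hypothetical stand-ins (the actual values of Table 5.1 are not reproduced here), so only the procedure, not the numbers, follows the slides:

```python
from collections import Counter

# Hypothetical training set: each sample is (attribute tuple, class label).
train = [
    ((1, 2, 1), 1), ((0, 0, 1), 1), ((2, 1, 2), 2),
    ((1, 2, 1), 2), ((0, 1, 2), 1), ((2, 2, 2), 2), ((1, 0, 1), 1),
]

def naive_bayes_scores(train, x):
    """Return P(X|Ci) * P(Ci) per class, assuming attribute independence."""
    m = len(train)
    class_counts = Counter(c for _, c in train)          # |Ci| per class
    scores = {}
    for c, count in class_counts.items():
        prior = count / m                                 # Step 1: P(Ci) = |Ci| / m
        likelihood = 1.0
        for t, value in enumerate(x):
            # Step 2: P(x_t | Ci) = fraction of class-Ci samples with this value
            matches = sum(1 for s, cls in train if cls == c and s[t] == value)
            likelihood *= matches / count
        scores[c] = prior * likelihood                    # Step 3 + final product
    return scores

scores = naive_bayes_scores(train, (1, 2, 2))
predicted = max(scores, key=scores.get)   # class with maximal P(X|Ci) * P(Ci)
```

With these stand-in tuples, class 2 wins because the attribute values 2 are much more frequent there.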
5.4 PREDICTIVE REGRESSION
- The prediction of continuous values can be modeled by a statistical technique called regression.
- Regression analysis is the process of determining how a variable Y is related to one or more other variables X1, X2, ..., Xn.
- Modeling this type of relationship is often called linear regression.
- The relationship that fits a set of data is characterized by a prediction model called a regression equation. The most widely used form of the regression model is the general linear model, formally written as
- Y = α + β1·X1 + β2·X2 + β3·X3 + ... + βn·Xn
Simple regression
- Simple regression: Y = α + β·X
- The parameters α and β are chosen to minimize the sum of squared errors, SSE = Σ (yi - α - β·xi)²
Multiple regression
- Multiple regression:
- Y = α + β1·X1 + β2·X2 + β3·X3 + ... + βn·Xn
- SSE = (Y - β·X)^T (Y - β·X)
- d(SSE)/dβ = 0
- β = (X^T X)^-1 (X^T Y)
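The normal-equation solution β = (X^T X)^-1 (X^T Y) can be checked numerically. A minimal sketch with illustrative data built from an exact linear relation (none of the numbers come from the text), so the recovered coefficients and an SSE of zero are known in advance:

```python
import numpy as np

# Illustrative data: fit Y = a + b1*X1 + b2*X2 by least squares.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]   # exact relation => SSE should be ~0

# Prepend a column of ones so the intercept is estimated as beta[0].
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations, obtained from d(SSE)/d(beta) = 0:
beta = np.linalg.inv(A.T @ A) @ (A.T @ y)

sse = float(((y - A @ beta) ** 2).sum())
```

The recovered `beta` is (2, 3, -1), matching the coefficients used to generate `y`.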
Correlation coefficient
Correlation coefficient (cont.)
- A correlation coefficient r = 0.85 indicates a good linear relationship between two variables. Additional interpretation is possible: because r² = 0.72, we can say that approximately 72% of the variation in the values of Y is accounted for by a linear relationship with X.
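A quick numerical illustration of r and r², using hypothetical paired observations that lie almost exactly on a line:

```python
import numpy as np

# Hypothetical paired observations (approximately y = 2x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# Pearson correlation coefficient r; r**2 is the share of the variation
# in y accounted for by a linear relationship with x.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2
```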
5.5 ANALYSIS OF VARIANCE
- Often the problem of analyzing the quality of the
estimated regression line and the influence of
the independent variables on the final regression
is handled through an analysis-of-variance
approach.
- The size of the residuals, for all m samples in a data set, is related to the size of the variance s², and it can be estimated by
- s² = Σ (yi - yi')² / (m - (n + 1))
- The numerator is called the residual sum of squares, while the denominator is called the residual degrees of freedom.
- These criteria are the basic decision steps in the ANOVA algorithm, in which we analyze the influence of input variables on the final model.
- First, we start with all inputs and compute s² for this model. Then, we omit inputs from the model one by one and recompute it.
- F = s²new / s²old
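The one-by-one input-removal step can be sketched as follows. The data are synthetic (x2 is deliberately constructed to contribute nothing to y), so dropping x2 leaves the residual sum unchanged and F stays close to 1, which is the signal that x2 can be removed:

```python
import numpy as np

def residual_variance(A, y):
    """s^2 = residual sum of squares / residual degrees of freedom."""
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    return float(resid @ resid) / (len(y) - A.shape[1])

# Synthetic data: y depends on x1; x2 is an irrelevant input.
x1 = np.arange(10.0)
x2 = np.array([0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5, 0.5])
e  = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
y = 1.0 + 2.0 * x1 + e

ones = np.ones(10)
s2_old = residual_variance(np.column_stack([ones, x1, x2]), y)  # full model
s2_new = residual_variance(np.column_stack([ones, x1]), y)      # x2 omitted

F = s2_new / s2_old   # F near 1 => the omitted input x2 was not needed
```

Because x2 explains none of the residual here, only the degrees of freedom change (8 vs. 7), so F comes out as 7/8.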
Multivariate analysis of variance
- Multivariate analysis of variance is a generalization of the previously explained ANOVA analysis.
- Yj = α + β1·X1j + β2·X2j + β3·X3j + ... + βn·Xnj + ej
- j = 1, 2, ..., m
- The corresponding residuals for each dimension will be (Yj - Yj').
- Classical multivariate analysis also includes the
method of principal component analysis. This
method has been explained in Chapter 3 when we
were talking about data reduction and data
transformation as preprocessing phases for data
mining.
5.6 Logistic Regression
- Logistic regression models the probability of some event occurring as a linear function of a set of predictor variables.
- It tries to estimate the probability p that the dependent variable will have a given value.
- The output variable of the model is defined as binary categorical:
- P(y = 1) = p, P(y = 0) = 1 - p
Logistic Regression (cont.)
The logit form of the output is used to prevent the predicted probability pj from going out of the range [0, 1]:
- logit(p) = ln(p / (1 - p)), where p is the success probability and 1 - p the failure probability
Logistic Regression (example)
- logit(p) = 1.5 - 0.6·x1 + 0.4·x2 - 0.3·x3
- Input values: (x1, x2, x3) = (1, 0, 1)
- What is the predicted probability p?
Logistic Regression (answer)
- logit(p) = 1.5 - 0.6·1 + 0.4·0 - 0.3·1 = 0.6
- ln(p / (1 - p)) = 0.6
- p = e^0.6 / (1 + e^0.6) = 0.65
- P(y = 1) = p = 0.65
- P(y = 0) = 1 - p = 0.35
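The answer above can be reproduced by inverting the logit. A minimal sketch using the slide's model and input:

```python
import math

def inverse_logit(z):
    """Convert a logit score z = ln(p / (1 - p)) back to a probability p."""
    return math.exp(z) / (1.0 + math.exp(z))

# The slide's model: logit(p) = 1.5 - 0.6*x1 + 0.4*x2 - 0.3*x3
intercept, coeffs = 1.5, (-0.6, 0.4, -0.3)
x = (1, 0, 1)

z = intercept + sum(c * xi for c, xi in zip(coeffs, x))  # logit score
p = inverse_logit(z)                                     # P(y = 1)
```

This gives z = 0.6 and p ≈ 0.65, matching the slide.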
5.7 Log-Linear Models
- Log-linear modeling is a way of analyzing the relationships between categorical variables.
- The log-linear model approximates a discrete, multidimensional probability distribution.
- All given variables are categorical.
- The data set is defined without output variables.
Log-Linear Models (cont.)
- The aim in log-linear modeling is to identify associations between categorical variables.
- The problem reduces to finding out which of the βs are 0 in the model.
- Correspondence analysis
Correspondence analysis
- Correspondence analysis represents the set of categorical data for analysis within incidence matrices, also called contingency tables.
- The result of an analysis of the contingency table answers the question: is there a relationship between the analyzed attributes or not?
Algorithm
- Transform a given contingency table into a table with expected values.
- Compare these two matrices using the chi-square test as the criterion of association for two categorical variables.
Algorithm (cont.)
- d.f. (degrees of freedom) = (m - 1)(n - 1)
- T(α) = χ²(d.f., α)
- If χ² > T(α), H0 is rejected
- Otherwise, H0 is accepted
Log-Linear Models (example)
- Are there any differences in the extent of
support for abortion between the male and the
female population?
Log-Linear Models (answer)
- This question may be translated into: what is the level of dependency between the two given attributes, sex and support?
- Step 1:
- H0: sex and support are independent; H1: sex and support are dependent
- E11 = 500 · 628 / 1100 = 285.5
Log-Linear Models (answer, cont.)
- Step 2:
- χ² = (309 - 285.5)²/285.5 + (191 - 214.5)²/214.5 + (319 - 342.5)²/342.5 + (281 - 257.5)²/257.5 = 8.2816
Log-Linear Models (answer, cont.)
- d.f. = (2 - 1)(2 - 1) = 1
- T(0.05) = χ²(0.05, 1) = 3.84
- χ² = 8.2816 > 3.84, so H0 is rejected
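The whole chi-square computation fits in a few lines. The row/column labels in the comments are an assumption about which margin is which; the counts are the slide's. Note the slide rounds the expected counts to one decimal before summing, so its value (8.2816) differs slightly from the unrounded result:

```python
# Contingency table from the slides: support for abortion by sex (labels assumed).
observed = [[309, 191],   # e.g. male:   support / do not support
            [319, 281]]   # e.g. female: support / do not support

row_totals = [sum(row) for row in observed]        # 500, 600
col_totals = [sum(col) for col in zip(*observed)]  # 628, 472
total = sum(row_totals)                            # 1100

# Expected counts under independence: E_ij = row_i * col_j / total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# d.f. = (2 - 1)(2 - 1) = 1; critical value chi^2(0.05, 1) = 3.84
reject_h0 = chi2 > 3.84
```

With unrounded expected counts, chi2 ≈ 8.30, and H0 (independence) is rejected either way.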
5.8 Linear Discriminant Analysis
- LDA is concerned with classification problems where the dependent variable is categorical and the independent variables are metric.
- The objective of LDA is to construct a discriminant function that yields different scores when computed with data from different output classes.
Discriminant Function
- z = w1·x1 + w2·x2 + ... + wk·xk, where z is the discriminant score, the xi are independent variables, and the wi are weights
- The discriminant function z is used to predict the class of a new, nonclassified sample.
Cutting score
- The cutting score serves as the criterion against which each individual discriminant score is judged.
- The choice of cutting score depends upon the distribution of samples in the classes.
Cutting score (cont.)
- When the two classes of samples are of equal size and are distributed with uniform variance:
- zcut-ab = (za + zb) / 2
- za: mean discriminant score of class A
- zb: mean discriminant score of class B
- A new sample will be classified to one or the other class depending on whether its score satisfies z > zcut-ab or z < zcut-ab.
Cutting score (cont.)
- A weighted average of the mean discriminant scores is used as the optimal cutting score when the sets of samples for the classes are not of equal size:
- zcut-ab = (na·za + nb·zb) / (na + nb)
- za: mean discriminant score of class A
- zb: mean discriminant score of class B
- na: number of samples in class A
- nb: number of samples in class B
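The weighted cutting score and the resulting decision rule can be sketched directly; the mean scores and class sizes below are hypothetical:

```python
def cutting_score(z_a, z_b, n_a, n_b):
    """Weighted optimal cutting score for unequal class sizes (slide formula)."""
    return (n_a * z_a + n_b * z_b) / (n_a + n_b)

def classify(z, z_cut, z_a):
    """Assign score z to class A if it lies on class A's side of z_cut."""
    return "A" if (z > z_cut) == (z_a > z_cut) else "B"

# Hypothetical mean discriminant scores and class sizes.
z_a, z_b = 2.0, -1.0
n_a, n_b = 30, 10

z_cut = cutting_score(z_a, z_b, n_a, n_b)   # = (30*2.0 + 10*(-1.0)) / 40 = 1.25
```

Note how the larger class A pulls the cutting score toward its own mean (1.25 rather than the unweighted midpoint 0.5).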
Multiple Discriminant Analysis
- Multiple discriminant analysis is used in situations where a separate discriminant function is constructed for each class.
- Decide in favor of the class whose discriminant score is the highest.
Q&A