Chapter 5 Statistical Methods

Outline
- 5.1 STATISTICAL INFERENCE
- 5.2 ASSESSING DIFFERENCES IN DATA SETS
- 5.3 BAYESIAN INFERENCE
- 5.4 PREDICTIVE REGRESSION
- 5.5 ANALYSIS OF VARIANCE
- 5.6 LOGISTIC REGRESSION
- 5.7 LOG-LINEAR MODELS
- 5.8 LINEAR DISCRIMINANT ANALYSIS
5.1 STATISTICAL INFERENCE
- Descriptive statistics vs. statistical inference
- Population, sample, data set
- Parameter vs. statistic
- Inference methods: estimation and tests of hypotheses
5.1 STATISTICAL INFERENCE (cont.)
- Estimation: the goal is to gain information from a data set T in order to estimate one or more parameters w belonging to the model of the real-world system f(X, w)
5.1 STATISTICAL INFERENCE (cont.)
- Statistical testing: deciding whether a hypothesis concerning the value of the population characteristic should be accepted or rejected
- Null hypothesis vs. alternative hypothesis
5.2 ASSESSING DIFFERENCES IN DATA SETS
5.2 ASSESSING DIFFERENCES IN DATA SETS (cont.)
- Data dispersion
5.2 ASSESSING DIFFERENCES IN DATA SETS (cont.)
- Boxplot
- A visualization of descriptive statistical measures of central tendency and dispersion, popular in many statistical software tools
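What a boxplot draws can be computed directly as quartiles, interquartile range, and outlier fences. A minimal sketch, using made-up sample values (none of the numbers come from the text):

```python
import numpy as np

# Hypothetical sample; a boxplot summarizes exactly these quantities.
data = np.array([3, 5, 7, 8, 9, 11, 13, 14, 15, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                      # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr      # whisker limits; points beyond them
upper_fence = q3 + 1.5 * iqr      # are drawn as individual outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]
```

Here the extreme value 40 falls above the upper fence and would be plotted as an outlier point.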
5.3 BAYESIAN INFERENCE
- Naïve Bayesian Classification Process (Simple Bayesian Classifier)
- P(H|X): the posterior probability of hypothesis H given data sample X
- P(H): the prior probability of hypothesis H
5.3 BAYESIAN INFERENCE (cont.)
- Given an additional data sample X (its class is unknown), it is possible to predict the class for X using the highest conditional probability P(Ci|X)
- P(X) is constant for all classes, so only the product P(X|Ci) · P(Ci) needs to be maximized
- P(Ci) = |Ci| / m, where |Ci| is the number of training samples in class Ci and m is the total number of training samples
5.3 BAYESIAN INFERENCE - example
Table 5.1 Training data set for a classification
using Naïve Bayesian Classifier
5.3 BAYESIAN INFERENCE example (cont.)
- Goal: predict the classification of the new sample X = {1, 2, 2, class = ?}
- Maximize the product P(X|Ci) · P(Ci) for i = 1, 2
- Step 1: compute the prior probabilities P(Ci)
5.3 BAYESIAN INFERENCE example (cont.)
- Step 2: compute the conditional probabilities P(xt|Ci) for every attribute value given in the new sample X = {1, 2, 2, C = ?}
5.3 BAYESIAN INFERENCE example (cont.)
- Step 3: under the assumption of conditional independence of attributes, compute the conditional probabilities P(X|Ci)
5.3 BAYESIAN INFERENCE example (cont.)
- Finally, multiply these conditional probabilities with the corresponding prior probabilities
- This yields values proportional to P(Ci|X); find the maximum
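The three steps above can be sketched in a few lines of Python. The training tuples below are hypothetical stand-ins (the actual values of Table 5.1 are not reproduced here), so only the procedure, not the numbers, follows the slides:

```python
from collections import Counter

# Hypothetical training set: each sample is (attribute tuple, class label).
train = [
    ((1, 2, 1), 1), ((0, 0, 1), 1), ((2, 1, 2), 2),
    ((1, 2, 1), 2), ((0, 1, 2), 1), ((2, 2, 2), 2), ((1, 0, 1), 1),
]

def naive_bayes_scores(train, x):
    """Return P(X|Ci) * P(Ci) per class, assuming attribute independence."""
    m = len(train)
    class_counts = Counter(c for _, c in train)          # |Ci| per class
    scores = {}
    for c, count in class_counts.items():
        prior = count / m                                 # Step 1: P(Ci) = |Ci| / m
        likelihood = 1.0
        for t, value in enumerate(x):
            # Step 2: P(x_t | Ci) = fraction of class-Ci samples with this value
            matches = sum(1 for s, cls in train if cls == c and s[t] == value)
            likelihood *= matches / count
        scores[c] = prior * likelihood                    # Step 3 + final product
    return scores

scores = naive_bayes_scores(train, (1, 2, 2))
predicted = max(scores, key=scores.get)   # class with maximal P(X|Ci) * P(Ci)
```

With these stand-in tuples, class 2 wins because the attribute values 2 are much more frequent there.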
5.4 PREDICTIVE REGRESSION
- The prediction of continuous values can be modeled by a statistical technique called regression.
- Regression analysis is the process of determining how a variable Y is related to one or more other variables X1, X2, ..., Xn.
- Modeling this type of relationship is often called linear regression.
- The relationship that fits a set of data is characterized by a prediction model called a regression equation. The most widely used form of the regression model is the general linear model, formally written as
- Y = α + β1·X1 + β2·X2 + β3·X3 + ... + βn·Xn
Simple regression
- Simple regression: Y = α + β·X
- The parameters α and β are chosen to minimize the sum of squared errors, SSE = Σ (yi - α - β·xi)²
Multiple regression
- Multiple regression:
- Y = α + β1·X1 + β2·X2 + β3·X3 + ... + βn·Xn
- SSE = (Y - β·X)^T (Y - β·X)
- d(SSE)/dβ = 0
- β = (X^T X)^-1 (X^T Y)
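The normal-equation solution β = (X^T X)^-1 (X^T Y) can be checked numerically. A minimal sketch with illustrative data built from an exact linear relation (none of the numbers come from the text), so the recovered coefficients and an SSE of zero are known in advance:

```python
import numpy as np

# Illustrative data: fit Y = a + b1*X1 + b2*X2 by least squares.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]   # exact relation => SSE should be ~0

# Prepend a column of ones so the intercept is estimated as beta[0].
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations, obtained from d(SSE)/d(beta) = 0:
beta = np.linalg.inv(A.T @ A) @ (A.T @ y)

sse = float(((y - A @ beta) ** 2).sum())
```

The recovered `beta` is (2, 3, -1), matching the coefficients used to generate `y`.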
Correlation coefficient
Correlation coefficient (cont.)
- A correlation coefficient r = 0.85 indicates a good linear relationship between two variables. Additional interpretation is possible: because r² = 0.72, we can say that approximately 72% of the variation in the values of Y is accounted for by a linear relationship with X.
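A quick numerical illustration of r and r², using hypothetical paired observations that lie almost exactly on a line:

```python
import numpy as np

# Hypothetical paired observations (approximately y = 2x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# Pearson correlation coefficient r; r**2 is the share of the variation
# in y accounted for by a linear relationship with x.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2
```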
5.5 ANALYSIS OF VARIANCE
- Often the problem of analyzing the quality of the
estimated regression line and the influence of
the independent variables on the final regression
is handled through an analysis-of-variance
approach.
- The size of the residuals, for all m samples in a data set, is related to the size of the variance s², and it can be estimated by
- s² = Σ (yi - yi')² / (m - (n + 1))
- The numerator is called the residual sum of squares, while the denominator is called the residual degrees of freedom.
- These criteria are the basic decision steps in the ANOVA algorithm, in which we analyze the influence of input variables on the final model.
- First, we start with all inputs and compute s² for this model. Then, we omit inputs from the model one by one and recompute it.
- F = s²new / s²old
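The one-by-one input-removal step can be sketched as follows. The data are synthetic (x2 is deliberately constructed to contribute nothing to y), so dropping x2 leaves the residual sum unchanged and F stays close to 1, which is the signal that x2 can be removed:

```python
import numpy as np

def residual_variance(A, y):
    """s^2 = residual sum of squares / residual degrees of freedom."""
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    return float(resid @ resid) / (len(y) - A.shape[1])

# Synthetic data: y depends on x1; x2 is an irrelevant input.
x1 = np.arange(10.0)
x2 = np.array([0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5, 0.5])
e  = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
y = 1.0 + 2.0 * x1 + e

ones = np.ones(10)
s2_old = residual_variance(np.column_stack([ones, x1, x2]), y)  # full model
s2_new = residual_variance(np.column_stack([ones, x1]), y)      # x2 omitted

F = s2_new / s2_old   # F near 1 => the omitted input x2 was not needed
```

Because x2 explains none of the residual here, only the degrees of freedom change (8 vs. 7), so F comes out as 7/8.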
Multivariate analysis of variance
- Multivariate analysis of variance is a generalization of the previously explained ANOVA analysis.
- Yj = α + β1·X1j + β2·X2j + β3·X3j + ... + βn·Xnj + ej
- j = 1, 2, ..., m
- The corresponding residuals for each dimension will be (Yj - Yj').
- Classical multivariate analysis also includes the
method of principal component analysis. This
method has been explained in Chapter 3 when we
were talking about data reduction and data
transformation as preprocessing phases for data
mining.
5.6 Logistic Regression
- Logistic regression models the probability of some event occurring as a linear function of a set of predictor variables.
- It tries to estimate the probability p that the dependent variable will have a given value.
- The output variable of the model is defined as binary categorical:
- P(y = 1) = p, P(y = 0) = 1 - p
Logistic Regression (cont.)
The logit form of the output is used to prevent the predicted probability pj from going out of the range [0, 1]:
- logit(p) = ln(p / (1 - p)), where p is the success probability and 1 - p the failure probability
Logistic Regression (example)
- logit(p) = 1.5 - 0.6·x1 + 0.4·x2 - 0.3·x3
- Input values: (x1, x2, x3) = (1, 0, 1)
- What is the predicted probability p?
Logistic Regression (answer)
- logit(p) = 1.5 - 0.6·1 + 0.4·0 - 0.3·1 = 0.6
- ln(p / (1 - p)) = 0.6
- p = e^0.6 / (1 + e^0.6) = 0.65
- P(y = 1) = p = 0.65
- P(y = 0) = 1 - p = 0.35
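The answer above can be reproduced by inverting the logit. A minimal sketch using the slide's model and input:

```python
import math

def inverse_logit(z):
    """Convert a logit score z = ln(p / (1 - p)) back to a probability p."""
    return math.exp(z) / (1.0 + math.exp(z))

# The slide's model: logit(p) = 1.5 - 0.6*x1 + 0.4*x2 - 0.3*x3
intercept, coeffs = 1.5, (-0.6, 0.4, -0.3)
x = (1, 0, 1)

z = intercept + sum(c * xi for c, xi in zip(coeffs, x))  # logit score
p = inverse_logit(z)                                     # P(y = 1)
```

This gives z = 0.6 and p ≈ 0.65, matching the slide.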
5.7 Log-Linear Models
- Log-linear modeling is a way of analyzing the relationships between categorical variables.
- The log-linear model approximates a discrete, multidimensional probability distribution.
- All given variables are categorical.
- The data set is defined without output variables.
Log-Linear Models (cont.)
- The aim in log-linear modeling is to identify associations between categorical variables.
- The problem reduces to finding out which of the βs are 0 in the model.
- Correspondence analysis
Correspondence analysis
- Correspondence analysis represents the set of categorical data for analysis within incidence matrices, also called contingency tables.
- The result of an analysis of the contingency table answers the question: is there a relationship between the analyzed attributes or not?
Algorithm
- Transform a given contingency table into a table with expected values.
- Compare these two matrices using the chi-square test as the criterion of association for two categorical variables.
Algorithm (cont.)
- d.f. (degrees of freedom) = (m - 1)(n - 1)
- T(α) = χ²(d.f., α)
- If χ² > T(α), H0 is rejected
- Otherwise, H0 is accepted
Log-Linear Models (example)
- Are there any differences in the extent of
support for abortion between the male and the
female population?
Log-Linear Models (answer)
- This question may be translated into: what is the level of dependency between the two given attributes, sex and support?
- Step 1:
- H0: sex and support are independent; H1: sex and support are dependent
- E11 = 500 · 628 / 1100 = 285.5
Log-Linear Models (answer, cont.)
- Step 2:
- χ² = (309 - 285.5)²/285.5 + (191 - 214.5)²/214.5 + (319 - 342.5)²/342.5 + (281 - 257.5)²/257.5 = 8.2816
Log-Linear Models (answer, cont.)
- d.f. = (2 - 1)(2 - 1) = 1
- T(0.05) = χ²(0.05, 1) = 3.84
- χ² = 8.2816 > 3.84, so H0 is rejected
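The whole chi-square computation fits in a few lines. The row/column labels in the comments are an assumption about which margin is which; the counts are the slide's. Note the slide rounds the expected counts to one decimal before summing, so its value (8.2816) differs slightly from the unrounded result:

```python
# Contingency table from the slides: support for abortion by sex (labels assumed).
observed = [[309, 191],   # e.g. male:   support / do not support
            [319, 281]]   # e.g. female: support / do not support

row_totals = [sum(row) for row in observed]        # 500, 600
col_totals = [sum(col) for col in zip(*observed)]  # 628, 472
total = sum(row_totals)                            # 1100

# Expected counts under independence: E_ij = row_i * col_j / total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# d.f. = (2 - 1)(2 - 1) = 1; critical value chi^2(0.05, 1) = 3.84
reject_h0 = chi2 > 3.84
```

With unrounded expected counts, chi2 ≈ 8.30, and H0 (independence) is rejected either way.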
5.8 Linear Discriminant Analysis
- LDA is concerned with classification problems where the dependent variable is categorical and the independent variables are metric.
- The objective of LDA is to construct a discriminant function that yields different scores when computed with data from different output classes.
Discriminant Function
- z = w1·x1 + w2·x2 + ... + wk·xk, where z is the discriminant score, the xi are independent variables, and the wi are weights
- The discriminant function z is used to predict the class of a new, nonclassified sample.
Cutting score
- The cutting score serves as the criterion against which each individual discriminant score is judged.
- The choice of cutting score depends upon the distribution of samples in the classes.
Cutting score (cont.)
- When the two classes of samples are of equal size and are distributed with uniform variance:
- zcut-ab = (za + zb) / 2
- za: mean discriminant score of class A
- zb: mean discriminant score of class B
- A new sample will be classified to one or the other class depending on whether its score satisfies z > zcut-ab or z < zcut-ab.
Cutting score (cont.)
- A weighted average of the mean discriminant scores is used as the optimal cutting score when the sets of samples for the classes are not of equal size:
- zcut-ab = (na·za + nb·zb) / (na + nb)
- za: mean discriminant score of class A
- zb: mean discriminant score of class B
- na: number of samples in class A
- nb: number of samples in class B
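The weighted cutting score and the resulting decision rule can be sketched directly; the mean scores and class sizes below are hypothetical:

```python
def cutting_score(z_a, z_b, n_a, n_b):
    """Weighted optimal cutting score for unequal class sizes (slide formula)."""
    return (n_a * z_a + n_b * z_b) / (n_a + n_b)

def classify(z, z_cut, z_a):
    """Assign score z to class A if it lies on class A's side of z_cut."""
    return "A" if (z > z_cut) == (z_a > z_cut) else "B"

# Hypothetical mean discriminant scores and class sizes.
z_a, z_b = 2.0, -1.0
n_a, n_b = 30, 10

z_cut = cutting_score(z_a, z_b, n_a, n_b)   # = (30*2.0 + 10*(-1.0)) / 40 = 1.25
```

Note how the larger class A pulls the cutting score toward its own mean (1.25 rather than the unweighted midpoint 0.5).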
Multiple Discriminant Analysis
- Multiple discriminant analysis is used in situations where a separate discriminant function is constructed for each class.
- Decide in favor of the class whose discriminant score is the highest.
Q&A