Title: This chapter uses MS Excel and Weka
Statistical Techniques
10.1 Linear Regression Analysis
- A supervised technique that generalizes a set of numeric data by creating a mathematical equation relating one or more input variables to a single output variable.
- With linear regression we attempt to model the variation in a dependent variable as a linear combination of one or more independent variables.
- Linear regression is appropriate when the relationship between the dependent and independent variables is nearly linear.
Simple Linear Regression (slope-intercept form)
Simple Linear Regression (least squares criterion)
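These two slides refer to standard formulas whose images are not part of the transcript. As a reference, the usual slope-intercept model and the least squares criterion used to fit it are:

$$ y = ax + b $$

$$ \min_{a,\,b} \; \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 $$

For a single input variable, the minimizing values are $a = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \,/\, \sum_i (x_i - \bar{x})^2$ and $b = \bar{y} - a\bar{x}$.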
Multiple Linear Regression with Excel
Try to estimate the value of a building
A Regression Equation for the District Office Building Data
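A minimal sketch of the same computation outside Excel, using Python and NumPy's least squares solver. The four input columns and the assessed values below are hypothetical stand-ins for the district office building table, not the book's actual data:

import numpy as np

# Hypothetical stand-in data: floor space, offices, entrances, age.
X = np.array([
    [2310, 2, 2.0, 20],
    [2333, 2, 2.0, 12],
    [2356, 3, 1.5, 33],
    [2379, 3, 2.0, 43],
    [2402, 2, 3.0, 53],
    [2425, 4, 2.0, 23],
], dtype=float)
y = np.array([142000, 144000, 151000, 150000, 139000, 169000], dtype=float)

# Append a column of ones so the fitted model includes an intercept.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve the least squares problem min ||A w - y||^2.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print("coefficients:", w[:-1])
print("intercept:", w[-1])
print("estimated assessed values:", A @ w)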
10.1 Linear Regression Analysis
- How accurate are the results?
- Use a scatterplot diagram together with the line given by the regression formula.
- Which independent variables are linearly related to the dependent variable? Use the statistics:
- Coefficient of determination: a value of 1 means no difference between the actual values (in the table) and the computed values for the dependent variable (it represents the correlation between actual and computed values).
- Standard error for the estimate of the dependent variable.
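As a reference for the two statistics named above (standard definitions, not reproduced from the slides): the coefficient of determination is

$$ r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} $$

and the standard error of the estimate, with n instances and k independent variables, is

$$ s_{est} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - k - 1}} $$

where $\hat{y}_i$ is the computed value and $y_i$ the actual value.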
The F Statistic for the Regression Analysis
- Used to establish whether the coefficient of determination is significant.
- Look up the F critical value (4.59) in a one-tailed F table from a statistics book, using v1 (the number of independent variables, 4) and v2 (the number of instances minus the number of variables, 11 - 5 = 6).
- The regression equation is able to correctly determine the assessed values of office buildings that are part of the training data.
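One standard form of the F statistic, consistent with the degrees of freedom v1 and v2 above (the slide's own equation image is not in the transcript):

$$ F = \frac{r^2 / v_1}{(1 - r^2) / v_2} $$

The computed F is compared against the critical value; if it is larger, the coefficient of determination is judged significant.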
Regression Trees
Regression Tree
- Essentially a decision tree whose leaf nodes contain numeric values.
- The value at an individual leaf node is the numeric average of the output attribute for all instances passing through the tree to that leaf node position.
- Regression trees are more accurate than linear regression when the data is nonlinear, but they are more difficult to interpret (a small sketch follows this list).
- Regression trees are sometimes combined with linear regression to form model trees.
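A minimal regression tree sketch in Python using scikit-learn (an assumption: the chapter itself works in Excel and Weka, so treat this as an illustration only). The data is a hypothetical nonlinear relationship:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical nonlinear data: y = x^2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

# Each leaf of the fitted tree predicts the average output value of
# the training instances reaching that leaf, as described above.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[0.0], [2.5]]))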
Model Trees
- A model tree combines a regression tree with linear regression.
- Each leaf node represents a linear regression equation instead of an average value.
- Model trees simplify regression trees by reducing the number of nodes in the tree.
- A more complex tree means less of a linear relationship between the dependent and independent variables.
10.2 Logistic Regression
Logistic Regression
- Using linear regression to model problems with an observed outcome restricted to two values (e.g. yes/no) is seriously flawed. The value restriction placed on the output variable is not observed by the regression equation: linear regression produces a straight line that is unbounded on both ends.
- Therefore the linear equation must be transformed to restrict the output to [0, 1]. The regression equation can then be thought of as producing a probability of occurrence or nonoccurrence of a measured event.
- Logistic regression applies a logarithmic transform.
Transforming the Linear Regression Model
- Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance.
- 1 denotes observation of one class (yes).
- 0 denotes observation of the other class (no).
- Thus p(y = 1 | x) is the conditional probability of seeing the class associated with y = 1 (yes), given the values in the feature vector x.
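The model itself (the standard logistic form; the slide's equation image is not in the transcript) transforms the linear expression ax + c so the output is bounded by 0 and 1:

$$ p(y = 1 \mid x) = \frac{e^{ax + c}}{1 + e^{ax + c}} = \frac{1}{1 + e^{-(ax + c)}} $$

Equivalently, the log odds (logit) is linear: $\ln \frac{p}{1 - p} = ax + c$.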
The Logistic Regression Model
The coefficients of the model (a and c in the exponent ax + c) are determined using an iterative method that tries to minimize the sum of the logarithms of the predicted probabilities. Convergence occurs when the logarithmic summation is close to 0 or when it no longer changes from iteration to iteration. A minimal fitting sketch follows.
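A minimal Python sketch of such an iterative fit, using gradient descent on the log loss (the book does not name a specific optimizer, so this is only an illustration):

import numpy as np

def fit_logistic(x, y, lr=0.1, iters=5000):
    a, c = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a * x + c)))  # predicted P(y=1|x)
        # Gradient of the summed log loss with respect to a and c.
        a -= lr * np.mean((p - y) * x)
        c -= lr * np.mean(p - y)
    return a, c

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 1, 1])  # two-valued observed outcome
print(fit_logistic(x, y))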
Logistic Regression: An Example
Credit card example using the CreditCardPromotionNet file. LifeInsPromo is the output attribute; CreditCardIns and Sex are the most influential attributes.
Logistic Regression
- Classify a new instance using logistic regression:
- Income = 35K
- Credit card insurance = 1
- Sex = 0
- Age = 39
- P(y = 1 | x) = 0.999
10.3 Bayes Classifier
- A supervised classification technique with a categorical output attribute.
- Assumes all input variables are independent and of equal importance.
- P(H|E): the likelihood of H (the dependent variable representing a predicted class) given evidence E.
- P(E|H): the conditional probability of evidence E given that H is true (computed from the training data).
- P(H): the a priori probability; denotes the probability of H before the presentation of evidence E (computed from the training data).
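These quantities combine through Bayes theorem:

$$ P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)} $$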
Bayes Classifier: An Example
Credit card promotion data set; Sex is the output attribute.
The Instance to be Classified
- Magazine Promotion = Yes
- Watch Promotion = Yes
- Life Insurance Promotion = No
- Credit Card Insurance = No
- Sex = ?
- Two hypotheses: sex = female and sex = male.
Computing the Probability for Sex = Male
Conditional Probabilities for Sex = Male
- P(magazine promotion = yes | sex = male) = 4/6
- P(watch promotion = yes | sex = male) = 2/6
- P(life insurance promotion = no | sex = male) = 4/6
- P(credit card insurance = no | sex = male) = 4/6
- P(E | sex = male) = (4/6)(2/6)(4/6)(4/6) = 8/81
The Probability for Sex = Male Given Evidence E
- P(sex = male | E) ≈ 0.0593 / P(E)
The Probability for Sex = Female Given Evidence E
- P(sex = female | E) ≈ 0.0281 / P(E)
- P(sex = male | E) > P(sex = female | E)
- The instance is most likely a male credit card customer (a numeric check follows).
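A small Python check of the arithmetic above, using the conditional probabilities read off the slides; the prior P(sex = male) = 6/10 is inferred from 0.0593 = (8/81)(6/10) and should be treated as an assumption:

from fractions import Fraction as F

male_cond = [F(4, 6), F(2, 6), F(4, 6), F(4, 6)]    # P(evidence | male)
female_cond = [F(3, 4), F(2, 4), F(1, 4), F(3, 4)]  # P(evidence | female)

def score(conds, prior):
    p = prior
    for c in conds:
        p *= c
    return p

male_score = score(male_cond, F(6, 10))      # ~0.0593
female_score = score(female_cond, F(4, 10))  # ~0.0281
print(float(male_score), float(female_score))
print("male" if male_score > female_score else "female")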
Zero-Valued Attribute Counts
A problem arises with the Bayes classifier when one of the counts is 0. To solve this, a small constant is added to the numerator and denominator: n/d becomes (n + k)/(d + kv), where v is the number of possible values for the attribute; k is 0.5 for an attribute with 2 possible values. Example: P(E | sex = female) = (3/4)(2/4)(1/4)(3/4) = 9/128, and with the correction P(E | sex = female) = (3.5/5)(2.5/5)(1.5/5)(3.5/5) ≈ 0.0735.
Missing Data
- With the Bayes classifier, missing data items are ignored.
Numeric Data
The probability density function (attribute values are assumed to be normally distributed):
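The slide's equation image is missing; the standard normal density consistent with the definitions below is:

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \; e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$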
- where:
- e is the exponential function
- μ is the class mean for the given numerical attribute
- σ is the class standard deviation for the attribute
- x is the attribute value
Numeric Data
- Magazine Promotion = Yes
- Watch Promotion = Yes
- Life Insurance Promotion = No
- Credit Card Insurance = No
- Age = 45
- Sex = ?
- P(E | sex = male) includes the factor P(age = 45 | sex = male).
- For males: σ = 7.69, μ = 37, x = 45.
- P(age = 45 | sex = male) = 1/(√(2π) · 7.69) · e^(-(45 - 37)² / (2 · 7.69²)) ≈ 0.03
- P(sex = male | E) = 0.0018 / P(E)
- P(sex = female | E) = 0.0016 / P(E)
- The instance most likely belongs to a male (a numeric check of the density value follows).
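A one-line numeric check of the density value above (plain Python, standard normal density):

import math

def normal_pdf(x, mu, sigma):
    # Standard normal probability density function.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print(normal_pdf(45, 37, 7.69))  # ~0.030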
10.4 Clustering Algorithms
Agglomerative Clustering
- 1. Place each instance into a separate partition.
- 2. Until all instances are part of a single cluster:
- a. Determine the two most similar clusters.
- b. Merge the clusters chosen into a single cluster.
- 3. Choose a clustering formed by one of the step 2 iterations as a final result (a minimal sketch follows this list).
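A minimal Python sketch of the loop above for one-dimensional numeric data, merging by the smallest distance between cluster means (one possible similarity measure; the slides leave the measure open):

def agglomerate(points):
    clusters = [[p] for p in points]  # step 1: one instance per partition
    history = [[tuple(c) for c in clusters]]
    mean = lambda c: sum(c) / len(c)
    while len(clusters) > 1:          # step 2: merge until one cluster
        # Step 2a: find the two most similar (closest-mean) clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: abs(mean(clusters[ij[0]]) - mean(clusters[ij[1]])))
        # Step 2b: merge them into a single cluster.
        clusters[i] = clusters[i] + clusters.pop(j)
        history.append([tuple(c) for c in clusters])
    return history                    # step 3: choose one level as the result

for level in agglomerate([1.0, 1.2, 5.0, 5.1, 9.0]):
    print(level)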
Agglomerative Clustering: An Example
Agglomerative Clustering
- The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
- Use the similarity measure employed for creating the clusters: compare the average within-cluster similarity with the overall similarity of all instances in the dataset (domain similarity).
- This technique is best used to eliminate clusterings rather than to choose a final result.
Agglomerative Clustering
- The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
- Use the within-cluster similarity measure together with the within-cluster similarities of pairwise-combined clusters in the cluster set, and look for the highest similarity.
- This technique is best used to eliminate clusterings rather than to choose a final result.
Agglomerative Clustering
- The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
- Use the previous two techniques to eliminate some of the clusterings.
- Feed each remaining clustering to a rule generator.
- The clustering with the best defining rules is chosen.
- (A fourth technique) the Bayesian Information Criterion.
Conceptual Clustering
- 1. Create a cluster with the first instance as its only member.
- 2. For each remaining instance, take one of two actions at each tree level:
- a. Place the new instance into an existing cluster.
- b. Create a new concept cluster having the new instance as its only member.
Data for Conceptual Clustering
Expectation Maximization
- 1. Guess initial values for the five parameters.
- 2. Until a termination criterion is achieved:
- a. Use the probability density function for normal distributions to compute the cluster probability for each instance.
- b. Use the probability scores assigned to each instance in step 2(a) to re-estimate the parameters (a minimal sketch follows this list).
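A minimal Python sketch of EM for a mixture of two one-dimensional Gaussians; the "five parameters" are read here as the two means, the two standard deviations, and the mixing probability (an interpretation, since the slides do not list them):

import math

def pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em(data, mu1, mu2, s1, s2, w, iters=50):
    for _ in range(iters):
        # Step 2a: cluster probability of each instance for cluster 1.
        r = [w * pdf(x, mu1, s1) / (w * pdf(x, mu1, s1) + (1 - w) * pdf(x, mu2, s2))
             for x in data]
        # Step 2b: re-estimate the five parameters from the scores.
        n1 = sum(r)
        n2 = len(data) - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2)
        w = n1 / len(data)
    return mu1, mu2, s1, s2, w

data = [1.0, 1.3, 0.8, 5.1, 4.9, 5.4]
print(em(data, 0.0, 4.0, 1.0, 1.0, 0.5))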
The EM Algorithm: An Example
10.5 Heuristics or Statistics?
Query and Visualization Techniques
- Query tools
- OLAP tools
- Visualization tools
Machine Learning and Statistical Techniques
- Statistical techniques typically assume an underlying distribution for the data, whereas machine learning techniques do not.
- Machine learning techniques tend to have a human flavor.
- Machine learning techniques are better able to deal with missing and noisy data.
- Most machine learning techniques are able to explain their behavior.
- Statistical techniques tend to perform poorly with large-sized data.
Specialized Techniques
11.1 Time-Series Analysis
- Time-series problems: prediction applications with one or more time-dependent attributes.
An Example with Linear Regression
Linear Regression Equations for the Stock Index Dataset
A Neural Network Example
Categorical Attribute Prediction
General Considerations
- Test and modify created models as new data becomes available.
- Try one or more data transformations if less than optimal results are obtained.
- Exercise caution when predicting future outcomes with training data having several predicted fields.
- Try a nonlinear model if a linear model offers poor results.
- Use unsupervised clustering to determine whether the input attribute values allow the output attribute to cluster into meaningful categories.
11.2 Mining the Web
Web-Based Mining: General Issues
- Clickstreams
- Extended Common Log File Format
- Session Files
- User Sessions
- Pageviews
- Cookies
Data Mining for Web Site Evaluation
- Sequence miners are special data mining programs
able to discover frequently accessed Web pages
that occur in the same order.
Data Mining for Personalization
Data Mining for Web Site Adaptation
- The index synthesis problem: given a Web site and a visitor access log, create new index pages containing collections of links to related but currently unlinked pages.
11.3 Mining Textual Data
- Train: create an attribute dictionary.
- Filter: remove common words.
- Classify: classify new documents (a minimal sketch follows this list).
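A minimal train/filter/classify sketch in Python (a simple bag-of-words scoring scheme; the book's exact procedure is not given in the transcript, so this only illustrates the three steps):

COMMON = {"the", "a", "an", "is", "of", "to", "and"}

def tokenize(doc):
    # Filter step: remove common words.
    return [w for w in doc.lower().split() if w not in COMMON]

def train(labeled_docs):
    # Train step: build an attribute dictionary of word -> class counts.
    dictionary = {}
    for doc, label in labeled_docs:
        for w in tokenize(doc):
            dictionary.setdefault(w, {}).setdefault(label, 0)
            dictionary[w][label] += 1
    return dictionary

def classify(dictionary, doc, classes):
    # Classify step: score a new document against each class.
    scores = {c: 0 for c in classes}
    for w in tokenize(doc):
        for c, n in dictionary.get(w, {}).items():
            scores[c] += n
    return max(scores, key=scores.get)

docs = [("the stock index rose", "finance"), ("the team won the game", "sports")]
d = train(docs)
print(classify(d, "index of the stock market", ["finance", "sports"]))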
11.4 Improving Performance
- Bagging
- Boosting
- Instance Typicality
Part IV
Rule-Based Systems
12.1 Exploring Artificial Intelligence
Nearest Neighbor Heuristic
- When conducting a state-space search, always move
to the next closest state.
The Water Jug Problem
Depth-First Search: A-B-E-F-C-G-I-J-H-D
Breadth-First Search: A-B-C-D-E-F-G-H-I-J
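A small Python sketch over a tree consistent with the two traversal orders above (the tree structure is inferred from those orders, since the slide image is not in the transcript): A has children B, C, D; B has E, F; C has G, H; G has I, J.

from collections import deque

tree = {"A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G", "H"],
        "D": [], "E": [], "F": [], "G": ["I", "J"], "H": [], "I": [], "J": []}

def dfs(node):
    # Depth-first: follow each branch to the bottom before backing up.
    order = [node]
    for child in tree[node]:
        order += dfs(child)
    return order

def bfs(root):
    # Breadth-first: visit all nodes at one level before the next.
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(dfs("A"))  # A B E F C G I J H D
print(bfs("A"))  # A B C D E F G H I J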
Backward Chaining
Expert Systems
Developing an Expert System
Structuring a Rule-Based System
Choosing a Data Mining Technique
Managing Uncertainty in Rule-Based Systems
13.1 Uncertainty Sources and Solutions
Sources of Uncertainty
- Rule 1: Large Package Rule
- IF package size is large
- THEN send package via UPS
Sources of Uncertainty
- Rule Antecedent
- Rule Confidence
- Combining Uncertain Information
General Methods for Dealing with Uncertainty
- Probability-Based Methods
- Heuristic Methods
Probability-Based Methods
- Objective Probability
- Experimental Probability
- Subjective Probability
Heuristic Methods
- Certainty Factors
- Fuzzy Logic
13.2 Fuzzy Rule-Based Systems
Fuzzy Sets
- A set associated with a linguistic value that gives the degree of membership for a numerical value (a minimal sketch follows).
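A minimal Python sketch of such a membership function, using a triangular shape for a hypothetical linguistic value "warm" over temperature (the shape and breakpoints are assumptions, not from the book):

def warm(temp):
    # Degree of membership of a numeric temperature in the fuzzy set "warm".
    if 15 <= temp <= 25:
        return (temp - 15) / 10.0  # rising edge
    if 25 < temp <= 35:
        return (35 - temp) / 10.0  # falling edge
    return 0.0

for t in (10, 20, 25, 30, 40):
    print(t, warm(t))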
Fuzzy Reasoning: An Example
- Fuzzification
- Rule Inference
- Rule Composition
- Defuzzification
13.3 A Probability-Based Approach to Uncertainty
Bayes Theorem
Multiple Evidence with Bayes Theorem
Likelihood Ratios: Necessity and Sufficiency
General Considerations
- P(H|E) and P(¬H|E) must sum to 1.
- Conditional independence between multiple pieces of evidence must be assumed.
- Prior probabilities are often unobtainable.
- Large amounts of data must be gathered to obtain reasonable estimates for conditional probabilities.
Intelligent Agents
14.1 Characteristics of Intelligent Agents
- Situatedness
- Autonomy
- Adaptivity
- Sociability
14.2 Types of Agents
- Anticipatory agents
- Filtering agents
- Semiautonomous agents
- Find-and-retrieve agents
- User agents
- Monitor and Surveillance agents
- Data Mining agents
- Proactive agents
- Cooperative agents
14.3 Integrating Data Mining, Expert Systems and Intelligent Agents
The iDA Software
Datasets for Data Mining
Decision Tree Attribute Selection
Computing Gain Ratio
Computing Gain(A)
Computing Info(I)
Computing Info(I,A)
Computing Split Info(A)
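The five slides above refer to the standard C4.5 attribute-selection formulas; their images are not in the transcript, so as a reference the usual definitions matching these titles are:

$$ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)} \qquad \mathrm{Gain}(A) = \mathrm{Info}(I) - \mathrm{Info}(I, A) $$

$$ \mathrm{Info}(I) = -\sum_{i} p_i \log_2 p_i \qquad \mathrm{Info}(I, A) = \sum_{j} \frac{|I_j|}{|I|} \, \mathrm{Info}(I_j) $$

$$ \mathrm{SplitInfo}(A) = -\sum_{j} \frac{|I_j|}{|I|} \log_2 \frac{|I_j|}{|I|} $$

where $p_i$ is the fraction of instances in I belonging to class i and $I_j$ is the subset of I taking the jth value of attribute A.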
Statistics for Performance Evaluation
D.1 Single-Valued Summary Statistics
Computing the Mean
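The slide's equation image is missing; the standard definition consistent with the symbols below is:

$$ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i $$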
where μ is the mean value, n is the number of data items, and xi is the ith data item.
Computing the Variance
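The slide's equation image is missing; one standard form (the population variance) consistent with the symbols below is:

$$ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$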
where σ² is the variance, μ is the population mean, n is the number of data items, and xi is the ith data item.
D.2 The Normal Distribution
The Normal Curve
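The slide's equation image is missing; the standard normal curve consistent with the symbols below is:

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \; e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$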
where f(x) is the height of the curve corresponding to values of x, e is the base of natural logarithms approximated by 2.718282, μ is the arithmetic mean for the data, and σ is the standard deviation.
D.3 Comparing Supervised Learner Models
- Comparing Models with Independent Test Data
- Pairwise Comparison with a Single Test Set
Comparing Models with Independent Test Data
Two independent test sets: set A containing n1 elements and set B containing n2 elements. Error rate E1 and variance v1 for model M1 on test set A; error rate E2 and variance v2 for model M2 on test set B.
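The slide's equation image is missing; one standard test statistic built from exactly these quantities (an assumption, not reproduced from the book) is:

$$ T = \frac{|E_1 - E_2|}{\sqrt{\dfrac{v_1}{n_1} + \dfrac{v_2}{n_2}}} $$

where the variance of an error rate can be taken as $v = E(1 - E)$.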
Pairwise Comparison with a Single Test Set
Computing Joint Variance for a Single Test Set
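The slide's equation image is missing; one common form of a joint variance for paired model errors on a single test set (an assumption, with $e_{1i}$ and $e_{2i}$ denoting the two models' errors on instance i) is:

$$ V_{12} = \frac{1}{n - 1} \sum_{i=1}^{n} \left[ (e_{1i} - e_{2i}) - (\overline{e_1} - \overline{e_2}) \right]^2 $$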
Pairwise Comparison with a Single Test Set
D.4 Confidence Intervals for Numeric Output
D.5 Comparing Models with Numeric Output
- Independent Test Sets
- Pairwise Comparison with a Single Test Set
- Overall Comparison with a Single Test Set
Comparing Models with Independent Test Sets
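The slide's equation image is missing; one standard statistic built from the quantities defined below (an assumption) is:

$$ T = \frac{|mae_1 - mae_2|}{\sqrt{\dfrac{v_1}{n_1} + \dfrac{v_2}{n_2}}} $$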
where mae1 is the mean absolute error for model M1, mae2 is the mean absolute error for model M2, v1 and v2 are the variance scores associated with M1 and M2, and n1 and n2 are the number of instances within each respective test set.
Pairwise Comparison with a Single Test Set
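The slide's equation image is missing; one standard statistic built from the quantities defined below (an assumption) is:

$$ T = \frac{|mae_1 - mae_2|}{\sqrt{V_{12} / n}} $$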
where mae1 is the mean absolute error for model M1, mae2 is the mean absolute error for model M2, V12 is the joint variance computed with the formula defined in Equation D.5, and n is the number of test set instances.
Overall Comparison with a Single Test Set
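The slide's equation image is missing; the definition that follows directly from the symbols below is:

$$ mae_j = \frac{1}{n} \sum_{i=1}^{n} e_i $$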
where maej is the mean absolute error for model j, ei is the absolute value of the computed value minus the actual value for instance i, and n is the number of test set instances.
Overall Comparison with a Single Test Set
where v is either the average or the larger of the variance scores for each model, and n is the total number of test set instances.
Excel Pivot Tables (Office 97)