Title: This chapter uses MS Excel and Weka
Statistical Techniques
10.1 Linear Regression Analysis
- A supervised technique that generalizes a set of numeric data by creating a mathematical equation relating one or more input variables to a single output variable.
- With linear regression we attempt to model the variation in a dependent variable as a linear combination of one or more independent variables.
- Linear regression is appropriate when the relationship between the dependent and independent variables is nearly linear.
Simple Linear Regression (slope-intercept form)
Simple Linear Regression (least squares criterion)
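These two slides refer to standard formulas whose images are not part of the transcript. As a reference, the usual slope-intercept model and the least squares criterion used to fit it are:

$$ y = ax + b $$

$$ \min_{a,\,b} \; \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 $$

For a single input variable, the minimizing values are $a = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \,/\, \sum_i (x_i - \bar{x})^2$ and $b = \bar{y} - a\bar{x}$.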
Multiple Linear Regression with Excel
Try to estimate the value of a building
A Regression Equation for the District Office Building Data
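A minimal sketch of the same computation outside Excel, using Python and NumPy's least squares solver. The four input columns and the assessed values below are hypothetical stand-ins for the district office building table, not the book's actual data:

import numpy as np

# Hypothetical stand-in data: floor space, offices, entrances, age.
X = np.array([
    [2310, 2, 2.0, 20],
    [2333, 2, 2.0, 12],
    [2356, 3, 1.5, 33],
    [2379, 3, 2.0, 43],
    [2402, 2, 3.0, 53],
    [2425, 4, 2.0, 23],
], dtype=float)
y = np.array([142000, 144000, 151000, 150000, 139000, 169000], dtype=float)

# Append a column of ones so the fitted model includes an intercept.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve the least squares problem min ||A w - y||^2.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print("coefficients:", w[:-1])
print("intercept:", w[-1])
print("estimated assessed values:", A @ w)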
10.1 Linear Regression Analysis
- How accurate are the results?
- Use a scatterplot diagram together with the line given by the regression formula.
- Which independent variables are linearly related to the dependent variable? Use the statistics:
- Coefficient of determination: a value of 1 means no difference between the actual values (in the table) and the computed values for the dependent variable (it represents the correlation between actual and computed values).
- Standard error for the estimate of the dependent variable.
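As a reference for the two statistics named above (standard definitions, not reproduced from the slides): the coefficient of determination is

$$ r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} $$

and the standard error of the estimate, with n instances and k independent variables, is

$$ s_{est} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - k - 1}} $$

where $\hat{y}_i$ is the computed value and $y_i$ the actual value.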
The F Statistic for the Regression Analysis
- Used to establish whether the coefficient of determination is significant.
- Look up the F critical value (4.59) in a one-tailed F table from a statistics book, using v1 (the number of independent variables, 4) and v2 (the number of instances minus the number of variables, 11 - 5 = 6).
- The regression equation is able to correctly determine the assessed values of office buildings that are part of the training data.
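One standard form of the F statistic, consistent with the degrees of freedom v1 and v2 above (the slide's own equation image is not in the transcript):

$$ F = \frac{r^2 / v_1}{(1 - r^2) / v_2} $$

The computed F is compared against the critical value; if it is larger, the coefficient of determination is judged significant.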
Regression Trees
Regression Tree
- Essentially a decision tree whose leaf nodes contain numeric values.
- The value at an individual leaf node is the numeric average of the output attribute for all instances passing through the tree to that leaf node position.
- Regression trees are more accurate than linear regression when the data is nonlinear, but they are more difficult to interpret (a small sketch follows this list).
- Regression trees are sometimes combined with linear regression to form model trees.
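A minimal regression tree sketch in Python using scikit-learn (an assumption: the chapter itself works in Excel and Weka, so treat this as an illustration only). The data is a hypothetical nonlinear relationship:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical nonlinear data: y = x^2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

# Each leaf of the fitted tree predicts the average output value of
# the training instances reaching that leaf, as described above.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[0.0], [2.5]]))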
Model Trees
- A model tree combines a regression tree with linear regression.
- Each leaf node represents a linear regression equation instead of an average value.
- Model trees simplify regression trees by reducing the number of nodes in the tree.
- A more complex tree means less of a linear relationship between the dependent and independent variables.
10.2 Logistic Regression
Logistic Regression
- Using linear regression to model problems with an observed outcome restricted to two values (e.g. yes/no) is seriously flawed. The value restriction placed on the output variable is not observed by the regression equation: linear regression produces a straight line that is unbounded on both ends.
- Therefore the linear equation must be transformed to restrict the output to [0, 1]. The regression equation can then be thought of as producing a probability of occurrence or nonoccurrence of a measured event.
- Logistic regression applies a logarithmic transform.
Transforming the Linear Regression Model
- Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance.
- 1 denotes observation of one class (yes).
- 0 denotes observation of the other class (no).
- Thus p(y = 1 | x) is the conditional probability of seeing the class associated with y = 1 (yes), given the values in the feature vector x.
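The model itself (the standard logistic form; the slide's equation image is not in the transcript) transforms the linear expression ax + c so the output is bounded by 0 and 1:

$$ p(y = 1 \mid x) = \frac{e^{ax + c}}{1 + e^{ax + c}} = \frac{1}{1 + e^{-(ax + c)}} $$

Equivalently, the log odds (logit) is linear: $\ln \frac{p}{1 - p} = ax + c$.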
The Logistic Regression Model
The coefficients of the model (a and c in the exponent ax + c) are determined using an iterative method that tries to minimize the sum of the logarithms of the predicted probabilities. Convergence occurs when the logarithmic summation is close to 0 or when it no longer changes from iteration to iteration. A minimal fitting sketch follows.
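A minimal Python sketch of such an iterative fit, using gradient descent on the log loss (the book does not name a specific optimizer, so this is only an illustration):

import numpy as np

def fit_logistic(x, y, lr=0.1, iters=5000):
    a, c = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a * x + c)))  # predicted P(y=1|x)
        # Gradient of the summed log loss with respect to a and c.
        a -= lr * np.mean((p - y) * x)
        c -= lr * np.mean(p - y)
    return a, c

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 1, 1])  # two-valued observed outcome
print(fit_logistic(x, y))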
Logistic Regression: An Example
Credit card example using the CreditCardPromotionNet file. LifeInsPromo is the output attribute; CreditCardIns and Sex are the most influential attributes.
Logistic Regression
- Classify a new instance using logistic regression:
- Income = 35K
- Credit card insurance = 1
- Sex = 0
- Age = 39
- P(y = 1 | x) = 0.999
10.3 Bayes Classifier
- A supervised classification technique with a categorical output attribute.
- Assumes all input variables are independent and of equal importance.
- P(H|E): the likelihood of H (the dependent variable representing a predicted class) given evidence E.
- P(E|H): the conditional probability of evidence E given that H is true (computed from the training data).
- P(H): the a priori probability; denotes the probability of H before the presentation of evidence E (computed from the training data).
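These quantities combine through Bayes theorem:

$$ P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)} $$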
Bayes Classifier: An Example
Credit card promotion data set; Sex is the output attribute.
The Instance to be Classified
- Magazine Promotion = Yes
- Watch Promotion = Yes
- Life Insurance Promotion = No
- Credit Card Insurance = No
- Sex = ?
- Two hypotheses: sex = female and sex = male.
Computing the Probability for Sex = Male
Conditional Probabilities for Sex = Male
- P(magazine promotion = yes | sex = male) = 4/6
- P(watch promotion = yes | sex = male) = 2/6
- P(life insurance promotion = no | sex = male) = 4/6
- P(credit card insurance = no | sex = male) = 4/6
- P(E | sex = male) = (4/6)(2/6)(4/6)(4/6) = 8/81
The Probability for Sex = Male Given Evidence E
- P(sex = male | E) ≈ 0.0593 / P(E)
The Probability for Sex = Female Given Evidence E
- P(sex = female | E) ≈ 0.0281 / P(E)
- P(sex = male | E) > P(sex = female | E)
- The instance is most likely a male credit card customer (a numeric check follows).
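A small Python check of the arithmetic above, using the conditional probabilities read off the slides; the prior P(sex = male) = 6/10 is inferred from 0.0593 = (8/81)(6/10) and should be treated as an assumption:

from fractions import Fraction as F

male_cond = [F(4, 6), F(2, 6), F(4, 6), F(4, 6)]    # P(evidence | male)
female_cond = [F(3, 4), F(2, 4), F(1, 4), F(3, 4)]  # P(evidence | female)

def score(conds, prior):
    p = prior
    for c in conds:
        p *= c
    return p

male_score = score(male_cond, F(6, 10))      # ~0.0593
female_score = score(female_cond, F(4, 10))  # ~0.0281
print(float(male_score), float(female_score))
print("male" if male_score > female_score else "female")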
Zero-Valued Attribute Counts
A problem arises with the Bayes classifier when one of the counts is 0. To solve this, a small constant is added to the numerator and denominator: n/d becomes (n + k)/(d + kv), where v is the number of possible values for the attribute; k is 0.5 for an attribute with 2 possible values. Example: P(E | sex = female) = (3/4)(2/4)(1/4)(3/4) = 9/128, and with the correction P(E | sex = female) = (3.5/5)(2.5/5)(1.5/5)(3.5/5) ≈ 0.0735.
Missing Data
- With the Bayes classifier, missing data items are ignored.
Numeric Data
The probability density function (attribute values are assumed to be normally distributed):
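The slide's equation image is missing; the standard normal density consistent with the definitions below is:

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \; e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$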
- where:
- e is the exponential function
- μ is the class mean for the given numerical attribute
- σ is the class standard deviation for the attribute
- x is the attribute value
Numeric Data
- Magazine Promotion = Yes
- Watch Promotion = Yes
- Life Insurance Promotion = No
- Credit Card Insurance = No
- Age = 45
- Sex = ?
- P(E | sex = male) includes the factor P(age = 45 | sex = male).
- For males: σ = 7.69, μ = 37, x = 45.
- P(age = 45 | sex = male) = 1/(√(2π) · 7.69) · e^(-(45 - 37)² / (2 · 7.69²)) ≈ 0.03
- P(sex = male | E) = 0.0018 / P(E)
- P(sex = female | E) = 0.0016 / P(E)
- The instance most likely belongs to a male (a numeric check of the density value follows).
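A one-line numeric check of the density value above (plain Python, standard normal density):

import math

def normal_pdf(x, mu, sigma):
    # Standard normal probability density function.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print(normal_pdf(45, 37, 7.69))  # ~0.030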
10.4 Clustering Algorithms
Agglomerative Clustering
- 1. Place each instance into a separate partition.
- 2. Until all instances are part of a single cluster:
- a. Determine the two most similar clusters.
- b. Merge the clusters chosen into a single cluster.
- 3. Choose a clustering formed by one of the step 2 iterations as a final result (a minimal sketch follows this list).
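A minimal Python sketch of the loop above for one-dimensional numeric data, merging by the smallest distance between cluster means (one possible similarity measure; the slides leave the measure open):

def agglomerate(points):
    clusters = [[p] for p in points]  # step 1: one instance per partition
    history = [[tuple(c) for c in clusters]]
    mean = lambda c: sum(c) / len(c)
    while len(clusters) > 1:          # step 2: merge until one cluster
        # Step 2a: find the two most similar (closest-mean) clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: abs(mean(clusters[ij[0]]) - mean(clusters[ij[1]])))
        # Step 2b: merge them into a single cluster.
        clusters[i] = clusters[i] + clusters.pop(j)
        history.append([tuple(c) for c in clusters])
    return history                    # step 3: choose one level as the result

for level in agglomerate([1.0, 1.2, 5.0, 5.1, 9.0]):
    print(level)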
Agglomerative Clustering: An Example
Agglomerative Clustering
- The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
- Use the similarity measure employed for creating the clusters: compare the average within-cluster similarity with the overall similarity of all instances in the dataset (domain similarity).
- This technique is best used to eliminate clusterings rather than to choose a final result.
Agglomerative Clustering
- The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
- Use the within-cluster similarity measure together with the within-cluster similarities of pairwise-combined clusters in the cluster set, and look for the highest similarity.
- This technique is best used to eliminate clusterings rather than to choose a final result.
Agglomerative Clustering
- The final step of the algorithm is to choose the final clustering from among all candidates (this requires heuristics).
- Use the previous two techniques to eliminate some of the clusterings.
- Feed each remaining clustering to a rule generator.
- The clustering with the best defining rules is chosen.
- (A fourth technique) the Bayesian Information Criterion.
Conceptual Clustering
- 1. Create a cluster with the first instance as its only member.
- 2. For each remaining instance, take one of two actions at each tree level:
- a. Place the new instance into an existing cluster.
- b. Create a new concept cluster having the new instance as its only member.
Data for Conceptual Clustering
Expectation Maximization
- 1. Guess initial values for the five parameters.
- 2. Until a termination criterion is achieved:
- a. Use the probability density function for normal distributions to compute the cluster probability for each instance.
- b. Use the probability scores assigned to each instance in step 2(a) to re-estimate the parameters (a minimal sketch follows this list).
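A minimal Python sketch of EM for a mixture of two one-dimensional Gaussians; the "five parameters" are read here as the two means, the two standard deviations, and the mixing probability (an interpretation, since the slides do not list them):

import math

def pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em(data, mu1, mu2, s1, s2, w, iters=50):
    for _ in range(iters):
        # Step 2a: cluster probability of each instance for cluster 1.
        r = [w * pdf(x, mu1, s1) / (w * pdf(x, mu1, s1) + (1 - w) * pdf(x, mu2, s2))
             for x in data]
        # Step 2b: re-estimate the five parameters from the scores.
        n1 = sum(r)
        n2 = len(data) - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2)
        w = n1 / len(data)
    return mu1, mu2, s1, s2, w

data = [1.0, 1.3, 0.8, 5.1, 4.9, 5.4]
print(em(data, 0.0, 4.0, 1.0, 1.0, 0.5))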
The EM Algorithm: An Example
10.5 Heuristics or Statistics?
Query and Visualization Techniques
- Query tools
- OLAP tools
- Visualization tools
Machine Learning and Statistical Techniques
- Statistical techniques typically assume an underlying distribution for the data, whereas machine learning techniques do not.
- Machine learning techniques tend to have a human flavor.
- Machine learning techniques are better able to deal with missing and noisy data.
- Most machine learning techniques are able to explain their behavior.
- Statistical techniques tend to perform poorly with large-sized data.
Specialized Techniques
11.1 Time-Series Analysis
- Time-series problems: prediction applications with one or more time-dependent attributes.
An Example with Linear Regression
Linear Regression Equations for the Stock Index Dataset
A Neural Network Example
Categorical Attribute Prediction
General Considerations
- Test and modify created models as new data becomes available.
- Try one or more data transformations if less than optimal results are obtained.
- Exercise caution when predicting future outcomes with training data having several predicted fields.
- Try a nonlinear model if a linear model offers poor results.
- Use unsupervised clustering to determine whether the input attribute values allow the output attribute to cluster into meaningful categories.
11.2 Mining the Web
Web-Based Mining: General Issues
- Clickstreams
- Extended Common Log File Format
- Session Files
- User Sessions
- Pageviews
- Cookies
Data Mining for Web Site Evaluation
- Sequence miners are special data mining programs
able to discover frequently accessed Web pages
that occur in the same order.
Data Mining for Personalization
Data Mining for Web Site Adaptation
- The index synthesis problem: given a Web site and a visitor access log, create new index pages containing collections of links to related but currently unlinked pages.
11.3 Mining Textual Data
- Train: create an attribute dictionary.
- Filter: remove common words.
- Classify: classify new documents (a minimal sketch follows this list).
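A minimal train/filter/classify sketch in Python (a simple bag-of-words scoring scheme; the book's exact procedure is not given in the transcript, so this only illustrates the three steps):

COMMON = {"the", "a", "an", "is", "of", "to", "and"}

def tokenize(doc):
    # Filter step: remove common words.
    return [w for w in doc.lower().split() if w not in COMMON]

def train(labeled_docs):
    # Train step: build an attribute dictionary of word -> class counts.
    dictionary = {}
    for doc, label in labeled_docs:
        for w in tokenize(doc):
            dictionary.setdefault(w, {}).setdefault(label, 0)
            dictionary[w][label] += 1
    return dictionary

def classify(dictionary, doc, classes):
    # Classify step: score a new document against each class.
    scores = {c: 0 for c in classes}
    for w in tokenize(doc):
        for c, n in dictionary.get(w, {}).items():
            scores[c] += n
    return max(scores, key=scores.get)

docs = [("the stock index rose", "finance"), ("the team won the game", "sports")]
d = train(docs)
print(classify(d, "index of the stock market", ["finance", "sports"]))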
11.4 Improving Performance
- Bagging
- Boosting
- Instance Typicality
Part IV
Rule-Based Systems
12.1 Exploring Artificial Intelligence
Nearest Neighbor Heuristic
- When conducting a state-space search, always move
to the next closest state.
The Water Jug Problem
Depth-First Search: A-B-E-F-C-G-I-J-H-D
Breadth-First Search: A-B-C-D-E-F-G-H-I-J
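A small Python sketch over a tree consistent with the two traversal orders above (the tree structure is inferred from those orders, since the slide image is not in the transcript): A has children B, C, D; B has E, F; C has G, H; G has I, J.

from collections import deque

tree = {"A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G", "H"],
        "D": [], "E": [], "F": [], "G": ["I", "J"], "H": [], "I": [], "J": []}

def dfs(node):
    # Depth-first: follow each branch to the bottom before backing up.
    order = [node]
    for child in tree[node]:
        order += dfs(child)
    return order

def bfs(root):
    # Breadth-first: visit all nodes at one level before the next.
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(dfs("A"))  # A B E F C G I J H D
print(bfs("A"))  # A B C D E F G H I J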
Backward Chaining
Expert Systems
Developing an Expert System
Structuring a Rule-Based System
Choosing a Data Mining Technique
Managing Uncertainty in Rule-Based Systems
13.1 Uncertainty Sources and Solutions
Sources of Uncertainty
- Rule 1: Large Package Rule
- IF package size is large
- THEN send package via UPS
Sources of Uncertainty
- Rule Antecedent
- Rule Confidence
- Combining Uncertain Information
General Methods for Dealing with Uncertainty
- Probability-Based Methods
- Heuristic Methods
Probability-Based Methods
- Objective Probability
- Experimental Probability
- Subjective Probability
Heuristic Methods
- Certainty Factors
- Fuzzy Logic
13.2 Fuzzy Rule-Based Systems
Fuzzy Sets
- A set associated with a linguistic value that gives the degree of membership for a numerical value (a minimal sketch follows).
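A minimal Python sketch of such a membership function, using a triangular shape for a hypothetical linguistic value "warm" over temperature (the shape and breakpoints are assumptions, not from the book):

def warm(temp):
    # Degree of membership of a numeric temperature in the fuzzy set "warm".
    if 15 <= temp <= 25:
        return (temp - 15) / 10.0  # rising edge
    if 25 < temp <= 35:
        return (35 - temp) / 10.0  # falling edge
    return 0.0

for t in (10, 20, 25, 30, 40):
    print(t, warm(t))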
Fuzzy Reasoning: An Example
- Fuzzification
- Rule Inference
- Rule Composition
- Defuzzification
13.3 A Probability-Based Approach to Uncertainty
Bayes Theorem
Multiple Evidence with Bayes Theorem
Likelihood Ratios: Necessity and Sufficiency
General Considerations
- P(H|E) and P(¬H|E) must sum to 1.
- Conditional independence between multiple pieces of evidence must be assumed.
- Prior probabilities are often unobtainable.
- Large amounts of data must be gathered to obtain reasonable estimates for conditional probabilities.
Intelligent Agents
14.1 Characteristics of Intelligent Agents
- Situatedness
- Autonomy
- Adaptivity
- Sociability
14.2 Types of Agents
- Anticipatory agents
- Filtering agents
- Semiautonomous agents
- Find-and-retrieve agents
- User agents
- Monitor and Surveillance agents
- Data Mining agents
- Proactive agents
- Cooperative agents
14.3 Integrating Data Mining, Expert Systems and Intelligent Agents
The iDA Software
Datasets for Data Mining
Decision Tree Attribute Selection
Computing Gain Ratio
Computing Gain(A)
Computing Info(I)
Computing Info(I,A)
Computing Split Info(A)
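The five slides above refer to the standard C4.5 attribute-selection formulas; their images are not in the transcript, so as a reference the usual definitions matching these titles are:

$$ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)} \qquad \mathrm{Gain}(A) = \mathrm{Info}(I) - \mathrm{Info}(I, A) $$

$$ \mathrm{Info}(I) = -\sum_{i} p_i \log_2 p_i \qquad \mathrm{Info}(I, A) = \sum_{j} \frac{|I_j|}{|I|} \, \mathrm{Info}(I_j) $$

$$ \mathrm{SplitInfo}(A) = -\sum_{j} \frac{|I_j|}{|I|} \log_2 \frac{|I_j|}{|I|} $$

where $p_i$ is the fraction of instances in I belonging to class i and $I_j$ is the subset of I taking the jth value of attribute A.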
Statistics for Performance Evaluation
D.1 Single-Valued Summary Statistics
Computing the Mean
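The slide's equation image is missing; the standard definition consistent with the symbols below is:

$$ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i $$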
where μ is the mean value, n is the number of data items, and xi is the ith data item.
Computing the Variance
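The slide's equation image is missing; one standard form (the population variance) consistent with the symbols below is:

$$ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$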
where σ² is the variance, μ is the population mean, n is the number of data items, and xi is the ith data item.
D.2 The Normal Distribution
The Normal Curve
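The slide's equation image is missing; the standard normal curve consistent with the symbols below is:

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \; e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$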
where f(x) is the height of the curve corresponding to values of x, e is the base of natural logarithms approximated by 2.718282, μ is the arithmetic mean for the data, and σ is the standard deviation.
D.3 Comparing Supervised Learner Models
- Comparing Models with Independent Test Data
- Pairwise Comparison with a Single Test Set
Comparing Models with Independent Test Data
Two independent test sets: set A containing n1 elements and set B containing n2 elements. Error rate E1 and variance v1 for model M1 on test set A; error rate E2 and variance v2 for model M2 on test set B.
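The slide's equation image is missing; one standard test statistic built from exactly these quantities (an assumption, not reproduced from the book) is:

$$ T = \frac{|E_1 - E_2|}{\sqrt{\dfrac{v_1}{n_1} + \dfrac{v_2}{n_2}}} $$

where the variance of an error rate can be taken as $v = E(1 - E)$.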
Pairwise Comparison with a Single Test Set
Computing Joint Variance for a Single Test Set
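The slide's equation image is missing; one common form of a joint variance for paired model errors on a single test set (an assumption, with $e_{1i}$ and $e_{2i}$ denoting the two models' errors on instance i) is:

$$ V_{12} = \frac{1}{n - 1} \sum_{i=1}^{n} \left[ (e_{1i} - e_{2i}) - (\overline{e_1} - \overline{e_2}) \right]^2 $$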
Pairwise Comparison with a Single Test Set
D.4 Confidence Intervals for Numeric Output
D.5 Comparing Models with Numeric Output
- Independent Test Sets
- Pairwise Comparison with a Single Test Set
- Overall Comparison with a Single Test Set
Comparing Models with Independent Test Sets
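The slide's equation image is missing; one standard statistic built from the quantities defined below (an assumption) is:

$$ T = \frac{|mae_1 - mae_2|}{\sqrt{\dfrac{v_1}{n_1} + \dfrac{v_2}{n_2}}} $$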
where mae1 is the mean absolute error for model M1, mae2 is the mean absolute error for model M2, v1 and v2 are the variance scores associated with M1 and M2, and n1 and n2 are the number of instances within each respective test set.
Pairwise Comparison with a Single Test Set
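The slide's equation image is missing; one standard statistic built from the quantities defined below (an assumption) is:

$$ T = \frac{|mae_1 - mae_2|}{\sqrt{V_{12} / n}} $$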
where mae1 is the mean absolute error for model M1, mae2 is the mean absolute error for model M2, V12 is the joint variance computed with the formula defined in Equation D.5, and n is the number of test set instances.
Overall Comparison with a Single Test Set
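The slide's equation image is missing; the definition that follows directly from the symbols below is:

$$ mae_j = \frac{1}{n} \sum_{i=1}^{n} e_i $$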
where maej is the mean absolute error for model j, ei is the absolute value of the computed value minus the actual value for instance i, and n is the number of test set instances.
Overall Comparison with a Single Test Set
where v is either the average or the larger of the variance scores for each model, and n is the total number of test set instances.
Excel Pivot Tables (Office 97)