1
CS 478 - Machine Learning
  • Decision Trees (II)

2
Entropy (I)
  • Let S be a set of examples from c classes. Its
    entropy is
      Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
  • where p_i is the proportion of examples of S
    belonging to class i (note: we define 0 \log 0 = 0)

3
Entropy (II)
  • Intuitively, the smaller the entropy, the purer
    the partition
  • Based on Shannon's information theory (here c = 2)
  • If p_1 = 1 (resp. p_2 = 1), then the receiver knows
    the example is positive (resp. negative). No message
    need be sent.
  • If p_1 = p_2 = 0.5, then the receiver needs to be
    told the class of the example. A 1-bit message must
    be sent.
  • If 0 < p_1 < 1, then the receiver needs fewer than
    1 bit on average to know the class of the example.
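
A minimal Python sketch of this computation (not from the slides; the function name entropy is my own):

import math

def entropy(class_counts):
    """Entropy of a set, given per-class example counts."""
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        p = count / total
        if p > 0:                 # convention from the slide: 0 log 0 = 0
            h -= p * math.log2(p)
    return h

print(entropy([5, 5]))    # p_1 = p_2 = 0.5 -> 1.0 bit
print(entropy([10, 0]))   # p_1 = 1         -> 0.0 bits (pure)
print(entropy([8, 2]))    # 0 < p_1 < 1     -> ~0.72 bits (less than 1 on average)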

4
Information Gain
  • Let p be a property (attribute) with n outcomes
  • The information gained by partitioning a set S
    according to p is
      Gain(S, p) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} Entropy(S_i)
  • where S_i is the subset of S for which property p
    has its ith value
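
A matching sketch of the gain computation, reusing entropy from the previous sketch; the dict-of-rows representation and the name information_gain are my assumptions:

from collections import Counter

def information_gain(examples, attribute, label):
    """Gain(S, p): entropy minus the size-weighted entropy of the
    partition induced by `attribute`. `examples` is a list of dicts."""
    def H(rows):
        return entropy(list(Counter(r[label] for r in rows).values()))
    gain = H(examples)
    for value in set(r[attribute] for r in examples):
        subset = [r for r in examples if r[attribute] == value]
        gain -= len(subset) / len(examples) * H(subset)
    return gain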

5
Play Tennis
What is the ID3-induced tree? (A sketch follows.)
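
For concreteness, a compact ID3 sketch, assuming the standard 14-example Play Tennis data from Mitchell's textbook (the table itself is not reproduced here); it reuses entropy and information_gain from the sketches above:

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
DATA = [dict(zip(ATTRS + ["Play"], row)) for row in [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No")]]

def id3(rows, attrs):
    labels = [r["Play"] for r in rows]
    if len(set(labels)) == 1 or not attrs:   # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, "Play"))
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in set(r[best] for r in rows)}}

print(id3(DATA, ATTRS))  # root is Outlook (gain ~0.246 bits); Overcast -> Yes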
6
ID3's Splitting Criterion
  • The objective of ID3 at each split is to increase
    information gain (equivalently, to lower entropy)
    as much as possible
  • Pros: easy to do
  • Cons: may lead to overfitting

7
Overfitting
  • Given a hypothesis space H, a hypothesis h ∈ H is
    said to overfit the training data if there exists
    some alternative hypothesis h' ∈ H such that h
    has smaller error than h' over the training
    examples, but h' has smaller error than h over
    the entire distribution of instances
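
In symbols (a standard LaTeX restatement of the definition, writing error_train for training-set error and error_D for error over the full distribution D):

h \text{ overfits} \iff \exists\, h' \in H:\;
\mathrm{error}_{\mathrm{train}}(h) < \mathrm{error}_{\mathrm{train}}(h')
\;\wedge\;
\mathrm{error}_{\mathcal{D}}(h') < \mathrm{error}_{\mathcal{D}}(h)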

8
Avoiding Overfitting
  • Two alternatives:
  • Stop growing the tree before it begins to
    overfit (e.g., when the data split is not
    statistically significant)
  • Grow the tree to full (overfitting) size and
    post-prune it
  • Either way, when do I stop? What is the correct
    final tree size?

9
Approaches
  • Use only the training data and a statistical test to
    estimate whether expanding/pruning is likely to
    produce an improvement beyond the training set
  • Use MDL to minimize size(tree) +
    size(misclassifications(tree))
  • Use a separate validation set to evaluate the
    utility of pruning
  • Use richer node conditions and accuracy
10
Reduced Error Pruning
  • Split the dataset into training and validation sets
  • Induce a full tree from the training set
  • While the accuracy on the validation set increases:
    • Evaluate the impact of pruning each subtree,
      replacing its root by a leaf labeled with the
      majority class for that subtree
    • Remove the subtree whose removal most increases
      validation set accuracy (greedy approach; see the
      sketch below)
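
A sketch of that greedy loop over the dict-shaped trees built by the id3() example above; classify, candidates, and the unseen-value fallback are my own choices:

import copy
from collections import Counter

def classify(tree, row):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(row[attr], "Yes")   # crude assumed fallback
    return tree

def accuracy(tree, rows):
    return sum(classify(tree, r) == r["Play"] for r in rows) / len(rows)

def majority(rows):
    return Counter(r["Play"] for r in rows).most_common(1)[0][0]

def candidates(tree, rows):
    """Every tree obtained by collapsing exactly one internal node into
    a leaf labeled with the majority class of the rows reaching it."""
    if not isinstance(tree, dict) or not rows:
        return
    yield majority(rows)                          # collapse this node itself
    attr = next(iter(tree))
    for v, child in tree[attr].items():
        for c in candidates(child, [r for r in rows if r[attr] == v]):
            new = copy.deepcopy(tree)
            new[attr][v] = c
            yield new

def reduced_error_prune(tree, train_rows, val_rows):
    """Repeatedly apply the single prune that most improves validation accuracy."""
    while True:
        best_acc, best_tree = accuracy(tree, val_rows), None
        for cand in candidates(tree, train_rows):
            acc = accuracy(cand, val_rows)
            if acc > best_acc:
                best_acc, best_tree = acc, cand
        if best_tree is None:
            return tree
        tree = best_tree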

11
Rule Post-pruning
  • Split the dataset into training and validation sets
  • Induce a full tree from the training set
  • Convert the tree into an equivalent set of rules
    (one rule per root-to-leaf path)
  • For each rule:
    • Remove any precondition whose removal increases
      the rule's accuracy on the validation set (see the
      sketch below)
  • Sort the rules by estimated accuracy
  • Classify new examples using the new ordered set
    of rules
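
A per-rule sketch, with a rule represented as (preconditions, label) where preconditions is a list of (attribute, value) tests; the representation and all names are my assumptions:

def rule_matches(preconds, row):
    return all(row[a] == v for a, v in preconds)

def rule_accuracy(preconds, label, rows):
    covered = [r for r in rows if rule_matches(preconds, r)]
    return (sum(r["Play"] == label for r in covered) / len(covered)
            if covered else 0.0)

def postprune_rule(preconds, label, val_rows):
    """Greedily drop any precondition whose removal improves validation accuracy."""
    preconds = list(preconds)
    improved = True
    while improved and preconds:
        improved = False
        base = rule_accuracy(preconds, label, val_rows)
        for p in list(preconds):
            trial = [q for q in preconds if q != p]
            if rule_accuracy(trial, label, val_rows) > base:
                preconds, improved = trial, True
                break
    return preconds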

12
Discussion
  • Reduced-error pruning produces the smallest
    version of the most accurate subtree
  • Rule post-pruning is more fine-grained and is
    perhaps the most widely used method
  • In all cases, pruning based on a validation set
    is problematic when the amount of available data
    is limited

13
Accuracy vs Entropy
  • ID3 uses entropy to build the tree and accuracy
    to prune it
  • Why not use accuracy in the first place?
  • How?
  • How does it compare with entropy?
  • Is there a way to make it work?

14
Other Issues
  • The text briefly discusses the following aspects
    of decision tree learning:
  • Continuous-valued attributes
  • Alternative splitting criteria (e.g., for
    attributes with many values)
  • Accounting for costs

15
Unknown Attribute Values
  • Alternatives:
  • Remove examples with missing attribute values
  • Treat the missing value as a distinct, special value
    of the attribute
  • Replace the missing value with the most common value
    of the attribute (see the sketch below):
    • Overall
    • At node n
    • At node n, among examples with the same class label
  • Use probabilities
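
A sketch of the most-common-value strategies, assuming rows are dicts with None marking a missing value (the slides do not fix a representation):

from collections import Counter

def most_common_value(rows, attr, cls=None, label="Play"):
    """Most common non-missing value of attr, optionally restricted to
    rows whose class equals cls (the same-class-label variant)."""
    pool = [r[attr] for r in rows
            if r[attr] is not None and (cls is None or r[label] == cls)]
    return Counter(pool).most_common(1)[0][0]

def fill_missing(rows, attr):
    """'Overall' strategy; call it on the rows reaching a node for the
    at-node-n variant, or pass cls to restrict by class label."""
    for r in rows:
        if r[attr] is None:
            r[attr] = most_common_value(rows, attr)
    return rows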