Weka: Just do it

Provided by: ics5


1
Weka: Just do it
  • Free and Open Source
  • ML Suite
  • Ian Witten and Eibe Frank
  • University of Waikato
  • New Zealand

2
Overview
  • Classifiers, regressors, and clusterers
  • Multiple evaluation schemes
  • Bagging and Boosting
  • Feature Selection
  • The right features and data are key to successful
    learning
  • Experimenter
  • Visualizer
  • The text is not up to date.
  • The developers welcome additions.

3
Learning Tasks
  • Classification: given examples labelled from a
    finite domain, generate a procedure for labelling
    unseen examples.
  • Regression: given examples labelled with a real
    value, generate a procedure for labelling unseen
    examples.
  • Clustering: from a set of examples, partition the
    examples into interesting groups. This is what
    scientists want.

4
Data Format IRIS
  • @RELATION iris
  • @ATTRIBUTE sepallength REAL
  • @ATTRIBUTE sepalwidth REAL
  • @ATTRIBUTE petallength REAL
  • @ATTRIBUTE petalwidth REAL
  • @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
  • @DATA
  • 5.1,3.5,1.4,0.2,Iris-setosa
  • 4.9,3.0,1.4,0.2,Iris-setosa
  • 4.7,3.2,1.3,0.2,Iris-setosa
  • Etc.
  • General form:
  • @ATTRIBUTE attribute-name (REAL or {list of
    values})
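
As a rough illustration of how this format can be read, here is a minimal Python sketch. It assumes the standard ARFF `@` keyword prefix and braces around nominal values, and it handles only the subset shown on this slide (REAL and nominal attributes, comma-separated rows). The function name `parse_arff` is hypothetical and is not part of Weka.

```python
def parse_arff(text):
    """Parse a minimal ARFF string into (relation, attributes, rows).

    Illustrative only: supports REAL attributes, nominal attributes
    written as {a,b,c}, and plain comma-separated data rows.
    """
    relation = None
    attributes = []  # (name, "REAL") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                # REAL columns become floats, nominal columns stay strings
                row.append(float(value) if kind == "REAL" else value)
            rows.append(row)
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        if keyword == "@RELATION":
            relation = rest.strip()
        elif keyword == "@ATTRIBUTE":
            name, _, kind = rest.strip().partition(" ")
            kind = kind.strip()
            if kind.startswith("{"):
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.upper()
            attributes.append((name, kind))
        elif keyword == "@DATA":
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), rows[0])
```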

5
J48 Decision Tree
  • petalwidth <= 0.6: Iris-setosa (50.0)
  • petalwidth > 0.6
      • petalwidth <= 1.7
          • petallength <= 4.9: Iris-versicolor (48.0/1.0)
          • petallength > 4.9
              • petalwidth <= 1.5: Iris-virginica (3.0)
              • petalwidth > 1.5: Iris-versicolor (3.0/1.0)
      • petalwidth > 1.7: Iris-virginica (46.0/1.0)
  • In parentheses: the first number is the number of
    examples under the node; the second, if present, is
    the number wrong.
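
The tree on this slide can be read as nested if/else rules. The sketch below is a hand transcription into Python, assuming the thresholds are read as `<=` and `>` as in J48's usual text output. The function name `classify_iris` is hypothetical; Weka does not generate this code.

```python
def classify_iris(petallength, petalwidth):
    """Apply the decision tree from the slide to one flower.

    Hand-written transcription of the printed tree; only the two
    petal measurements are used because the tree tests nothing else.
    """
    if petalwidth <= 0.6:
        return "Iris-setosa"
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"
        # petallength > 4.9: one more test on petal width
        if petalwidth <= 1.5:
            return "Iris-virginica"
        return "Iris-versicolor"
    # petalwidth > 1.7
    return "Iris-virginica"


# First data row of the iris file: petallength 1.4, petalwidth 0.2
print(classify_iris(1.4, 0.2))
```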

6
Cross-validation
  • Correctly Classified Instances: 143 (95.3%)
  • Incorrectly Classified Instances: 7 (4.67%)
  • Default: 10-fold cross-validation, i.e.
  • Split the data into 10 equal-sized pieces
  • Train on 9 pieces and test on the remainder
  • Do this for all 10 possibilities and average
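
The splitting procedure can be sketched in a few lines of Python. This is an illustration of the idea only, not Weka's implementation: it shuffles with a fixed seed, does not stratify by class, and the names `k_fold_indices`, `cross_validate`, and the `evaluate` callback are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k pieces of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n, evaluate, k=10, seed=0):
    """Average evaluate(train, test) over all k train/test splits.

    `evaluate` is a caller-supplied function that trains on the
    `train` indices and returns a score measured on `test`.
    """
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)


# With the 150 iris examples, each split trains on 135 and tests on 15.
splits = list(k_fold_indices(150, k=10, seed=0))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```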

7
J48 Confusion Matrix
  • Old data set from statistics: 50 examples of each class
  • Rows are the actual class, columns the predicted class:
       a   b   c   <-- classified as
      49   1   0 |  a = Iris-setosa
       0  47   3 |  b = Iris-versicolor
       0   3  47 |  c = Iris-virginica

8
Precision, Recall, and Accuracy
  • Precision: probability of being correct given
    your decision.
  • Precision of Iris-setosa is 49/49 = 100%
  • Called positive predictive value in the medical
    literature (specificity is a different measure,
    the true-negative rate)
  • Recall: probability of correctly identifying a
    class.
  • Recall for Iris-setosa is 49/50 = 98%
  • Called sensitivity in the medical literature
  • Accuracy: right/total = 143/150 = 95.3%
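
The three measures can be checked directly against the J48 confusion matrix from the previous slide. The Python sketch below assumes rows are actual classes and columns are predicted classes; the helper names are hypothetical.

```python
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Confusion matrix from the slide: rows = actual, columns = predicted.
matrix = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 3, 47],
]


def precision(matrix, i):
    """Correct predictions of class i / all predictions of class i."""
    predicted = sum(row[i] for row in matrix)
    return matrix[i][i] / predicted if predicted else 0.0


def recall(matrix, i):
    """Correct predictions of class i / all actual members of class i."""
    actual = sum(matrix[i])
    return matrix[i][i] / actual if actual else 0.0


def accuracy(matrix):
    """Diagonal total (right) / grand total."""
    right = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return right / total


for i, label in enumerate(labels):
    print(label, round(precision(matrix, i), 3), round(recall(matrix, i), 3))
print("accuracy", round(accuracy(matrix), 3))
```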

9
Other Evaluation Schemes
  • Leave-one-out cross-validation
  • n-fold cross-validation where n = number of training
    instances
  • Specific train and test set
  • Allows for exact replication
  • OK if train/test sets are large, e.g. in the 10,000 range.

10
Bootstrap sampling
  • Randomly select n examples with replacement from n
  • Expect about 2/3 to be chosen for training
  • Probability of an example not being chosen: $(1 - 1/n)^n \approx 1/e$
  • Test on the remainder
  • Repeat about 30 times and average.
  • Avoids partition bias
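
A small Python simulation can make the 1/e figure concrete. This is a sketch under simple assumptions (uniform sampling, a fixed seed, n = 150 as in the iris data); the function names are hypothetical.

```python
import math
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; untouched indices form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


def average_unique_fraction(n, repeats=30, seed=0):
    """Average fraction of distinct examples that land in the training sample."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(repeats):
        train, _ = bootstrap_split(n, rng)
        fractions.append(len(set(train)) / n)
    return sum(fractions) / repeats


n = 150
# Probability that one given example is never picked in n draws.
print((1 - 1 / n) ** n, 1 / math.e)
# Simulated fraction chosen for training, next to 1 - 1/e (about 0.632).
print(average_unique_fraction(n), 1 - 1 / math.e)
```

The simulated fraction should land close to 1 - 1/e, which is the "about 2/3" quoted on the slide.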