Data Mining with WEKA - PowerPoint PPT Presentation

Provided by: aiDg
Data Mining with WEKA
WEKA
  • Machine learning/data mining software written in
    Java
  • Used for research, education, and applications
  • Complements "Data Mining" by Witten & Frank
  • Main features:
      • Comprehensive set of data pre-processing tools,
        learning algorithms and evaluation methods
      • Graphical user interfaces (incl. data
        visualization)
      • Environment for comparing learning algorithms

Data Files
    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex {female, male}
    @attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina {no, yes}
    @attribute class {present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...

Flat file in ARFF format: age and cholesterol are numeric attributes, the other attributes are nominal, and "?" marks a missing value
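To make the file layout concrete, here is a minimal sketch, in plain Java, of reading the simplified ARFF subset shown above: the @relation name, @attribute declarations (numeric, or nominal with a {...} value list) and comma-separated @data rows, with "?" treated as missing. MiniArff and its methods are hypothetical names for this illustration; it is not WEKA's own ARFF loader and it ignores quoting, string/date attributes and sparse rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal reader for the simplified ARFF subset shown above (not WEKA's own ARFF loader). */
final class MiniArff {
    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<List<String>> nominalValues = new ArrayList<>(); // null entry = non-nominal (e.g. numeric)
    final List<String[]> rows = new ArrayList<>();

    static MiniArff parse(String text) {
        MiniArff arff = new MiniArff();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();                                   // also drops a trailing '\r'
            if (line.isEmpty() || line.startsWith("%")) continue;       // blank lines and comments
            String lower = line.toLowerCase();
            if (inData) {
                // Data row: comma-separated values in attribute order (no quoting supported here).
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) cells[i] = cells[i].trim();
                arff.rows.add(cells);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                String[] parts = line.split("\\s+", 3);                 // keyword, name, type
                arff.attributeNames.add(parts[1]);
                String type = parts[2].trim();
                if (type.startsWith("{")) {                             // nominal: {v1, v2, ...}
                    List<String> values = new ArrayList<>();
                    for (String v : type.substring(1, type.lastIndexOf('}')).split(",")) {
                        values.add(v.trim());
                    }
                    arff.nominalValues.add(values);
                } else {
                    arff.nominalValues.add(null);                       // e.g. "numeric"
                }
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    /** True if the cell holds "?", ARFF's marker for a missing value. */
    boolean isMissing(int row, int attr) {
        return "?".equals(rows.get(row)[attr]);
    }

    /** The cell parsed as a number, or NaN when it is missing. */
    double numericValue(int row, int attr) {
        return isMissing(row, attr) ? Double.NaN : Double.parseDouble(rows.get(row)[attr]);
    }
}
```

For the full header above, numericValue(3, 3) would return NaN for the missing cholesterol value in the fourth data row (row and attribute indices are 0-based).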
Explorer: pre-processing
  • Source:
      • Data can be imported from a file in various
        formats: ARFF, CSV, C4.5, binary
      • Data can also be read from a URL or from an SQL
        database (using JDBC)
  • Pre-processing tools:
      • Called filters
      • Discretization, normalization, resampling,
        attribute selection, transforming and combining
        attributes, ...
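Two of the filter ideas listed above, normalization and discretization, can be sketched in a few lines of plain Java. This assumes min-max normalization to [0, 1] and equal-width binning, with missing values represented as NaN; FilterSketch and its methods are hypothetical names, not WEKA's filter classes.

```java
/** Plain-Java sketches of two filter ideas: min-max normalization and equal-width discretization. */
final class FilterSketch {
    private FilterSketch() {}

    /** Linearly rescales values to [0, 1]; NaN (missing) stays NaN, a constant column maps to 0. */
    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (Double.isNaN(v)) continue;               // ignore missing values when finding the range
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                out[i] = Double.NaN;
            } else {
                out[i] = range > 0 ? (values[i] - min) / range : 0.0;
            }
        }
        return out;
    }

    /** Bin index in 0..bins-1 per value, using equal-width intervals over [min, max]; -1 marks missing. */
    static int[] discretizeEqualWidth(double[] values, int bins) {
        if (bins < 1) throw new IllegalArgumentException("bins must be >= 1");
        double[] scaled = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(scaled[i])) {
                out[i] = -1;
            } else {
                // The maximum (scaled value 1.0) would land in bin `bins`, so clamp it into the last bin.
                out[i] = Math.min((int) (scaled[i] * bins), bins - 1);
            }
        }
        return out;
    }
}
```

For instance, discretizeEqualWidth(new double[] {233, 286, 229, Double.NaN}, 3) puts the three cholesterol values from the ARFF example into bins 0, 2 and 0 and marks the missing one as -1.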

Explorer: building classifiers
  • Classifiers in WEKA are models for predicting
    nominal or numeric quantities
  • Implemented learning schemes include:
      • Decision trees and lists, instance-based
        classifiers, support vector machines, multi-layer
        perceptrons, logistic regression, Bayes nets, ...
  • Meta-classifiers include:
      • Bagging, boosting, stacking, error-correcting
        output codes, locally weighted learning, ...
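As a small example of an instance-based classifier, the following plain-Java sketch predicts the class of the nearest training instance under Euclidean distance (1-nearest-neighbour). NearestNeighbor is a hypothetical class, not WEKA's own implementation; it assumes purely numeric attributes that are already on comparable scales (e.g. after normalization) and query vectors of the same length as the training vectors.

```java
/** Minimal 1-nearest-neighbour classifier over numeric attributes (illustrative, not WEKA's). */
final class NearestNeighbor {
    private final double[][] train;
    private final String[] labels;

    NearestNeighbor(double[][] train, String[] labels) {
        if (train.length == 0 || train.length != labels.length) {
            throw new IllegalArgumentException("need at least one instance and one label per instance");
        }
        this.train = train;
        this.labels = labels;
    }

    /** Returns the label of the training instance with the smallest squared Euclidean distance to x. */
    String classify(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < train.length; i++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = train[i][j] - x[j];
                d += diff * diff;                 // squared distance preserves the nearest-neighbour order
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return labels[best];
    }
}
```

"Training" here is just storing the instances; all the work happens at prediction time, which is what makes such schemes instance-based.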

Explorer: clustering data
  • WEKA contains clusterers for finding groups of
    similar instances in a dataset
  • Implemented schemes are:
      • k-Means, EM, Cobweb, X-means, FarthestFirst
  • Clusters can be visualized and compared to true
    clusters (if given)
  • Evaluation based on log-likelihood if the clustering
    scheme produces a probability distribution
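The idea behind k-Means, the first clusterer listed above, fits in a short plain-Java sketch: alternately assign each instance to its nearest centroid and move each centroid to the mean of its members, until no assignment changes. KMeansSketch is a hypothetical name; the caller supplies the initial centroids so that the result is deterministic, and WEKA's own implementation may initialize centroids, handle empty clusters and treat nominal attributes differently.

```java
import java.util.Arrays;

/** Plain-Java sketch of Lloyd-style k-means on numeric points (not WEKA's own clusterer). */
final class KMeansSketch {
    private KMeansSketch() {}

    /** Returns a cluster index per point; stops when assignments stop changing or after maxIter rounds. */
    static int[] cluster(double[][] points, double[][] initialCentroids, int maxIter) {
        int k = initialCentroids.length;
        if (k == 0) throw new IllegalArgumentException("need at least one centroid");
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initialCentroids[c].clone();
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);                         // no point assigned yet
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point joins its nearest centroid (squared Euclidean distance).
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;                         // converged
            // Update step: move each centroid to the mean of the points assigned to it.
            int dim = centroids[0].length;
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < dim; j++) sums[assign[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;            // an empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assign;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }
}
```

The result depends on the starting centroids, which is why k-means implementations usually expose a random seed or an initialization option.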

Explorer: finding associations
  • WEKA contains an implementation of the Apriori
    algorithm for learning association rules
      • Works only with discrete data
  • Can identify statistical dependencies between
    groups of attributes:
      • milk, butter ⇒ bread, eggs (with confidence 0.9
        and support 2000)
  • Apriori can compute all rules that have a given
    minimum support and exceed a given confidence
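The two measures Apriori works with can be computed directly for a single rule. This plain-Java sketch assumes transactions are sets of item names and counts support as the number of transactions containing all items involved, which is how the "support 2000" figure above reads; confidence is the support of lhs and rhs together divided by the support of lhs. RuleMeasures is a hypothetical name, and the sketch does not implement Apriori's level-wise search for frequent itemsets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Support and confidence of one association rule "lhs => rhs" (not a full Apriori implementation). */
final class RuleMeasures {
    private RuleMeasures() {}

    /** Convenience constructor for an itemset. */
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Support as a count: number of transactions that contain every item of the itemset. */
    static int supportCount(List<Set<String>> transactions, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) count++;
        }
        return count;
    }

    /** Confidence: support(lhs together with rhs) / support(lhs); 0 if lhs never occurs. */
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        int lhsCount = supportCount(transactions, lhs);
        return lhsCount == 0 ? 0.0 : (double) supportCount(transactions, both) / lhsCount;
    }
}
```

A rule is reported only if both numbers clear the user's thresholds; Apriori's contribution is finding all such rules without testing every possible itemset.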

Explorer: attribute selection
  • Panel that can be used to investigate which
    (subsets of) attributes are the most predictive
    ones
  • Attribute selection methods contain two parts:
      • A search method: best-first, forward selection,
        random, exhaustive, genetic algorithm, ranking
      • An evaluation method: correlation-based, wrapper,
        information gain, chi-squared, ...
  • Very flexible: WEKA allows (almost) arbitrary
    combinations of these two
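Information gain, one of the evaluation methods listed above, scores a nominal attribute A by how much knowing its value reduces the entropy of the class C: Gain(A) = H(C) - sum over values v of (n_v / n) * H(C | A = v). A plain-Java sketch, assuming the attribute/class co-occurrence counts have already been tallied into a table; InfoGain is a hypothetical name, not WEKA's attribute evaluator class.

```java
/** Information gain of a nominal attribute with respect to a nominal class, in bits. */
final class InfoGain {
    private InfoGain() {}

    /** Entropy (base 2) of a class distribution given as counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;                        // 0 * log 0 is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** counts[v][k] = number of instances with attribute value v and class k. */
    static double gain(int[][] counts) {
        int numClasses = counts[0].length;
        int[] classTotals = new int[numClasses];
        int total = 0;
        for (int[] row : counts) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += row[k];
                total += row[k];
            }
        }
        // Expected class entropy after splitting on the attribute, weighted by value frequency.
        double remainder = 0.0;
        for (int[] row : counts) {
            int rowTotal = 0;
            for (int c : row) rowTotal += c;
            if (rowTotal > 0) remainder += (double) rowTotal / total * entropy(row);
        }
        return entropy(classTotals) - remainder;
    }
}
```

With illustrative counts {{2, 3}, {4, 0}, {3, 2}} (three attribute values, two classes), gain returns about 0.247 bits. A ranking search simply sorts attributes by such a score, while subset searches such as best-first combine it with other evaluators.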

Explorer: data visualization
  • Visualization very useful in practice: e.g. helps
    to determine difficulty of the learning problem
  • WEKA can visualize single attributes (1-d) and
    pairs of attributes (2-d)
      • To do: rotating 3-d visualizations (XGobi-style)
  • Color-coded class values
  • Jitter option to deal with nominal attributes
    (and to detect hidden data points)
  • Zoom-in function

Performing experiments
  • Experimenter makes it easy to compare the
    performance of different learning schemes
  • For classification and regression problems
  • Results can be written into a file or database
  • Evaluation options: cross-validation, learning
    curve, hold-out
  • Can also iterate over different parameter
    settings
  • Significance testing built in!
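Significance testing compares two schemes on paired results, e.g. their accuracy on the same cross-validation folds. A minimal plain-Java sketch of the standard paired t statistic follows; PairedT is a hypothetical name, and the statistic would be compared against Student's t distribution with n - 1 degrees of freedom. WEKA's Experimenter also offers a corrected variant of the paired t-test that accounts for overlapping training sets; this sketch applies no such correction.

```java
/** Standard (uncorrected) paired t statistic for two schemes' results on the same folds. */
final class PairedT {
    private PairedT() {}

    /** t = mean(d) / (sd(d) / sqrt(n)) for differences d[i] = a[i] - b[i]; infinite or NaN if sd is 0. */
    static double tStatistic(double[] a, double[] b) {
        int n = a.length;
        if (n < 2 || b.length != n) throw new IllegalArgumentException("need >= 2 paired results");
        double mean = 0.0;
        for (int i = 0; i < n; i++) mean += a[i] - b[i];
        mean /= n;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            sumSq += dev * dev;
        }
        double sd = Math.sqrt(sumSq / (n - 1));  // sample standard deviation of the differences
        return mean / (sd / Math.sqrt(n));
    }
}
```

For per-fold accuracies {81, 82, 83} versus {80, 80, 80}, the differences are 1, 2 and 3, giving t = 2·√3 ≈ 3.46.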

The Knowledge Flow GUI
  • New graphical user interface for WEKA
  • Java-Beans-based interface for setting up and
    running machine learning experiments
  • Data sources, classifiers, etc. are beans and can
    be connected graphically
  • Data flows through components, e.g.:
      • data source -> filter -> classifier ->
        evaluator
  • Layouts can be saved and loaded again later
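The data-flow idea can be mimicked in plain Java by treating each component as a function and connecting them with Function.andThen. This is only an analogy, not the Knowledge Flow's JavaBeans API: FlowSketch, the missing-value filter, the majority-class "classifier" (similar in spirit to WEKA's ZeroR) and the training-set accuracy "evaluator" are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy analogue of a chain data source -> filter -> classifier -> evaluator (hypothetical names). */
final class FlowSketch {
    /** Hypothetical labelled instance: one numeric attribute (NaN = missing) plus a class label. */
    static final class Example {
        final double value;
        final String label;
        Example(double value, String label) { this.value = value; this.label = label; }
    }

    /** Output of the classifier stage: the learned predictor plus the data it was built from. */
    static final class Trained {
        final Function<Example, String> predictor;
        final List<Example> data;
        Trained(Function<Example, String> predictor, List<Example> data) {
            this.predictor = predictor;
            this.data = data;
        }
    }

    // Filter stage: drop instances whose attribute value is missing.
    static final Function<List<Example>, List<Example>> DROP_MISSING = data -> {
        List<Example> out = new ArrayList<>();
        for (Example e : data) if (!Double.isNaN(e.value)) out.add(e);
        return out;
    };

    // Classifier stage: always predict the most frequent label (first one to reach the top count wins).
    static final Function<List<Example>, Trained> MAJORITY = data -> {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (Example e : data) {
            int c = counts.merge(e.label, 1, Integer::sum);
            if (best == null || c > counts.get(best)) best = e.label;
        }
        final String majority = best;
        return new Trained(x -> majority, data);
    };

    // Evaluator stage: fraction of the (training) instances whose prediction matches the label.
    static final Function<Trained, Double> ACCURACY = t -> {
        if (t.data.isEmpty()) return Double.NaN;
        int correct = 0;
        for (Example e : t.data) if (e.label.equals(t.predictor.apply(e))) correct++;
        return (double) correct / t.data.size();
    };

    /** The source list is fed through filter, classifier and evaluator, wired with andThen. */
    static double run(List<Example> source) {
        return DROP_MISSING.andThen(MAJORITY).andThen(ACCURACY).apply(source);
    }
}
```

Because each stage only depends on the type of its input and output, stages can be swapped like connected components. Evaluating on the training data keeps the sketch short but gives an optimistic estimate; a real evaluator would use held-out data or cross-validation.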

Can continue this...
Conclusion: try it yourself!
  • WEKA is available at:
      • http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
  • WEKA contributors:
      • Abdelaziz Mahoui, Alexander K. Seewald, Ashraf
        M. Kibriya, Bernhard Pfahringer, Brent Martin,
        Peter Flach, Eibe Frank, Gabi Schmidberger, Ian
        H. Witten, J. Lindgren, Janice Boughton, Jason
        Wells, Len Trigg, Lucio de Souza Coelho, Malcolm
        Ware, Mark Hall, Remco Bouckaert, Richard
        Kirkby, Shane Butler, Shane Legg, Stuart Inglis,
        Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang,
        Zhihai Wang