Introduction to Weka

Slides: 64
Provided by: pan72
Learn more at: https://www.cse.fau.edu


1
Introduction to Weka
  • Xingquan (Hill) Zhu
  • Slides copied from Jeffrey Junfeng Pan (UST)

2
Outline
  • Weka
  • Data Source
  • Feature selection
  • Model building
  • Classifier / Cross Validation
  • Result visualization

3
WEKA
  • http://www.cs.waikato.ac.nz/ml/weka/
  • Data mining software in Java
  • Open source software
  • UCI Data Repository
  • http://www.ics.uci.edu/~mlearn/MLRepository.html

7
Explorer: pre-processing the data
  • Data can be imported from a file in various
    formats: ARFF, CSV, C4.5, binary
  • Data can also be read from a URL or from an SQL
    database (using JDBC)
  • Pre-processing tools in WEKA are called "filters"
  • WEKA contains filters for:
    • Discretization, normalization, resampling,
      attribute selection, transforming and combining
      attributes, ...
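
To make one of the filters above concrete, here is a minimal sketch of min-max normalization for a single numeric attribute. It is plain Java and does not call WEKA; the class name MinMaxNormalizer, the use of NaN for a missing value, and mapping a constant column to 0 are assumptions made for this illustration, not a description of how WEKA's own filter behaves.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of WEKA): min-max normalization of one numeric attribute. */
final class MinMaxNormalizer {

    /** Rescales values linearly to [0, 1]; NaN stands for a missing value and is kept as NaN. */
    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (Double.isNaN(v)) continue;      // ignore missing values when finding the range
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double v = values[i];
            if (Double.isNaN(v)) {
                out[i] = Double.NaN;            // missing stays missing
            } else if (range == 0.0) {
                out[i] = 0.0;                   // assumption: a constant column maps to 0
            } else {
                out[i] = (v - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Cholesterol column of the heart-disease example; the fourth value is missing.
        double[] cholesterol = {233, 286, 229, Double.NaN};
        System.out.println(Arrays.toString(normalize(cholesterol)));
    }
}
```

In the Explorer the same effect is obtained by choosing a filter in the Preprocess panel; the sketch only shows the arithmetic involved.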

8
WEKA only deals with flat files
  • @relation heart-disease-simplified
  • @attribute age numeric
  • @attribute sex {female, male}
  • @attribute chest_pain_type {typ_angina, asympt,
    non_anginal, atyp_angina}
  • @attribute cholesterol numeric
  • @attribute exercise_induced_angina {no, yes}
  • @attribute class {present, not_present}
  • @data
  • 63,male,typ_angina,233,no,not_present
  • 67,male,asympt,286,yes,present
  • 67,male,asympt,229,yes,present
  • 38,female,non_anginal,?,no,not_present
  • ...

Flat file in ARFF format
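
As a rough illustration of how such a flat file is structured, the sketch below reads an ARFF text into a relation name, attribute declarations, and data rows. ARFF keywords start with "@" and nominal value lists are written in braces. The class TinyArff and its fields are hypothetical names for this example; it ignores quoting, sparse rows, and date or string attributes, all of which WEKA's own loader handles.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal ARFF reader for illustration; WEKA ships its own, more complete loader. */
final class TinyArff {
    String relation = "";
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>(); // e.g. "numeric" or "{female, male}"
    final List<String[]> rows = new ArrayList<>();         // raw cell text; "?" marks a missing value

    static TinyArff parse(String text) {
        TinyArff arff = new TinyArff();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // skip blank lines and comments
            String lower = line.toLowerCase();
            if (inData) {
                arff.rows.add(line.split("\\s*,\\s*"));
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Split "name type" once, so a braced value list stays in one piece.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                arff.attributeNames.add(parts[0]);
                arff.attributeTypes.add(parts.length > 1 ? parts[1].trim() : "");
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    /** A nominal attribute is declared with its allowed values in braces. */
    boolean isNominal(int attributeIndex) {
        return attributeTypes.get(attributeIndex).startsWith("{");
    }

    public static void main(String[] args) {
        String text = "@relation heart-disease-simplified\n"
                + "@attribute age numeric\n"
                + "@attribute sex {female, male}\n"
                + "@data\n"
                + "63,male\n"
                + "38,female\n";
        TinyArff arff = parse(text);
        System.out.println(arff.relation + ": " + arff.attributeNames + ", " + arff.rows.size() + " rows");
    }
}
```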
9
WEKA only deals with flat files
  • @relation heart-disease-simplified
  • @attribute age numeric
  • @attribute sex {female, male}
  • @attribute chest_pain_type {typ_angina, asympt,
    non_anginal, atyp_angina}
  • @attribute cholesterol numeric
  • @attribute exercise_induced_angina {no, yes}
  • @attribute class {present, not_present}
  • @data
  • 63,male,typ_angina,233,no,not_present
  • 67,male,asympt,286,yes,present
  • 67,male,asympt,229,yes,present
  • 38,female,non_anginal,?,no,not_present
  • ...

numeric attribute: declared with the keyword numeric (age, cholesterol)
nominal attribute: declared with its allowed values in braces (sex, chest_pain_type, exercise_induced_angina, class)
31
Explorer: attribute selection
  • Panel that can be used to investigate which
    (subsets of) attributes are the most predictive
    ones
  • Attribute selection methods contain two parts:
    • A search method: best-first, forward selection,
      random, exhaustive, genetic algorithm, ranking
    • An evaluation method: correlation-based, wrapper,
      information gain, chi-squared, ...
  • Very flexible: WEKA allows (almost) arbitrary
    combinations of these two
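
One of the evaluation methods named above, information gain, can be sketched in a few lines. The gain of a nominal attribute is the entropy of the class minus the weighted entropy of the class within each attribute value. The code below is plain Java, not WEKA's evaluator; the class name InfoGainSketch is hypothetical, and it assumes nominal attributes with no missing values. The data are the four instances of the heart-disease ARFF example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of information gain for nominal attributes (not WEKA's implementation). */
final class InfoGainSketch {

    /** Shannon entropy of a list of class labels, in bits. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) counts.merge(label, 1, Integer::sum);
        double bits = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            bits -= p * (Math.log(p) / Math.log(2.0));
        }
        return bits;
    }

    /** H(class) minus the sum over attribute values v of P(v) * H(class | attribute = v). */
    static double informationGain(String[] attribute, String[] labels) {
        Map<String, List<String>> labelsByValue = new HashMap<>();
        for (int i = 0; i < attribute.length; i++) {
            labelsByValue.computeIfAbsent(attribute[i], k -> new ArrayList<>()).add(labels[i]);
        }
        double conditional = 0.0;
        for (List<String> subset : labelsByValue.values()) {
            conditional += ((double) subset.size() / labels.length) * entropy(subset);
        }
        return entropy(Arrays.asList(labels)) - conditional;
    }

    public static void main(String[] args) {
        String[] cls = {"not_present", "present", "present", "not_present"};
        String[] angina = {"no", "yes", "yes", "no"};
        String[] sex = {"male", "male", "male", "female"};
        // Ranking: a higher gain means the attribute is more predictive on this sample.
        System.out.println("exercise_induced_angina: " + informationGain(angina, cls));
        System.out.println("sex: " + informationGain(sex, cls));
    }
}
```

Scoring every attribute this way and sorting by score corresponds to pairing the "ranking" search with the information gain evaluator; four instances are far too few for the numbers to mean anything beyond illustration.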

40
Explorer: building classifiers
  • Classifiers in WEKA are models for predicting
    nominal or numeric quantities
  • Implemented learning schemes include:
    • Decision trees and lists, instance-based
      classifiers, support vector machines, multi-layer
      perceptrons, logistic regression, Bayes nets, ...
  • "Meta"-classifiers include:
    • Bagging, boosting, stacking, error-correcting
      output codes, locally weighted learning, ...
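
To give a feel for what an instance-based classifier does, here is a minimal 1-nearest-neighbour sketch in plain Java. It is not WEKA code: the class NearestNeighbourSketch is a hypothetical name, it uses unscaled squared Euclidean distance on numeric attributes only, and the query points in main are made up for the example. The training rows are the age and cholesterol values of the three complete instances from the ARFF slide.

```java
/** Hypothetical 1-nearest-neighbour classifier for illustration (not WEKA's implementation). */
final class NearestNeighbourSketch {
    private final double[][] trainX;
    private final String[] trainY;

    /** "Training" an instance-based classifier just means storing the labelled instances. */
    NearestNeighbourSketch(double[][] x, String[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("need one label per training instance");
        }
        this.trainX = x;
        this.trainY = y;
    }

    /** Predicts the class of the stored instance closest to the query. */
    String classify(double[] query) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double distance = 0.0;
            for (int j = 0; j < query.length; j++) {
                double diff = trainX[i][j] - query[j];
                distance += diff * diff;        // squared Euclidean distance, no scaling
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return trainY[best];
    }

    public static void main(String[] args) {
        double[][] x = {{63, 233}, {67, 286}, {67, 229}};   // age, cholesterol
        String[] y = {"not_present", "present", "present"};
        NearestNeighbourSketch knn = new NearestNeighbourSketch(x, y);
        System.out.println(knn.classify(new double[]{65, 280})); // made-up query
    }
}
```

Because cholesterol spans a much wider range than age, it dominates the distance here; normalizing the attributes first is the usual remedy.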

62
Problem with Running Weka
Problem: Out of memory for large data sets
Solution: java -Xmx1000m -jar weka.jar
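
The -Xmx option sets the maximum heap size of the Java virtual machine, here 1000 MB. A small way to check which limit a JVM is actually running with is sketched below; the class name HeapCheck is hypothetical, and the reported figure can differ slightly from the requested value depending on the JVM and garbage collector.

```java
/** Hypothetical helper: reports the maximum heap the current JVM will try to use. */
final class HeapCheck {

    /** Maximum heap size in mebibytes, as reported by the runtime. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run as: java -Xmx1000m HeapCheck   (compare with a run without -Xmx)
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```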