MIS510 - PowerPoint PPT Presentation

Provided by: gavin

1
Introduction to Weka and NetDraw
  • MIS510
  • Spring 2009

2
Outline
  • Weka
      • Introduction
      • Weka Tools/Functions
      • How to use Weka?
          • Weka Data File Format (Input)
          • Weka for Data Mining
          • Sample Output from Weka (Output)
      • Conclusion
  • NetDraw
      • Introduction
      • How to use NetDraw?
          • NetDraw Input Data File Format
          • Draw Networks using NetDraw
      • Conclusion

3
Weka
4
Introduction to Weka (Data Mining Tool)
  • Weka was developed at the University of Waikato
    in New Zealand: http://www.cs.waikato.ac.nz/ml/weka/
  • Weka is an open-source data mining tool developed
    in Java. It is used for research, education, and
    applications, and it runs on Windows, Linux, and
    Mac.

5
What can Weka do?
  • Weka is a collection of machine learning
    algorithms for data mining tasks. The algorithms
    can either be applied directly to a dataset
    (using GUI) or called from your own Java code
    (using Weka Java library).
  • Weka contains tools for data pre-processing,
    classification, regression, clustering,
    association rules, and visualization. It is also
    well-suited for developing new machine learning
    schemes.

6
Weka Tools/Functions
  • Tools (or functions) in Weka include:
  • Data preprocessing (e.g., Data Filters),
  • Classification (e.g., BayesNet, KNN, C4.5
    Decision Tree, Neural Networks, SVM),
  • Regression (e.g., Linear Regression, Isotonic
    Regression, SVM for Regression),
  • Clustering (e.g., Simple K-means, Expectation
    Maximization (EM)),
  • Association rules (e.g., Apriori Algorithm,
    Predictive Accuracy, Confirmation Guided),
  • Feature Selection (e.g., Cfs Subset Evaluation,
    Information Gain, Chi-squared Statistic), and
  • Visualization (e.g., View different
    two-dimensional plots of the data).

7
Weka's Role in the Big Picture
8
How to use Weka?
  • Weka Data File Format (Input)
  • Weka for Data Mining
  • Sample Output from Weka (Output)

9
Weka Data File Format (Input)
  • The most popular data input format for Weka is
    ARFF (with .arff being the extension of
    your input data file).
  • File format:
      @relation RELATION_NAME
      @attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
      @attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
      @attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
      @attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
      @data
      DATAROW1
      DATAROW2
      DATAROW3

10
Example of arff Input File
      @relation heart-disease-simplified
      @attribute age numeric
      @attribute sex {female, male}
      @attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
      @attribute cholesterol numeric
      @attribute exercise_induced_angina {no, yes}
      @attribute class {present, not_present}
      @data
      63,male,typ_angina,233,no,not_present
      67,male,asympt,286,yes,present
      67,male,asympt,229,yes,present
      38,female,non_anginal,?,no,not_present
      ...
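
To make the file layout concrete, here is a minimal sketch in plain Java (no Weka dependency) that reads the header and data rows of an ARFF file. ARFF marks its header lines with @relation, @attribute, and @data, and writes a missing value as "?". The class name ArffSketch and its methods are hypothetical and exist only for illustration; the sketch ignores quoted values and the sparse format, so the real Weka loader should be used for actual work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Weka: a simplified ARFF reader.
class ArffSketch {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // Each data row is a comma-separated list of values.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                result.rows.add(values);
            } else if (lower.startsWith("@attribute")) {
                // The attribute name is the second whitespace-separated token.
                result.attributeNames.add(line.split("\\s+")[1]);
            } else if (lower.startsWith("@data")) {
                inData = true; // everything after @data is a data row
            }
        }
        return result;
    }

    // Counts the values marked as missing ("?") in one column.
    int countMissing(int column) {
        int missing = 0;
        for (String[] row : rows) {
            if (row[column].equals("?")) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex {female, male}",
            "@attribute cholesterol numeric",
            "@attribute class {present, not_present}",
            "@data",
            "63,male,233,not_present",
            "38,female,?,not_present");
        ArffSketch parsed = ArffSketch.parse(arff);
        System.out.println(parsed.attributeNames);
        System.out.println(parsed.rows.size() + " rows, "
            + parsed.countMissing(2) + " missing cholesterol value(s)");
    }
}
```

The sample input is a shortened version of the heart-disease file from this slide, with two of the attributes dropped to keep the example small.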

11
Weka for Data Mining
  • There are two main ways to use Weka for
    your data mining tasks:
  • Use the Weka graphical user interface (GUI).
      • The GUI is straightforward and easy to use, but it is
        not flexible: it cannot be called from your own
        application.
  • Import the Weka Java library into your own Java
    application.
      • Developers can use the Weka Java library to
        develop software, or modify the source code to
        meet special requirements. This is more flexible
        and advanced, but it is not as easy to use as
        the GUI.

12
Weka GUI
Different analysis tools/functions
The value set of the chosen attribute and the
number of input items with each value
Different attributes to choose
13
Weka GUI
Classification Algorithms
14
Import Weka Java library to your own Java
application
  • Three sets of classes you may need to use when
    developing your own application
  • Classes for Loading Data
  • Classes for Classifiers
  • Classes for Evaluation

15
Classes for Loading Data
  • Related Weka classes
  • weka.core.Instances
  • weka.core.Instance
  • weka.core.Attribute
  • How to load input data file into instances?
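  • Every data row -> Instance, every attribute ->
    Attribute, the whole dataset -> Instances

Load a file as Instances:
    import java.io.FileReader;
    import weka.core.Instances;

    // Read the ARFF file at the given path into an Instances object.
    FileReader reader = new FileReader(path);
    Instances instances = new Instances(reader);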
16
Classes for Loading Data
  • Instances contains Attribute and Instance
  • How to get every Instance within the Instances?
  • How to get an Attribute?

Get Instance:
    Instance instance = instances.instance(index);
Get Instance Count:
    int count = instances.numInstances();
Get Attribute:
    Attribute attribute = instances.attribute(index);
Get Attribute Count:
    int count = instances.numAttributes();
17
Classes for Loading Data
  • How to get the Attribute value of each
    Instance?
  • Class Index (Very important!)

Get value:
    instance.value(index) or instance.value(attribute)
Get Class Index:
    instances.classIndex() or instances.classAttribute().index()
Set Class Index:
    instances.setClass(attribute) or instances.setClassIndex(index)
18
Classes for Classifiers
  • Weka classes for C4.5, Naïve Bayes, and SVM
  • Classifier: all classes which extend
    weka.classifiers.Classifier
  • C4.5: weka.classifiers.trees.J48
  • NaiveBayes: weka.classifiers.bayes.NaiveBayes
  • SVM: weka.classifiers.functions.SMO
  • How to build a classifier?

Build a C4.5 Classifier:
    Classifier c = new weka.classifiers.trees.J48();
    c.buildClassifier(trainingInstances);
Build an SVM Classifier:
    Classifier e = new weka.classifiers.functions.SMO();
    e.buildClassifier(trainingInstances);
19
Classes for Evaluation
  • Related Weka classes
  • weka.classifiers.CostMatrix
  • weka.classifiers.Evaluation
  • How to use the evaluation classes?

Use Classifier To Do Classification:
    CostMatrix costMatrix = null;
    Evaluation eval = new Evaluation(testingInstances, costMatrix);
    // Classify every test instance with the trained classifier c.
    for (int i = 0; i < testingInstances.numInstances(); i++) {
        eval.evaluateModelOnceAndRecordPrediction(c, testingInstances.instance(i));
    }
    System.out.println(eval.toSummaryString(false));
    System.out.println(eval.toClassDetailsString());
    System.out.println(eval.toMatrixString());
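
The matrix printed by the last call is a confusion matrix: rows are the actual classes and columns are the predicted classes. As a rough illustration of what such a matrix and the accuracy figure in the summary represent, here is a small plain-Java sketch. ConfusionMatrixSketch and its methods are hypothetical names, and this is not how Weka computes its statistics internally; class labels are assumed to be encoded as integers starting at 0.

```java
// Hypothetical helper, not part of Weka: builds a confusion matrix by hand.
class ConfusionMatrixSketch {
    // counts[a][p] = number of items whose actual class is a and predicted class is p.
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    // Accuracy is the share of items on the diagonal (actual == predicted).
    static double accuracy(int[][] matrix) {
        int correct = 0;
        int total = 0;
        for (int row = 0; row < matrix.length; row++) {
            for (int col = 0; col < matrix[row].length; col++) {
                total += matrix[row][col];
                if (row == col) {
                    correct += matrix[row][col];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Toy labels: 0 = present, 1 = not_present (made up for illustration).
        int[] actual = {0, 0, 1, 1, 1};
        int[] predicted = {0, 1, 1, 1, 0};
        int[][] matrix = confusionMatrix(actual, predicted, 2);
        for (int[] row : matrix) {
            System.out.println(java.util.Arrays.toString(row));
        }
        System.out.println("accuracy = " + accuracy(matrix));
    }
}
```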
20
Classes for Evaluation
  • Cross Validation
  • In cross-validation, we split a single
    dataset into N equal shares (folds). In each round, N-1
    shares are used as the training dataset and the remaining
    share is used as the testing dataset. The process repeats
    N times so that every share is used for testing once.
  • The most widely used variant is 10-fold cross-validation.
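
The splitting idea can be sketched in plain Java, independent of Weka. The sketch below shuffles the row indices, deals them into N folds, and builds the training set for a round from all folds except the test fold. KFoldSketch and its methods are hypothetical names, and unlike a stratified split this simple version does not balance the class labels across folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Weka: a plain (non-stratified) N-fold split.
class KFoldSketch {
    // Shuffles row indices 0..numRows-1 and deals them into n folds.
    static List<List<Integer>> folds(int numRows, int n, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < n; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numRows; i++) {
            folds.get(i % n).add(indices.get(i));
        }
        return folds;
    }

    // The training set for one round is every fold except the test fold.
    static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 5, 42L);
        for (int round = 0; round < folds.size(); round++) {
            System.out.println("round " + round
                + ": test=" + folds.get(round)
                + " train=" + trainingIndices(folds, round));
        }
    }
}
```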

21
Classes for Evaluation
  • How to obtain the training dataset and the
    testing dataset?

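    Random random = new Random(seed);
    instances.randomize(random);
    instances.stratify(N);
    for (int i = 0; i < N; i++) {
        // Fold i is held out for testing; the other N-1 folds are used for training.
        Instances train = instances.trainCV(N, i, random);
        Instances test = instances.testCV(N, i);  // testCV takes no Random argument
    }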
22
Sample Output from Weka
23
Conclusion about Weka
  • In sum, the overall goal of Weka is to build a
    state-of-the-art facility for developing machine
    learning (ML) techniques and allow people to
    apply them to real-world data mining problems.
  • Detailed documentation about the different functions
    provided by Weka can be found on the Weka website.
  • Weka is available at
  • http://www.cs.waikato.ac.nz/ml/weka

24
NetDraw
25
Introduction to NetDraw (Visualization Tool)
  • NetDraw is a free program written by
    Steve Borgatti of Analytic Technologies. It is
    often used for visualizing both 1-mode and 2-mode
    social network data.
  • You can download it from
  • http://www.analytictech.com/downloadnd.htm
  • (Compared to Weka, it is much easier to use.)

26
What can NetDraw do?
  • NetDraw can
  • handle multiple relations at the same time, and
  • use node attributes to set colors, shapes, and
    sizes of nodes.
  • Pictures can be saved in metafile, jpg, gif and
    bitmap formats.
  • Two basic kinds of layouts are implemented: a
    circle and an MDS layout based on geodesic distance.
  • You can also rotate, flip, shift, resize and zoom
    configurations.

27
How to use NetDraw?
  • NetDraw Input Data File Format
  • Draw Networks using NetDraw

28
NetDraw Input Data File Format
VNA Data Format: The VNA data format (with .vna
being the extension of the input data file)
allows users to store not only network data but
also attributes of the nodes, along with
information about how to display them (color,
size, etc.).

    *node data
    "ID", num
    "10 Gift Card off REGIS SALON (SALON SERVICES) E" 2
    "10 iTunes Gift Certificate exp 9/2008" 2
    "10 STARBUCKS gift CARD CERTIFICATE" 3
    "10 Target Gift Card" 3
    "10.00 iTunes Music Gift Card - Free Shipping" 2
    "100 Best Buy Gift Card" 15
    "100 Gap Gift Card - FREE Shipping" 9
    *Tie data
    FROM TO "Strength"
    "Home Depot Gift Card 500." "100 Home Depot Gift Card Accepted Nationwide" 1
    " 250 Best Buy GiftCard Gift Card Gift Certifica" "25 Best Buy Gift Card for Store or Online!" 1
    "50 Bed Bath Beyond Gift Card - FREE SHIPPING!" "200 Cost Plus World Market Gift Card 4 Jewelry Be" 1
    "500.00 Best Buy gift certificate" "15 Best Buy Gift Card Free Shipping" 1
    "25 Best Buy Gift Card for Store or Online!" "15 Best Buy Gift Card Free Shipping" 1
    "Bath and Body Works 25 Gift Card" "200 Cost Plus World Market Gift Card 4 Jewelry Be" 1
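
If your data lives in a program rather than a spreadsheet, a VNA file can be generated with a few lines of code. The following is a minimal plain-Java sketch: VNA sections begin with a starred header such as *node data or *tie data, followed by a line of column names and one line per node or tie. VnaSketch, Tie, and toVna are hypothetical names, the column names (ID, num, from, to, strength) follow the example on this slide, and the sketch assumes node labels contain no double quotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NetDraw: writes node and tie data as VNA text.
class VnaSketch {
    static final class Tie {
        final String from;
        final String to;
        final int strength;

        Tie(String from, String to, int strength) {
            this.from = from;
            this.to = to;
            this.strength = strength;
        }
    }

    // Labels with spaces must be wrapped in double quotes.
    static String quote(String label) {
        return "\"" + label + "\"";
    }

    static String toVna(Map<String, Integer> nodes, List<Tie> ties) {
        StringBuilder out = new StringBuilder();
        out.append("*node data\n");
        out.append("ID num\n");
        for (Map.Entry<String, Integer> node : nodes.entrySet()) {
            out.append(quote(node.getKey())).append(' ')
               .append(node.getValue()).append('\n');
        }
        out.append("*tie data\n");
        out.append("from to strength\n");
        for (Tie tie : ties) {
            out.append(quote(tie.from)).append(' ')
               .append(quote(tie.to)).append(' ')
               .append(tie.strength).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the nodes in insertion order.
        Map<String, Integer> nodes = new LinkedHashMap<>();
        nodes.put("100 Best Buy Gift Card", 15);
        nodes.put("100 Gap Gift Card - FREE Shipping", 9);
        List<Tie> ties = new ArrayList<>();
        ties.add(new Tie("100 Best Buy Gift Card",
                         "100 Gap Gift Card - FREE Shipping", 1));
        System.out.print(toVna(nodes, ties));
    }
}
```

The two node rows come from the slide; the tie between them is made up for the example.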
29
Draw Networks using NetDraw
Different functions
Display setup of the nodes and relations
The network: nodes represent the individuals
and links represent the relations
30
Analysis Example: Hot Item Analysis Based on
Gift Card Sales Information from eBay
  • Each circle in the graph represents an active
    item in the database.
  • The label of the circle is the item title.
  • The bigger the circle and the label of circle,
    the hotter the item.
  • Items are clustered together based on the brand
    information.
  • Hot Topics during April 15-22, 2007
  • Hot Topics during April 22-29, 2007

31
Conclusion
  • In sum, NetDraw can be used for social network
    visualization.
  • There are a lot of parameters to play with in the
    tool. The results can be saved as EMF, WMF, BMP
    and JPG files.
  • NetDraw is available at
  • http://www.analytictech.com/downloadnd.htm
  • The website also provides detailed documentation.

  • If you are interested, you may also try some other
    visualization tools such as JUNG
    (http://jung.sourceforge.net/) and GraphViz
    (http://www.graphviz.org/).

32
Some Suggestions
  • Carefully prepare your data according to the
    input format required by each tool.
  • Read the documentation of each tool that you
    decide to use and understand its functionality.
    Think how it can be applied to your project.
  • Download and play with the tools. You cannot
    learn anything unless you try them by yourself!!!

33
Thanks! Good luck with your projects!