Earthquake Prediction using Data Mining Tools

Provided by: sukumarr

1
  • Earthquake Prediction using Data Mining Tools
  • Mrinalini Kabbur
  • Ritu Chinya
  • Progress Report

2
Introduction
  • An earthquake is a sudden movement of the Earth,
    caused by the abrupt release of strain that has
    accumulated over a long time.
  • Earthquakes remain one of the least predictable
    natural hazards.
  • The goal of earthquake prediction is to give
    warning of potentially damaging earthquakes early
    enough to allow appropriate response to the
    disaster, enabling people to minimize loss of
    life and property.

3
Project Design
  • This project deals with Earthquake classification
    and prediction using Data mining tools.
  • Weka was used to develop the model.
  • Naïve Bayesian classification was used to predict
    unknown class labels.
  • C4.5 was used with a 66% split to classify the
    data, and 10-fold cross-validation to evaluate
    accuracy.

4
Method
  • Installation of Weka
  • Weka is a suite of software for machine learning
    and data mining
  • Developed at the University of Waikato in New
    Zealand
  • Available for free
  • Easy to use Graphical User Interface

5
  • Learning Weka
  • Both of us were new to Weka
  • Used the tutorial by Svetlana Aksenova
  • Searched the internet for additional information
    on Weka
  • Gathering the Earthquake Data Set
  • Consists of the earthquakes that occurred in
    the Northern California region during 2005.
  • Data gathered from United States Geological
    Survey (USGS) website.

6
  • Data preprocessing
  • Weka algorithms read data in ARFF format.
  • The source data, however, was in HTML format.

7
The data was in HTML format as shown below.
8
Data Preprocessing (Contd)
  • The data therefore had to be transferred to an
    Excel file.
  • Converting directly from HTML to Excel was
    difficult.
  • So the data was first saved in Word format.

9
Excel Format
10
  • Conversion from Excel to ARFF format.
  • Saved the Excel file as CSV.
  • Used awk commands to format the data.
  • Keyed in some missing data.
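
The CSV-to-ARFF step above can be sketched in Python. This is a minimal sketch, not the awk commands used in the project. The function name `csv_to_arff` and the sample rows are hypothetical, and every attribute is assumed to be numeric.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text (all attributes numeric)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for name in header:
        lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    for row in data:
        # ARFF marks a missing value with "?"
        lines.append(",".join(cell.strip() or "?" for cell in row))
    return "\n".join(lines) + "\n"

# Hypothetical sample rows, not taken from the USGS catalog.
sample = (
    "Latitude,Longitude,Depth,Magnitude\n"
    "38.8,-122.8,2.5,3.1\n"
    "40.3,-124.5,,3.6\n"
)
arff = csv_to_arff(sample, "Earthquake")
print(arff)
```

Empty cells are written as `?`, so missing values would not have to be keyed in before loading the file into Weka, although whether to fill them in by hand remains a modelling decision.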

11
  • Data Cleansing
  • The earthquake data contained many parameters.
    They include
  • Date and time
  • Longitude
  • Latitude
  • Depth
  • Magnitude
  • Event ID
  • Source
  • Magt
  • Nst
  • Gap
  • Clo
  • Attributes of interest include
  • Date and Time
  • Longitude
  • Latitude
  • Depth
  • Magnitude

12
Date and time fields are not considered while
applying the classification algorithm. The filter
weka.filters.unsupervised.attribute.Remove is
applied to remove the date and time attribute.
This is shown below.
13
  • Discretize
  • Attributes contain numeric data.
  • Some Weka algorithms, such as ID3, require nominal
    attribute values.
  • Conversion of numeric attributes to nominal.
  • The attributes Longitude, Latitude, Depth and
    Magnitude are all discretized by using the
    filter
    weka.filters.unsupervised.attribute.Discretize.
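
The idea behind discretization can be sketched as equal-width binning into 10 intervals, which is what the `-B10` option in the relation name on the Run Information slide suggests. This is a minimal sketch, not Weka's implementation. The helper names are hypothetical, and the magnitudes are made-up values whose range (3.0 to 7.1) was chosen so that the cut points line up with the bin labels in the Naïve Bayesian output.

```python
def equal_width_bins(values, bins=10):
    """Return (cut_points, labels) for equal-width binning of numeric values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cuts = [lo + width * i for i in range(1, bins)]
    edges = ["-inf"] + [f"{c:.2f}" for c in cuts] + ["inf"]
    labels = [f"({edges[i]}-{edges[i + 1]}]" for i in range(bins)]
    labels[-1] = labels[-1][:-1] + ")"  # the last interval is open-ended
    return cuts, labels

def discretize(value, cuts, labels):
    """Map a numeric value to the label of the interval that contains it."""
    for cut, label in zip(cuts, labels):
        if value <= cut:
            return label
    return labels[-1]

# Hypothetical magnitudes; only the minimum and maximum affect the cut points.
magnitudes = [3.0, 3.2, 3.5, 4.0, 4.4, 7.1]
cuts, labels = equal_width_bins(magnitudes)
print(labels)
print(discretize(3.5, cuts, labels))
```

Equal-width bins are easy to compute, but when most values fall in the lowest interval, as they do for earthquake magnitudes, the resulting classes are heavily imbalanced.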

14
  • Apply classification algorithms to come up with
  • Decision trees
  • Rule sets
  • Algorithms used for modelling
  • C4.5
  • Naïve Bayesian
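
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The following is a minimal sketch for nominal attributes only. The function names and the toy data are hypothetical, and this is not Weka's J48 code, which also handles numeric thresholds and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(attribute_values, class_labels):
    """Information gain of an attribute divided by its split information."""
    total = len(class_labels)
    groups = {}
    for value, label in zip(attribute_values, class_labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the attribute
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(class_labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: four instances with a nominal class label.
magnitude_class = ["low", "low", "high", "high"]
depth_bin = ["shallow", "shallow", "deep", "deep"]    # separates the classes
longitude_bin = ["west", "east", "west", "east"]      # carries no information

print(gain_ratio(depth_bin, magnitude_class))
print(gain_ratio(longitude_bin, magnitude_class))
```

An attribute that separates the classes perfectly gets the highest ratio and would be chosen for the first split, while an attribute unrelated to the class gets a ratio of zero.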

15
C4.5
  • We have considered two cases.
  • Cross-validation: evaluates the classifier by
    cross-validation, using the number of folds that
    is entered in the Folds text field.
  • Percentage split: evaluates the classifier on how
    well it predicts a certain percentage of the
    data, which is held out for testing. The amount
    of data held out depends on the value entered in
    the field.
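
The two evaluation modes differ only in how the instances are divided into training and test sets. The following is a minimal sketch of both. The function names, the seed and the rounding rule are assumptions, and the folds here are not stratified, whereas Weka reports stratified cross-validation.

```python
import random

def percentage_split(n_instances, train_percent, seed=1):
    """Shuffle instance indices and split them into (train, test) lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]

def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists so that each instance is tested once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 113 instances, as in the earthquake relation.
train, test = percentage_split(113, 66)
print(len(train), len(test))

for train_idx, test_idx in k_fold_indices(113, k=10):
    print(len(train_idx), len(test_idx))
```

With a percentage split each instance is used either for training or for testing. With 10-fold cross-validation every instance is tested exactly once, which generally gives a more stable estimate on a small data set.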

16
First, we evaluate the classifier using a
percentage split, with 66% of the data used for
training, as shown below.
17
Run Analysis
18
Run Information gives you the following
information:
  • the algorithm you used: J48
  • the relation name: Earthquake
  • the number of instances in the relation: 113
  • the number of attributes in the relation: 4
  • the list of the attributes: Longitude, Latitude,
    Depth, Magnitude
  • the test mode you selected: split 66%
The classifier model is an unpruned decision tree
in textual form that was produced on the full
training data. As you can see, the first split
is on the Longitude attribute; at the second
level, the splits are on Latitude and
Longitude.
Below the tree structure are the number of
leaves (10) and the number of nodes in the tree,
that is, the size of the tree (19). The program
also reports the time it took to build the model,
which is 0.06 seconds.
In this case only 67% of the 113 training
instances have been classified correctly. This
indicates that the results obtained from the
training data are not optimistic compared with
what might be obtained from an independent test
set from the same source.
19
Weka also lets you visualize the decision tree.
20
  • Accuracy Estimation
  • Ten-fold cross-validation
  • Snapshot of Naïve Bayesian classification
    using Weka

21
Run Information
  • Run information
  • Scheme: weka.classifiers.bayes.NaiveBayes
  • Relation: Earthquake-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rlast
  • Instances: 113
  • Attributes: 4
  • Latitude
  • Longitude
  • Depth
  • Magnitude
  • Test mode: 10-fold cross-validation
  • Classifier model (full training set)
  • Naive Bayes Classifier
  • Time taken to build model: 0.06 seconds
  • Stratified cross-validation
  • Summary
  • Correctly Classified Instances: 69 (61.0619 %)
  • Incorrectly Classified Instances: 44 (38.9381 %)
  • Kappa statistic: -0.0061
  • Mean absolute error: 0.1187

22
Run Information (Cont)
Detailed Accuracy By Class

TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.972    0.976    0.627      0.972   0.762      '(-inf-3.41]'
0        0        0          0       0          '(3.41-3.82]'
0        0.019    0          0       0          '(3.82-4.23]'
0        0        0          0       0          '(4.23-4.64]'
0        0        0          0       0          '(4.64-5.05]'
0        0        0          0       0          '(5.05-5.46]'
0        0        0          0       0          '(5.46-5.87]'
0        0        0          0       0          '(5.87-6.28]'
0        0        0          0       0          '(6.28-6.69]'
0        0.009    0          0       0          '(6.69-inf)'

Confusion Matrix

  a  b  c  d  e  f  g  h  i  j   <-- classified as
 69  0  1  0  0  0  0  0  0  1   a = '(-inf-3.41]'
 24  0  1  0  0  0  0  0  0  0   b = '(3.41-3.82]'
  8  0  0  0  0  0  0  0  0  0   c = '(3.82-4.23]'
  6  0  0  0  0  0  0  0  0  0   d = '(4.23-4.64]'
  2  0  0  0  0  0  0  0  0  0   e = '(4.64-5.05]'
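
The figures in the summary and in the per-class table can be checked against the confusion matrix. This is a minimal sketch that uses only rows a to e as they appear on the slide; the remaining rows are not in the transcript and are not reconstructed here. The variable and function names are hypothetical.

```python
# Rows a-e of the confusion matrix, copied from the slide.
# Each row is an actual class; each column is the predicted class (a..j).
visible_rows = {
    "a": [69, 0, 1, 0, 0, 0, 0, 0, 0, 1],
    "b": [24, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "c": [8, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "d": [6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "e": [2, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
class_order = "abcdefghij"

def tp_rate(label):
    """True-positive rate (recall): correct predictions / actual instances."""
    row = visible_rows[label]
    return row[class_order.index(label)] / sum(row)

# Overall accuracy from the summary counts: 69 correct out of 113 instances.
accuracy_percent = 100 * 69 / 113

print(round(tp_rate("a"), 3))      # 0.972
print(round(accuracy_percent, 4))  # 61.0619
print(tp_rate("b"))                # 0.0
```

Only class a has any correct predictions: almost every instance is assigned to the lowest magnitude bin, which is consistent with the near-zero Kappa statistic in the summary.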

23
Learnings from the project
  • We were both new to Weka and learned to use the
    software.
  • It was challenging to analyze a large amount of
    data compared to what we did in our homework.
  • We realized that data pre-processing does indeed
    take a long time.
  • We gained a clear understanding of the C4.5 and
    Naïve Bayesian classification algorithms.

24
Division of work
  • We worked together on all the tasks.

Conclusion
We realized that data mining tools are very
powerful and save a lot of time when classifying
huge amounts of data. We found that using the C4.5
algorithm with 66% of the data as training data
gave an accuracy of 67%, whereas 10-fold
cross-validation gave an accuracy of 62% in the
case of the earthquake data. The Naïve Bayesian
algorithm also correctly classified 61% of the
test data. So, the results were fairly close. All
in all, the project was very interesting and
challenging, and we enjoyed working on it.
25
Reference
  • http://www.studentprogress.com/appln/colleges/cogrec/Papers/D_05.pdf
  • www.meteoquake.org/our.html
  • http://www.cs.waikato.ac.nz/ml/weka/index.html
  • http://gaia.ecs.csus.edu/mei/215/tutorial.html
  • http://www.ngdc.noaa.gov/seg/hazard/sig_srch_idb.shtml
  • Weka Explorer tutorial by Svetlana Aksenova