Exploratory Data Analysis; coined by Tukey 1977 - PowerPoint PPT Presentation

Slides: 14
Provided by: cmay2
Transcript and Presenter's Notes

1
Exploratory Data Analysis (term coined by Tukey, 1977)
  • Illuminate underlying pattern in noisy data
  • Predecessor to formal analysis
  • May lead to different analysis than originally planned

Data visualization (the first thing you do with your data!)
2
Important functions of exploratory data
visualization
  • Spot outliers
  • Discriminate clusters
  • Check distributional and other assumptions
  • Examine relationships
  • Compare mean differences
  • Observe a time-based process

http://seamonkey.ed.asu.edu/alex/teaching/WBI/EDA.html
3
Univariate data (one variable): frequency distributions
Distributions of height, biomass, etc. are often
used to describe populations
  • How are the data distributed? (including
    summary/descriptive statistics)
  • Are the data normal? (required to meet the
    assumptions of many statistical techniques; more
    later)
  • If not normal, can they be transformed?
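
The three questions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the biomass values are hypothetical, the `describe` and `skewness` helpers are names invented here, and the log transform is only one of several transformations that might be tried.

```python
import math
import statistics


def skewness(values):
    """Skewness as the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def describe(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "skewness": skewness(values),
    }


# Hypothetical right-skewed biomass values (illustrative only).
biomass = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 14.7, 31.0]

# A log transform often pulls in a long right tail.
logged = [math.log(v) for v in biomass]

print(describe(biomass))
print(describe(logged))
```

For these made-up values the mean sits well above the median and the skewness is strongly positive; after the log transform the skewness is much closer to zero, which is the kind of change a transformation is meant to produce.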

4
Histograms
  • Raw data hidden
  • Division into categories is arbitrary
  • Available in Excel and many other programs

Used to identify skew and non-normality
Used to identify outliers
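
To see why the division into categories is arbitrary, the sketch below bins the same values with two different bin origins. The data and the helper names (`histogram_counts`, `print_histogram`) are assumptions made for illustration.

```python
def histogram_counts(values, low, width, n_bins):
    """Count values in n_bins equal-width bins starting at `low`.

    Bin i covers the half-open interval [low + i*width, low + (i+1)*width).
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def print_histogram(values, low, width, n_bins):
    """Print a text histogram, one row per bin."""
    counts = histogram_counts(values, low, width, n_bins)
    for i, count in enumerate(counts):
        start = low + i * width
        print(f"{start:>4}-{start + width:<4} {'#' * count}")


# Illustrative measurements (hypothetical units).
data = [20, 20, 21, 25, 29, 32, 36, 37, 38, 41, 44, 46, 50, 53, 58]

print_histogram(data, low=20, width=10, n_bins=4)  # bins start at 20
print()
print_histogram(data, low=15, width=10, n_bins=5)  # same width, shifted by 5
```

With bins starting at 20 the counts fall steadily from the first bin; shifting the bins by 5 moves the tallest bar to the middle. The data have not changed, only the category boundaries.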
5
(No Transcript)
6
Stem-leaf plots
  • Show original data
  • Division into categories arbitrary
  • Easier if the data are ordered first
  • A histogram on its side (sort of)

Stem | Leaves
2    | 0 0 1 5 9
3    | 2 6 7 8
4    | 1 4 6
5    | 0 3 8
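
A stem-leaf plot is simple to build by hand or in code. The sketch below is one possible approach, assuming non-negative integers with the tens as the stem and the units digit as the leaf; the `stem_and_leaf` helper is a name invented here, and the data are the values the slide's stem and leaf columns appear to encode (stems 2 to 5).

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10).

    Assumes non-negative integers.
    """
    plot = defaultdict(list)
    for v in sorted(values):  # ordering first keeps the leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Values as read from the slide's stem and leaf columns (an assumption).
data = [20, 20, 21, 25, 29, 32, 36, 37, 38, 41, 44, 46, 50, 53, 58]

for stem, leaves in sorted(stem_and_leaf(data).items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one bar of the sideways histogram, and every original value can still be read back from its stem and leaf.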
7
Box (box-whisker) plots
  • Calculate the median and draw a horizontal line at it
  • Draw a box with ends at the quartiles Q1 (25th percentile)
    and Q3 (75th percentile)
  • Extend the "whiskers" to the farthest points that
    are not outliers
  • Outliers are points more than 3/2 times the interquartile
    range (Q3-Q1) below Q1 or above Q3
  • Draw a dot for every outlier

Can be done for a single distribution or to
compare several
http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
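
The steps above can be sketched as a small function that returns the numbers needed to draw the plot. This is an illustration under stated assumptions: the `box_plot_stats` name and the sample data are invented here, and quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method. Several quartile conventions exist, so other software may give slightly different box edges.

```python
import statistics


def box_plot_stats(values):
    """Compute median, quartiles, whisker ends, and outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Points beyond 1.5 * IQR from the box are treated as outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # farthest low point that is not an outlier
        "whisker_high": inside[-1],  # farthest high point that is not an outlier
        "outliers": outliers,
    }


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(box_plot_stats(sample))
```

In this sample the value 50 lies above Q3 + 1.5 × IQR, so it is drawn as a dot and the upper whisker stops at 9.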
8
Normal probability plots will be covered later
9
Bivariate (2 variable) data
  • What is the relationship between the 2 variables?
  • Are there outliers?
  • Examined with scatterplots

[Scatterplot panels: negative, none]
10
[Scatterplot panel: non-linear]
Graphing helps you see relationships. Formal
analysis is guided by a priori knowledge that one
variable causes change in the other (more later).
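
One way to see why graphing matters is to compare a single summary number against what a scatterplot would show. The sketch below uses Pearson's correlation coefficient, which the slides do not mention and which is brought in here only as an illustration; the `pearson_r` helper and the data are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: one linear and one curved (non-linear) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]
curved = [x ** 2 for x in xs]

print(pearson_r(xs, linear))  # strong linear relationship
print(pearson_r(xs, curved))  # strong relationship, but not linear
```

The curved data are completely determined by x, yet the linear correlation is zero. A scatterplot would show the U shape immediately, where the coefficient alone would suggest "none".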
11
Classified data often result from an ecological
experiment
  • Bar chart
  • Shows means and variance
  • Shows the magnitude of treatment differences

[Figure: bar chart of epilithon NPP (mg O2/m2/hr) for high light and low light treatments; y-axis from -10 to 15; bars show mean ± one S.E.]
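
The bar heights and error bars in a chart like this come from the mean and standard error of each treatment. A minimal sketch follows, assuming the standard error is the sample standard deviation divided by the square root of n; the replicate values are hypothetical and are not the data behind the slide's figure.

```python
import math
import statistics


def mean_and_se(values):
    """Return the mean and standard error of the mean for one treatment."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical NPP replicates (mg O2/m2/hr); not data from the slides.
treatments = {
    "high light": [9.8, 12.1, 11.4, 10.7],
    "low light": [-4.2, -6.0, -5.1, -4.7],
}

for name, values in treatments.items():
    mean, se = mean_and_se(values)
    print(f"{name}: mean = {mean:.2f}, S.E. = {se:.2f}")
```

Each bar would be drawn at the treatment mean with an error bar extending one S.E. above and below it.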
12
List things that are wrong with this graph.
[Figure: bar chart with y-axis labeled "Epilithon NPP", from -10 to 15]
13
Graphing Exercise
  • Obtain a dataset, preferably your own or a
    colleague's, but it can be anything
  • Choose a graphing style that best illustrates the
    message of your data
  • Use Excel or another program to make a graph
  • Print on an overhead to show in class