Title: Exploratory Data Analysis; coined by Tukey 1977
1. Exploratory Data Analysis (coined by Tukey, 1977)
- Illuminate underlying patterns in noisy data
- Predecessor to formal analysis
- May lead to a different analysis than originally planned
Data visualization (the first thing you do with your data!)
2. Important functions of exploratory data visualization
- Spot outliers
- Discriminate clusters
- Check distributional and other assumptions
- Examine relationships
- Compare mean differences
- Observe a time-based process
http://seamonkey.ed.asu.edu/alex/teaching/WBI/EDA.html
3. Univariate data (one variable): frequency distributions
Distributions of height, biomass, etc. are often used to describe populations
- How are the data distributed (including summary/descriptive statistics)?
- Are the data normal? (required to meet the assumptions of many statistical techniques; more later)
- If not normal, can they be transformed?
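As a rough illustration of the three questions above, the following sketch computes descriptive statistics and a simple moment-based skewness before and after a log transform. The biomass values and the `skewness` helper are hypothetical, introduced only for this example; a skewness near zero suggests symmetry but is not a formal test of normality.

```python
import math
import statistics

def skewness(values):
    """Population (moment) skewness: about 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

# Hypothetical biomass values (g) with a long right tail; not data from the lecture.
biomass = [1.2, 1.5, 1.9, 2.3, 2.8, 3.6, 4.9, 7.4, 12.8, 31.0]

# Summary/descriptive statistics.
summary = {
    "n": len(biomass),
    "mean": statistics.fmean(biomass),
    "median": statistics.median(biomass),
    "sd": statistics.stdev(biomass),
}

# A log transform is one common choice for right-skewed data.
log_biomass = [math.log10(v) for v in biomass]

print(summary)
print("skewness raw:  ", round(skewness(biomass), 2))
print("skewness log10:", round(skewness(log_biomass), 2))
```

In this made-up sample the mean sits well above the median, which is a quick hint of right skew, and the log-transformed values are noticeably less skewed than the raw ones.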
4. Histograms
- Raw data hidden
- Division into categories arbitrary
- Available in Excel and many other programs
- Identify skew, non-normality
- Identify outliers
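To make the point about arbitrary categories concrete, here is a minimal sketch that bins the same values with two different bin widths. The `histogram_counts` and `text_histogram` helpers and the height values are hypothetical, written only for this illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count values per bin [left, left + bin_width), keeping empty bins in between."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    first, last = min(counts), max(counts)
    return {start + k * bin_width: counts.get(k, 0) for k in range(first, last + 1)}

def text_histogram(values, bin_width, start=0):
    """Render the bin counts as rows of '#' characters."""
    rows = []
    for left, count in histogram_counts(values, bin_width, start).items():
        rows.append(f"{left:>3}-{left + bin_width:<3} {'#' * count}")
    return "\n".join(rows)

# Hypothetical heights (cm); 95 is a deliberately extreme value.
heights = [20, 20, 21, 25, 29, 32, 36, 37, 38, 41, 44, 46, 50, 53, 58, 95]

print(text_histogram(heights, bin_width=10))
print()
print(text_histogram(heights, bin_width=25))
```

With a width of 10 the extreme value stands apart after several empty bins; with a width of 25 the same data look like a single hump, and in both cases the individual values are no longer visible.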
6. Stem-leaf plots
- Show original data
- Division into categories arbitrary
- Easier to order data first
- A histogram on its side (sort of)
Stem | Leaves
2 | 0 0 1 5 9
3 | 2 6 7 8
4 | 1 4 6
5 | 0 3 8
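A stem-leaf plot like the one above can be produced with a few lines of code. This is a minimal sketch for non-negative two-digit integers, using the tens digit as the stem and the ones digit as the leaf; the `stem_and_leaf` function is hypothetical, and the values are illustrative, chosen to match the digits shown in the plot.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return 'stem | leaves' rows for non-negative integers (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):  # ordering the data first keeps the leaves sorted
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems as blank rows
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Illustrative values, deliberately unsorted.
values = [38, 20, 53, 25, 41, 20, 36, 58, 21, 44, 29, 32, 50, 46, 37]

for row in stem_and_leaf(values):
    print(row)
```

Unlike a histogram, every original value can be read back from the rows, while the row lengths still show the shape of the distribution.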
7. Box (box-whisker) plots
- Calculate the median, draw a horizontal line
- Draw a box with ends at the quartiles Q1 (25%) and Q3 (75%)
- Extend the "whiskers" to the farthest points that are not outliers
- Outliers lie more than 3/2 times the interquartile range (Q3-Q1) below Q1 or above Q3
- Draw a dot for every outlier
Can be done for a single distribution or
comparing several
http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
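The box plot recipe above can be sketched numerically as follows. The `box_plot_stats` helper and the sample values are hypothetical; quartile conventions differ between programs, and this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so other software may give slightly different Q1 and Q3.

```python
import statistics

def box_plot_stats(values):
    """Compute the numbers needed to draw a box plot with 1.5 x IQR fences."""
    data = sorted(values)
    # Assumption: 'inclusive' quartiles; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # farthest points that are not outliers
        "whisker_high": max(inside),
        "outliers": outliers,         # each one is drawn as a dot
    }

# Hypothetical sample with one extreme value.
sample = [12, 15, 17, 19, 20, 22, 24, 26, 60]
stats = box_plot_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Running the same function on each group in turn gives the numbers for a side-by-side comparison of several distributions.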
8. Normal probability plots will be covered later
9. Bivariate (2 variable) data
- Relationship between the 2 variables
- Are there outliers?
- Examined by scatterplots
[Scatterplot panels: negative relationship; no relationship]
10. Non-linear
Graphing helps you see relationships. Formal analysis is guided by a priori knowledge that one variable causes change in the other (more later).
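One way to see why graphing matters is that a single summary number can miss a non-linear relationship entirely. The sketch below computes the Pearson correlation coefficient by hand for two made-up datasets; the `pearson_r` helper and the values are hypothetical, chosen only to make the point.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, which measures linear association only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
negative = [10 - 2 * v for v in x]  # perfectly linear, decreasing
curved = [v ** 2 for v in x]        # perfectly determined by x, but not linear

print("negative linear:", pearson_r(x, negative))
print("curved:         ", pearson_r(x, curved))
```

The curved dataset has a correlation of zero even though y depends completely on x, a pattern a scatterplot would reveal at a glance.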
11. Classified data often result from an ecological experiment
- Bar chart
  - Shows means and variance
  - Shows the magnitude of treatment differences
[Figure: bar chart of Epilithon NPP (mg O2/m2/hr) for high light and low light treatments; y-axis from -10 to 15; bars show mean ± one S.E.]
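The bar heights and error bars in a chart of this kind come from a mean and a standard error per treatment. A minimal sketch of that calculation follows; the replicate values and the `mean_and_se` helper are hypothetical and are not the data behind the figure.

```python
import math
import statistics

def mean_and_se(values):
    """Return (mean, standard error), where SE = sample SD / sqrt(n)."""
    n = len(values)
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(n)

# Hypothetical replicate NPP measurements (mg O2/m2/hr) per treatment.
npp = {
    "high light": [9.0, 11.0, 12.0, 8.0],
    "low light": [-4.0, -2.0, -3.0, -3.0],
}

results = {treatment: mean_and_se(values) for treatment, values in npp.items()}
for treatment, (mean, se) in results.items():
    print(f"{treatment}: {mean:.1f} ± {se:.2f}")
```

Each mean becomes a bar height and each standard error becomes the half-length of the error bar drawn above and below it.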
12. List things that are wrong with this graph.
[Figure: bar chart with y-axis labeled "Epilithon NPP", ticks from -10 to 15]
13. Graphing Exercise
- Obtain a dataset, preferably your own or a colleague's, but it can be anything
- Choose a graphing style that best illustrates the message of your data
- Use Excel or another program to make a graph
- Print on an overhead to show in class