Title: ANNOUNCEMENTS
- Ecology job fair: March 1st (tomorrow!), 10:00-2:00, Birge Hall Atrium
- FOR TODAY
  - Grab all 4 handouts in front.
  - Get a computer and download the stats examples worksheet from the website. I will show stats examples in Excel 2007, so only use your own computer if you have this program.
Week 6: Making Use of the Badger Mill Creek Field Trip
Data Analysis, Your Research Questions, and Writing
Zoo 511, Spring 2012
Outline
- Field trip review
- Badger Mill research questions/hypotheses
- Writing a scientific paper
- Statistics and data analysis (with examples in Excel)
- Lab: Enter data
Today's goals
- Provide a basic background on how to use and interpret common statistical tests
- Prepare you to generate questions for your paper, and to analyze data to answer these questions
- Get all data entered!
Part 1: Your Questions
Read the handout!!!!
Your questions should be specific and answerable
WRONG:
- What habitat do fish prefer?
- In what kind of stream are brown trout most likely to be found?
RIGHT:
- Does sculpin CPUE (catch per unit effort) differ among geomorphic units?
- Is brown trout density related to flow velocity?
Example Questions
Does sculpin CPUE differ among geomorphic units?
Is brown trout density related to flow velocity?
Other data sources
- Previous years' data: all of the same information was collected from the same place, around the same time of year. Replication!
- USGS: http://waterdata.usgs.gov/nwis/uv?05435943
Think about these data sources as you generate your questions.
Two questions, with a supporting paragraph for each, are due Sunday 3/4 by 5:00 pm via email. Name your file Classday_Lastname_Questions.doc (e.g., Wednesday_Latzka_Questions.doc).
Part 2: Writing
Read the handout!!!!
Why Write?
- Gain experience articulating thoughts
- Writing is a learning experience
- It is the currency of communication (in science, law, business, etc.)
Order of a scientific paper (see handout!)
- Title
- Abstract
- Introduction: set up your study
- Methods: study site, data analyses
- Results: analyses; reference tables and figures here
- Discussion: interpret results
- Literature Cited
- Tables and figures
- This is the order in which a paper is presented; it should not be the order in which you write it.
Think before you write
- Analysis → results, figures, numbers
- Search the literature → context
Outline
- Start with basic parts
- Add subsections
- Add topic sentences
- This will take some time, but will make your
paper much easier to write and of much higher
quality!!
Writing: Start with what you know
- Results
  - Report the findings
  - What did your analyses reveal?
  - FIGURES SHOULD STAND ALONE!!!!
- Methods: two parts
  - Sampling site description and sampling techniques relevant to your hypothesis
  - Statistical analysis
  - Only what is relevant!
  - This depends on what you put in your results!
Note on results
- Make ecology the subject of your sentences, not statistics. Statistics help you tell your story; they are not your story in themselves.
- WRONG: Linear regression showed that there was a significant positive relationship with a p-value of 0.04 and an R² of 0.81 between brown trout abundance and flow velocity.
- RIGHT: Brown trout abundance increased with increasing flow velocity (R² = 0.81, p = 0.04).
Intro and discussion: Why does it matter? What does it mean?
- Introduction
- What is the context of the study (past research)
- Set up the experiment
- Discussion
- What do the results mean?
- Was your hypothesis correct?
- What is interesting/exciting about your findings?
- Future research directions
Writing: The last steps
- Abstract
  - For most, the hardest part of writing a scientific paper
  - Short summary of the important points of the paper
- Title
  - Short, sweet, descriptive
- Literature Cited
In summary: WWAD? (What would Alex do?)
- 1) Think!
  - This means exploring the literature relevant to your question to get a feel for what studies have been done.
- 2) Explore your data
  - Make lots of figures and run enough stats that you start to get a feel for what your story is going to be.
- 3) Narrow down your figures and results to those relevant to your story
- 4) Write results (referencing your figures!)
- 5) Write methods
- 6) Write discussion
- 7) Write intro
- 8) Abstract
- 9) Title!
- 10) Literature Cited
Peer Review
- Criticism is important; constructive criticism is best!
- Two types: internal and external. The point of internal review is to make external review go well.
- Reviews need to be taken seriously.
Part 3: Statistics
Why use statistics?
Are there more green sunfish in pools or runs?
Habitat   Counts per site   Total
Run       5, 4, 1           10
Pool      2, 7, 3           12
Why use statistics?
Are there more orange spotted sunfish in pools or runs?
Habitat   Counts per site   Total
Run       5, 1              6
Pool      2, 3              5
- Statistics help us find patterns in the face of variation, and draw inferences beyond our sample sites.
- Statistics help us tell our story; they are not the story in themselves!
Important note
"Data" is the plural form of "datum."
WRONG: Data was analyzed using Microsoft Excel.
RIGHT: Data were analyzed using Microsoft Excel.
When in doubt, substitute the word "apples" for "data," and ask if your sentence makes sense.
A Few Necessary Terms
- Categorical variable: discrete groups, such as Type of Reach (Riffle, Run, Pool).
- Continuous variable: measurements along a continuum, such as Flow Velocity.
What type of variable is Mottled Sculpin/m²? What type of variable is Substrate Type?
A Few Necessary Terms
- Explanatory/predictor variable: the independent variable, plotted on the x-axis. It is the variable you use to predict another variable.
- Response variable: the dependent variable, plotted on the y-axis. It is the variable that is hypothesized to depend on (be predicted by) the explanatory variable.
A Few Necessary Terms
- Mean: the average. If data are normally distributed, it is also the most likely value of a random variable or set of observations.
- Variance: a measure of how far the observed values differ from the expected value (the mean), computed from the squared deviations of the observations from the mean. Standard deviation is the square root of variance.
- Normal distribution: a symmetrical probability distribution described by a mean and a variance. Normality is an assumption of many standard statistical tests.
[Figure: normal distributions labeled N(μ₁, σ₂), N(μ₂, σ₂), and N(μ₁, σ₁)]
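The three quantities above can be sketched in Python using the standard `statistics` module. It plays the role of Excel's AVERAGE, VAR, and STDEV, which also use the n − 1 sample formulas. The inputs are the green sunfish counts from the "Why use statistics?" slide. The helper name `summarize` is only for illustration.

```python
import statistics

def summarize(counts):
    """Return (mean, sample variance, standard deviation) for a list of values."""
    mean = statistics.mean(counts)      # the average
    var = statistics.variance(counts)   # sum of squared deviations / (n - 1)
    sd = statistics.stdev(counts)       # square root of the sample variance
    return mean, var, sd

# Green sunfish counts per site, from the "Why use statistics?" slide
run = [5, 4, 1]
pool = [2, 7, 3]

for name, counts in [("Run", run), ("Pool", pool)]:
    mean, var, sd = summarize(counts)
    print(f"{name}: mean = {mean:.2f}, variance = {var:.2f}, SD = {sd:.2f}")
```

This prints a mean of 3.33 (SD 2.08) for runs and 4.00 (SD 2.65) for pools. The spread within each habitat is larger than the difference between the two means.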
Statistical Tests
- Hypothesis testing: in statistics, we are always testing a null hypothesis (H₀) against an alternative hypothesis (Hₐ).
- p-value: the probability of observing our data, or more extreme data, assuming the null hypothesis is correct.
- Statistical significance: we reject the null hypothesis if the p-value is below a set value (α), usually 0.05.
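The p-value definition can be made concrete with a brute-force calculation. This sketch uses an exact permutation test, which is a different technique from the t-test, ANOVA, and regression covered in these slides. It assumes H₀ means "habitat labels are interchangeable." Under that assumption, every way of splitting the pooled counts into groups of the original sizes is equally likely. The p-value is then the fraction of splits whose difference in means is at least as extreme as the observed one. The data are the green sunfish counts from the "Why use statistics?" slide.

```python
from fractions import Fraction
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean(values):
        return Fraction(sum(values), len(values))  # exact arithmetic, no rounding

    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way to assign n_a of the pooled values to group A
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed:  # "as extreme or more extreme"
            extreme += 1
    return extreme / total

# Green sunfish counts per site: runs vs. pools
p = permutation_p_value([5, 4, 1], [2, 7, 3])
print(f"p = {p:.2f}")
```

Here 18 of the 20 possible splits give a difference at least as large as the observed one, so p = 0.9. That gives no evidence against H₀. Full enumeration only works for small samples; larger ones are usually handled by random reshuffling.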
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
- t-test: a categorical explanatory variable with only 2 options.
- ANOVA: a categorical explanatory variable with >2 options.
- Regression: a continuous explanatory variable.
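The rule above is small enough to write as a checklist function. This is a sketch only: `choose_test` is a hypothetical helper, and like the rule it assumes a continuous response variable.

```python
def choose_test(explanatory_is_categorical, n_levels=None):
    """Suggest a test for a continuous response variable.

    explanatory_is_categorical: True for groups (e.g., reach type),
        False for a continuous predictor (e.g., flow velocity).
    n_levels: number of groups when the explanatory variable is categorical.
    """
    if not explanatory_is_categorical:
        return "regression"
    if n_levels is None or n_levels < 2:
        raise ValueError("a categorical explanatory variable needs at least 2 levels")
    return "t-test" if n_levels == 2 else "ANOVA"

print(choose_test(True, 2))   # Cross Plains vs. Salmo Pond
print(choose_test(True, 3))   # riffle, run, pool
print(choose_test(False))     # flow velocity
```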
Student's t-Test
Tests the statistical significance of the difference between means from two independent samples.
Null hypothesis: no difference between means.
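In Excel 2007, TTEST(array1, array2, 2, 2) returns this test's two-tailed p-value directly (type 2 is the two-sample equal-variance test). Below is a sketch of the same calculation in standard-library Python, applied to the green sunfish counts. The p-value comes from numerically integrating the t density with Simpson's rule. That is a choice made for this sketch, not how Excel computes it. The function names are illustrative.

```python
import math

def t_test_equal_var(a, b):
    """Student's two-sample t-test assuming equal variances.

    Returns (t statistic, degrees of freedom, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)   # squared deviations, group 1
    ss2 = sum((x - m2) ** 2 for x in b)   # squared deviations, group 2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df         # pooled estimate of the shared variance
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t: 1 - 2 * (integral of density from 0 to |t|)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = density(0) + density(upper)
    for i in range(1, steps):             # Simpson's rule weights: 4, 2, 4, ...
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1 - 2 * total * h / 3)

t, df, p = t_test_equal_var([5, 4, 1], [2, 7, 3])  # green sunfish: run vs. pool
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

For these counts t = -0.343 with 4 degrees of freedom and p = 0.749. The difference between runs and pools is well within what chance alone would produce.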
Compares the means of the response variable between the 2 levels of a categorical explanatory variable
[Figure: Mottled Sculpin/m² at Cross Plains vs. Salmo Pond]
Precautions and Limitations
- Meet assumptions
  - Observations come from data with a normal distribution (check with a histogram)
  - Samples are independent
  - Equal variance is assumed (this assumption can be relaxed)
  - No other sample biases
- Interpreting the p-value
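A common way to relax the equal-variance assumption is Welch's t-test. It uses each group's own variance and an adjusted, usually non-integer, degrees of freedom; in Excel 2007 this is TTEST with type 3. Below is a sketch of the two quantities that change, again using the green sunfish counts. `welch_t` is an illustrative name.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    v2 = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([5, 4, 1], [2, 7, 3])
print(f"t = {t:.3f}, df = {df:.2f}")
```

With equal group sizes, the t statistic is the same as Student's (-0.343). Only the degrees of freedom change, here to about 3.79.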
Walk through t-test
Analysis of Variance (ANOVA)
Tests the statistical significance of the difference between means from two or more independent groups.
[Figure: Mottled Sculpin/m² in Riffle, Pool, and Run habitats]
Null hypothesis: no difference between means.
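Here is a sketch of the F statistic behind one-way ANOVA, in plain Python. In Excel 2007, the Analysis ToolPak's "Anova: Single Factor" reports F and its p-value, and FDIST(F, df_between, df_within) converts an F value into a p-value. The slides give no riffle/pool/run numbers to reuse, so the usage line passes the two green sunfish groups. With your own data you would pass three lists (riffle, pool, run). `one_way_anova_f` is an illustrative name.

```python
def one_way_anova_f(groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square.

    groups: a list of lists of numbers. Returns (F, df_between, df_within).
    """
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df_between, df_within = one_way_anova_f([[5, 4, 1], [2, 7, 3]])
print(f"F = {f:.3f} on {df_between} and {df_within} df")
```

With only two groups, one-way ANOVA gives F = t², which makes a handy check. Here F ≈ 0.118, the square of Student's t for the same counts (t ≈ -0.343).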
Precautions and Limitations
- Meet assumptions
  - Samples are independent and identically distributed (iid)
  - Equal variance among groups is assumed
  - Residuals are normally distributed
  - Groups are classified correctly
  - No other sample biases
- Interpreting the p-value
Walk through ANOVA
Simple Linear Regression
- Analyzes the relationship between two continuous variables: predictor and response
- Null hypothesis: there is no relationship (slope = 0)
Least-squares line (regression line y = mx + b)
[Figure: regression line with residuals marked]
Residuals
- Residuals are the distances from observed points to the best-fit line, measured vertically (observed y minus fitted y).
- Residuals always sum to zero (for a least-squares line with an intercept).
- Regression chooses the best-fit line that minimizes the sum of squared residuals. It is called the least-squares line.
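Below is a minimal sketch of the least-squares fit and its residuals in Python. The slope comes from the usual closed-form formula (Sxy / Sxx), and the intercept is chosen so the line passes through the point of means. The input values are hypothetical, made up for illustration; they are not field trip data. `least_squares_line` is an illustrative name.

```python
def least_squares_line(x, y):
    """Fit y = m*x + b by ordinary least squares; return (m, b, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx                  # slope minimizing the sum of squared residuals
    b = y_bar - m * x_bar          # line passes through (x_bar, y_bar)
    residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]  # observed minus fitted
    return m, b, residuals

# Hypothetical flow velocity (m/s) and darter density (fish/m2), for illustration only
velocity = [1.0, 2.0, 3.0, 4.0]
darters = [0.12, 0.20, 0.35, 0.42]
m, b, res = least_squares_line(velocity, darters)
print(f"y = {m:.3f}x + {b:.3f}")
print(f"sum of residuals = {sum(res):.1e}")  # zero up to floating-point rounding
```

Nudging the slope in either direction away from `m` raises the sum of squared residuals. That is what makes this line the least-squares line.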
Precautions and Limitations
- Meet assumptions
  - Relationship is linear (not exponential, quadratic, etc.)
  - X is measured without error
  - For any given value of X, sampled Ys are independent
  - Residual errors are normally distributed
- Interpret the p-value and R-squared value.
- p-value: the probability of observing your data (or more extreme data) if no relationship existed.
  - It indicates the strength of the evidence that your slope (i.e., the relationship) is non-zero (i.e., real); it does not measure how strong the relationship is.
- R-squared: how much of the variance in the response variable is explained by the explanatory variable.
  - If this is low, other variables likely play a role. If this is high, it DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP!
R-Squared and P-value
- High R-squared, low p-value (significant relationship)
- Low R-squared, low p-value (significant relationship)
- High R-squared, high p-value (NO significant relationship)
- Low R-squared, high p-value (no significant relationship)
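The "high R-squared, high p-value" case is easiest to see with very few points. This Python sketch computes R² and the two-sided p-value for H₀: slope = 0, using t = slope / SE(slope) with n − 2 degrees of freedom. It integrates the t density numerically, which is an implementation choice here. The three data points are hypothetical, made up to mimic that case, and the function names are illustrative.

```python
import math

def regression_summary(x, y):
    """Slope, R-squared, and two-sided p-value for H0: slope = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot           # share of response variance explained
    df = n - 2
    se_slope = math.sqrt(ss_res / df / sxx)   # standard error of the slope
    return slope, r_squared, t_two_sided_p(slope / se_slope, df)

def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t, via Simpson's rule on the density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = abs(t) / steps
    total = density(0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1 - 2 * total * h / 3)

# Hypothetical 3-point data set: a steep fit, but too few points to rule out chance
slope, r2, p = regression_summary([1, 2, 3], [1.0, 1.5, 3.0])
print(f"slope = {slope:.2f}, R2 = {r2:.2f}, p = {p:.2f}")
```

This gives R² ≈ 0.92 but p ≈ 0.18: most of the variance is explained, yet with one degree of freedom the slope is not distinguishable from zero.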
Walk through Regression 1
Residual vs. Fitted Value Plots
[Figure: observed values (points) and model values (line)]
Residual Plots Can Help Test Assumptions
Have we violated any assumptions?
If assumptions are violated
- Try transforming data (log transformation, square root transformation).
- Most of these tests are fairly robust to violations of the assumptions of normality and equal variance (only be concerned if obvious problems exist).
- Diagnostics (residual plots, histograms) should NOT be reported in your paper. Rather, a statement that diagnostic tests were performed to ensure that the assumptions of a linear regression were not violated is sufficient.
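Here is a sketch of the two transformations in Python, applied to hypothetical fish counts made up for illustration. It uses log(x + 1) rather than log(x). That variant is an assumption of this sketch, chosen because field counts often include zeros and log(0) is undefined. It uses the natural log; switching to log base 10 only rescales the values by a constant.

```python
import math

# Hypothetical fish counts per site, including a zero (made up for illustration)
counts = [0, 1, 3, 8, 25, 60]

log_counts = [math.log1p(c) for c in counts]   # natural log of (count + 1)
sqrt_counts = [math.sqrt(c) for c in counts]   # square-root transformation

for c, lg, sq in zip(counts, log_counts, sqrt_counts):
    print(f"{c:>3}  log(x+1) = {lg:5.2f}  sqrt(x) = {sq:5.2f}")
```

Both transformations compress large values more than small ones, which often evens out variances and makes skewed counts look closer to normal. After transforming, re-check the diagnostics.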
Walk through regression 2, with residual plots
Statistical significance
[Figure: two plots of Darters/m² vs. Flow Velocity. One fit has R² = 0.6, p = 0.055, Y = 0.02 + 0.1X; the other has R² = 0.85, p = 0.045, Y = 0.02 + 0.1X.]
Take-home message: using 0.05 as a cutoff for significance is ARBITRARY! Use your p-values as one of multiple tools for interpreting your results (especially because you will likely have small sample sizes).
Statistical vs. biological significance
[Figure: Darters/m² vs. Flow Velocity; R² = 0.85, p = 0.045, Y = 0.02 + 0.1X]
- For each increase in flow of 1 m/s, you would expect an increase of 0.1 fish per m².
- If your reach contained 100 m² of habitat, you would expect a difference of 10 fish for each 1 m/s increase in flow.
Take-home message: there is no magic number to determine biological significance. YOU need to think about what your results mean, and interpret them in an ecological context.
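The 10-fish figure comes from scaling the slope up to a whole reach. In the worked equation below, the symbols are introduced here for illustration: b is the slope of the fitted line, Δv is a change in flow velocity, and A is the reach area.

```latex
\Delta N = b \times \Delta v \times A
         = 0.1\,\frac{\text{fish}/\text{m}^2}{\text{m/s}} \times 1\,\text{m/s} \times 100\,\text{m}^2
         = 10\ \text{fish}
```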
Last notes
Enter Badger Mill Creek Data
- Individual ID and diet column: only for trout
- Fish abbreviations
- Double substrate
- Reach names: w01, w02, etc.