Title: Data Analysis
A Few Necessary Terms
- Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool).
- Continuous Variable: Measurements along a continuum, such as Flow Velocity.

What type of variable would Mottled Sculpin/m² be? What type of variable is Substrate Type? What type of variable is the percentage of bank that is undercut?
A Few Necessary Terms
- Explanatory Variable: Independent variable, plotted on the x-axis. The variable you use as a predictor.
- Response Variable: Dependent variable, plotted on the y-axis. The variable that is hypothesized to depend on, or be predicted by, the explanatory variable.
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
- T-test: a categorical explanatory variable with 2 options.
- ANOVA: a categorical explanatory variable with more than 2 options.
- Regression: a continuous explanatory variable.
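The selection rule above can be written as a small lookup. This is a minimal sketch assuming, as the slide does, that the response is always continuous; `choose_test` is a hypothetical helper name, not part of any library.

```python
from typing import Optional


def choose_test(explanatory_is_categorical: bool, n_groups: Optional[int] = None) -> str:
    """Pick a test using the slide's rules (the response is assumed continuous)."""
    if not explanatory_is_categorical:
        return "regression"  # continuous explanatory variable
    if n_groups is None or n_groups < 2:
        raise ValueError("a categorical explanatory variable needs at least 2 groups")
    return "t-test" if n_groups == 2 else "ANOVA"


print(choose_test(True, 2))  # two sites, e.g. Cross Plains vs. Salmo Pond: prints t-test
print(choose_test(True, 3))  # Riffle, Run, Pool: prints ANOVA
print(choose_test(False))    # e.g. flow velocity: prints regression
```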
Statistical Tests
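- Hypothesis Testing: In statistics, we are always testing a null hypothesis (H0) against an alternative hypothesis (Ha).
- Test Statistic: a single number computed from the sample data that measures how far the data depart from what the null hypothesis predicts (for example, t for a t-test or F for ANOVA).
- p-value: The probability of observing our data, or more extreme data, assuming the null hypothesis is correct.
- Statistical Significance: We reject the null hypothesis if the p-value is below a set value (the significance level), usually 0.05.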
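The p-value definition can be made concrete with a permutation test. This is a different technique from the t-test and ANOVA covered next, but it follows the definition directly: if the null hypothesis of no difference is true, the group labels are arbitrary, so shuffling them shows how often data at least as extreme as ours would arise by chance. A minimal Python sketch; the densities are made-up numbers for illustration, and `permutation_p_value` is a hypothetical helper.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided p-value for a difference in means, found by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # under H0, any assignment of labels is equally likely
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance guards against float rounding
            as_extreme += 1
    # Count the observed labelling itself so the estimate is never exactly 0
    return (as_extreme + 1) / (n_perm + 1)


site_a = [2.1, 2.5, 1.9, 2.8, 2.4]  # hypothetical sculpin/m^2 at one site
site_b = [1.2, 1.5, 1.1, 1.7, 1.4]  # hypothetical sculpin/m^2 at another site
p = permutation_p_value(site_a, site_b)
print(p, "reject H0" if p < 0.05 else "fail to reject H0")
```

With these values, only 2 of the 252 ways to split the 10 numbers into two groups of 5 separate them this cleanly, so the estimate should land near 2/252, about 0.008.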
Student's t-Test
Tests the statistical significance of the
difference between means from two independent
samples
Compares the means of 2 samples of a categorical variable.
[Figure: Mottled Sculpin/m² at Cross Plains vs. Salmo Pond]
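A sketch of the two-sample Student's t-test in Python, assuming SciPy is available. `scipy.stats.ttest_ind` with its default `equal_var=True` is the pooled-variance Student's t-test; the hand-written `pooled_t` (a hypothetical helper) shows what the statistic measures: the difference in means divided by its standard error. The values are made up, not the Cross Plains or Salmo Pond data.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats


def pooled_t(a, b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled variance: a weighted average of the two sample variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


site_a = [2.1, 2.5, 1.9, 2.8, 2.4]  # hypothetical sculpin/m^2
site_b = [1.2, 1.5, 1.1, 1.7, 1.4]
t_by_hand = pooled_t(site_a, site_b)
result = stats.ttest_ind(site_a, site_b)  # equal_var=True by default
print(t_by_hand, result.statistic, result.pvalue)
```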
Precautions and Limitations
- Meet assumptions:
  - Observations come from data with a normal distribution (check with a histogram)
  - Samples are independent
  - Equal variance is assumed (check with a boxplot)
  - No other sample biases
- Interpreting the p-value
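The two checkable assumptions above can also be screened numerically alongside the histogram and boxplot. This sketch assumes SciPy and swaps in two tests the slides do not name: the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and Levene's test (`scipy.stats.levene`) for equal variance. With small samples these tests have little power, so a non-significant result is not proof that an assumption holds. If the variances look unequal, the sketch falls back to Welch's t-test (`equal_var=False`), which does not assume equal variances. Independence and the absence of other sample biases come from the study design and cannot be checked this way. The helper names are hypothetical and the data are made up.

```python
from scipy import stats


def check_t_test_assumptions(a, b, alpha=0.05):
    """Screen normality (Shapiro-Wilk) and equal variance (Levene) for two samples."""
    return {
        "a_looks_normal": bool(stats.shapiro(a).pvalue > alpha),
        "b_looks_normal": bool(stats.shapiro(b).pvalue > alpha),
        "equal_variance": bool(stats.levene(a, b).pvalue > alpha),
    }


def two_sample_test(a, b, alpha=0.05):
    """Student's t-test if the variances look equal, otherwise Welch's t-test."""
    checks = check_t_test_assumptions(a, b, alpha)
    result = stats.ttest_ind(a, b, equal_var=checks["equal_variance"])
    return checks, result


site_a = [2.1, 2.5, 1.9, 2.8, 2.4]  # hypothetical values
site_b = [1.2, 1.5, 1.1, 1.7, 1.4]
checks, result = two_sample_test(site_a, site_b)
print(checks)
print(result.statistic, result.pvalue)
```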
Analysis of Variance (ANOVA)
Tests the statistical significance of the
difference between means from two or more
independent samples
[Figure: Mottled Sculpin/m² in Riffle, Pool, and Run reaches, with the grand mean marked]
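The idea behind the ANOVA figure can be sketched in Python: one-way ANOVA compares how far the group means sit from the grand mean (between-group variation) with how much the values scatter within each group. This assumes SciPy for the comparison with `scipy.stats.f_oneway`; `one_way_anova_f` is a hypothetical helper, and the riffle, pool, and run densities are made-up numbers.

```python
from statistics import mean

from scipy import stats


def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between groups: distance of each group mean from the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: distance of each value from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


riffle = [3.1, 2.8, 3.5, 3.0]  # hypothetical sculpin/m^2
pool = [1.2, 1.6, 1.0, 1.4]
run = [2.0, 2.4, 1.8, 2.2]
f_by_hand = one_way_anova_f(riffle, pool, run)
result = stats.f_oneway(riffle, pool, run)
print(f_by_hand, result.statistic, result.pvalue)
```

A small p-value says that at least one group mean differs, but not which one.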
Precautions and Limitations
- Meet assumptions:
  - Observations come from data with a normal distribution
  - Samples are independent
  - Equal variance is assumed
  - No other sample biases
- Interpreting the p-value
- Follow up with t-tests to find which groups differ
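The follow-up t-tests can be sketched as one t-test per pair of groups. Running several tests raises the chance of a false positive, so this sketch applies a Bonferroni correction (dividing 0.05 by the number of comparisons); that correction is our assumption, one common choice among several, since the slides do not specify one. SciPy is assumed, `pairwise_t_tests` is a hypothetical helper, and the data are made up.

```python
from itertools import combinations

from scipy import stats


def pairwise_t_tests(groups, alpha=0.05):
    """Student's t-test for every pair of groups, judged against a Bonferroni threshold."""
    pairs = list(combinations(groups, 2))
    threshold = alpha / len(pairs)  # Bonferroni: share alpha across all comparisons
    results = {}
    for name_a, name_b in pairs:
        res = stats.ttest_ind(groups[name_a], groups[name_b])
        results[(name_a, name_b)] = (float(res.pvalue), bool(res.pvalue < threshold))
    return threshold, results


groups = {  # hypothetical sculpin/m^2 by reach type
    "riffle": [3.1, 2.8, 3.5, 3.0],
    "pool": [1.2, 1.6, 1.0, 1.4],
    "run": [2.0, 2.4, 1.8, 2.2],
}
threshold, results = pairwise_t_tests(groups)
for (a, b), (p, significant) in results.items():
    print(f"{a} vs {b}: p = {p:.4f}, significant: {significant}")
```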
Simple Linear Regression
- What is it? The least squares line
- When is it appropriate to use?
- Assumptions?
- What does the p-value mean? The R-value?
- How to do it in Excel
Simple Linear Regression
Tests the statistical significance of a relationship between two continuous variables: an explanatory variable and a response variable.
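A minimal regression sketch, assuming SciPy. `least_squares_line` (a hypothetical helper) computes the least squares slope and intercept from the usual formulas (slope = Sxy / Sxx, and the line passes through the two means), and `scipy.stats.linregress` returns the same line plus `rvalue` (square it for R-squared) and `pvalue`, which tests the null hypothesis that the slope is zero. The flow velocity and density values are invented for illustration.

```python
from statistics import mean

from scipy import stats


def least_squares_line(x, y):
    """Slope and intercept that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)


velocity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical explanatory variable (m/s)
density = [0.8, 1.1, 1.5, 1.6, 2.1, 2.3]   # hypothetical response (sculpin/m^2)
slope, intercept = least_squares_line(velocity, density)
fit = stats.linregress(velocity, density)
print(slope, intercept)
print(fit.slope, fit.intercept, fit.rvalue ** 2, fit.pvalue)
```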
Precautions and Limitations
- Meet assumptions:
  - Observations come from data with a normal distribution
  - Samples are independent
  - Equal variance is assumed
  - Relationship is linear
  - No other sample biases
- Interpret the p-value and R-squared value.
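Residual Plots
- Residuals are the vertical distances from the observed points to the best-fit line (observed value minus fitted value).
- For a least-squares line with an intercept, the residuals always sum to zero.
- Regression chooses the best-fit line that minimizes the sum of squared residuals. This is why it is called the Least Squares Line.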
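Both properties of the least squares line can be checked numerically. A sketch assuming NumPy, with made-up data: `np.polyfit(x, y, 1)` fits the least squares line, the residuals sum to zero (up to floating-point rounding) because the line includes an intercept, and nudging the slope in either direction increases the sum of squared residuals. `ssr` is a hypothetical helper name.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])  # hypothetical explanatory values
y = np.array([0.8, 1.1, 1.5, 1.6, 2.1, 2.3])  # hypothetical response values

slope, intercept = np.polyfit(x, y, 1)  # least squares line; highest power first
residuals = y - (slope * x + intercept)  # observed minus fitted


def ssr(b1, b0):
    """Sum of squared residuals for the candidate line y = b1 * x + b0."""
    return float(np.sum((y - (b1 * x + b0)) ** 2))


best = ssr(slope, intercept)
print(residuals.sum())  # essentially 0
print(best < ssr(slope + 0.1, intercept), best < ssr(slope - 0.1, intercept))  # True True
```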
Residuals
Residual vs. Fitted Value Plots
[Figure: observed values (points) and model values (line)]
Residual Plots Can Help Test Assumptions
[Figure: example residual plots, each centered on a residual of 0]
- Normal scatter
- Curve: the relationship is not linear
- Fan shape: unequal variance
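The curve and fan patterns can be reproduced with deliberately constructed data. In this NumPy sketch (data invented to show each pattern; `fit_residuals` is a hypothetical helper), a straight line forced through curved data leaves residuals that are positive at both ends and negative in the middle, and data whose scatter grows with x leave residuals that spread out like a fan. In practice you would plot the residuals against the fitted values rather than print them.

```python
import numpy as np


def fit_residuals(x, y):
    """Fit a least squares line; return the fitted values and residuals."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    return fitted, y - fitted


# Curve: the true relationship is y = x^2, but we fit a straight line
x_curve = np.arange(11, dtype=float)
_, res_curve = fit_residuals(x_curve, x_curve ** 2)
print(np.sign(res_curve).astype(int))  # positive at the ends, negative in the middle

# Fan: scatter around y = 2x grows in proportion to x (unequal variance)
x_fan = np.arange(1, 21, dtype=float)
signs = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)  # alternate above/below the line
_, res_fan = fit_residuals(x_fan, 2 * x_fan + 0.3 * x_fan * signs)
spread_low = np.abs(res_fan[:10]).mean()
spread_high = np.abs(res_fan[10:]).mean()
print(spread_low, spread_high)  # residuals are larger for larger x
```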
Have we violated any assumptions?
R-Squared and P-value
[Figure: four example scatterplots with fitted lines]
- High R-squared, low p-value (significant relationship)
- Low R-squared, low p-value (significant relationship)
- High R-squared, high p-value (NO significant relationship)
- Low R-squared, high p-value (no significant relationship)
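- The p-value indicates whether the relationship between the two variables is statistically significant: the probability of seeing a relationship at least this strong if there were truly no relationship (the null hypothesis). It does not measure how strong the relationship is.
- R-squared indicates how much of the variance in the response variable is explained by the explanatory variable. You can think of this as a measure of predictability. If it is low, other variables likely play a role. If it is high, that alone DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP!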
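The two mixed cases (high R-squared with a high p-value, and low R-squared with a low p-value) can be reproduced with constructed data. This sketch assumes NumPy and SciPy's `linregress`: three points give R-squared = 0.75 but p ≈ 0.33 (not significant), while 1000 points with large, alternating scatter around a shallow slope give an R-squared of only about 0.08 with a p-value far below 0.05. The data are invented to show the pattern.

```python
import numpy as np
from scipy import stats

# High R-squared, high p-value: a strong pattern, but only 3 points
few = stats.linregress([0.0, 1.0, 2.0], [0.0, 1.0, 1.0])

# Low R-squared, low p-value: a weak trend, but 1000 points
x = np.arange(1000, dtype=float)
scatter = np.where(np.arange(1000) % 2 == 0, 10.0, -10.0)  # large alternating scatter
many = stats.linregress(x, 0.01 * x + scatter)

print(f"3 points:    R^2 = {few.rvalue ** 2:.2f}, p = {few.pvalue:.2f}")  # R^2 = 0.75, p = 0.33
print(f"1000 points: R^2 = {many.rvalue ** 2:.2f}, p = {many.pvalue:.1e}")
```

Sample size drives the difference: with 3 points there is only 1 degree of freedom left to judge the slope, while 1000 points make even a weak trend detectable.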