Data Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Data Analysis

Slides: 24
Provided by: elenaco

Transcript and Presenter's Notes

Title: Data Analysis


1
Data Analysis
2
A Few Necessary Terms
  • Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)
  • Continuous Variable: Measurements along a continuum, such as Flow Velocity
  • What type of variable would Mottled Sculpin/m² be?
  • What type of variable is Substrate Type?
  • What type of variable is % of bank that is undercut?
3
A Few Necessary Terms
  • Explanatory Variable: Independent variable. On x-axis. The variable you use as a predictor.
  • Response Variable: Dependent variable. On y-axis. The variable that is hypothesized to depend on/be predicted by the explanatory variable.
4
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
  • T-test: A categorical explanatory variable with 2 options.
  • ANOVA: A categorical explanatory variable with >2 options.
  • Regression: A continuous explanatory variable.
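The rule on this slide can be written as a small lookup. This is only a sketch: the function name `choose_test` and the labels "categorical" and "continuous" are made up for illustration, and it assumes the response variable is continuous, as stated above.

```python
def choose_test(explanatory_kind, n_groups=0):
    """Pick a test for a continuous response variable.

    explanatory_kind: "categorical" or "continuous" (hypothetical labels).
    n_groups: number of options when the explanatory variable is categorical.
    """
    if explanatory_kind == "continuous":
        return "regression"
    if explanatory_kind == "categorical":
        if n_groups == 2:
            return "t-test"
        if n_groups > 2:
            return "ANOVA"
        raise ValueError("a categorical explanatory variable needs at least 2 options")
    raise ValueError(f"unknown variable kind: {explanatory_kind!r}")


# Type of Reach has three options (Riffle, Run, Pool).
test_for_reach = choose_test("categorical", 3)
# Flow Velocity is measured along a continuum.
test_for_velocity = choose_test("continuous")
```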
5
Statistical Tests
  • Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (Ho) against an alternate hypothesis (Ha).
  • Test Statistic
  • p-value: The probability of observing our data or more extreme data, assuming the null hypothesis is correct.
  • Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
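One way to see what "our data or more extreme data, assuming the null hypothesis is correct" means is a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. This is a swapped-in technique for illustration, not the t-test or ANOVA used later in the slides, and the numbers below are made up, not data from the presentation.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels carry no information,
        # so any reassignment of values to groups is equally likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Made-up densities for two hypothetical sites (illustration only).
site_a = [4.1, 5.3, 6.0, 5.5, 4.8]
site_b = [1.9, 2.4, 3.1, 2.2, 2.8]

p = permutation_p_value(site_a, site_b)
alpha = 0.05  # the "set value" from the slide
reject_null = p < alpha
```

With these invented values every observation at the first site is larger than every observation at the second, so very few shuffles match the observed difference and the null hypothesis is rejected at 0.05.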

6
Student's T-Test
Tests the statistical significance of the
difference between means from two independent
samples
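A minimal sketch of the test statistic behind this test, assuming equal variances in the two samples (the pooled form of the t-test). The function name is hypothetical. Converting t into a p-value needs the t distribution, which Python's standard library does not provide, so that step is left to a statistics package or a spreadsheet.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Made-up samples for illustration only.
t, df = pooled_t_statistic([1, 2, 3], [2, 3, 4])
```

The larger the absolute value of t for a given number of degrees of freedom, the smaller the p-value.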
7
Compares the mean of a continuous response between the 2 groups of a categorical variable
[Figure: Mottled Sculpin/m² (y-axis) for Cross Plains and Salmo Pond]
8
Precautions and Limitations
  • Meet Assumptions
    • Observations from data with a normal distribution (histogram)
    • Samples are independent
    • Assumed equal variance (boxplot)
    • No other sample biases
  • Interpreting the p-value

9
Analysis of Variance (ANOVA)
Tests the statistical significance of the
difference between means from two or more
independent samples
[Figure: Mottled Sculpin/m² by reach type (Riffle, Pool, Run), with the Grand Mean marked]
ANOVA website
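A rough sketch of how one-way ANOVA compares group means to the grand mean: the variation between groups is divided by the variation within groups to give an F statistic. The function name and the numbers are hypothetical. As with the t-test, turning F into a p-value requires the F distribution, which is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each observation sits from its own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up values for three hypothetical reach types (illustration only).
riffle, pool, run = [1, 2, 3], [2, 3, 4], [3, 4, 5]
f, df_between, df_within = one_way_anova_f([riffle, pool, run])
```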
10
Precautions and Limitations
  • Meet Assumptions
    • Observations from data with a normal distribution
    • Samples are independent
    • Assumed equal variance
    • No other sample biases
  • Interpreting the p-value
  • T-tests to follow

11
Simple Linear Regression
  • What is it? Least squares line
  • When is it appropriate to use?
  • Assumptions?
  • What does the p-value mean? The R-value?
  • How to do it in Excel

12
Simple Linear Regression
Tests the statistical significance of a
relationship between two continuous variables,
Explanatory and Response
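A minimal sketch of fitting the line by hand, assuming the usual least squares formulas for the slope and intercept. The function name and the data are made up for illustration. It also returns R-squared, the share of the response's variance explained by the line.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (slope, intercept, r_squared) for a simple linear regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up explanatory (x) and response (y) values that fall exactly on a line.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
slope, intercept, r_squared = least_squares_line(x, y)
```

Because these invented points lie exactly on y = 1 + 2x, the fit recovers that line and R-squared equals 1. Real field data will scatter around the line and give a lower R-squared.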
13
Precautions and Limitations
  • Meet Assumptions
    • Observations from data with a normal distribution
    • Samples are independent
    • Assumed equal variance
    • Relationship is linear
    • No other sample biases
  • Interpret the p-value and R-squared value.

14
Residual Plots
  • Residuals are the vertical distances from observed points to the best-fit line.
  • Residuals always sum to zero.
  • Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
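Both claims on this slide can be checked numerically. The sketch below uses made-up data and hypothetical helper names. It fits the least squares line, confirms that the residuals sum to zero up to rounding, and shows that nudging the slope away from the fitted value increases the sum of squared residuals.

```python
from statistics import mean


def fit_line(x, y):
    """Return (slope, intercept) of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(x, y, slope, intercept):
    """Observed value minus the value predicted by the line, for each point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
best = residuals(x, y, slope, intercept)
nudged = residuals(x, y, slope + 0.1, intercept)  # a slightly different line
```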
15
Residuals
16
Residual vs. Fitted Value Plots
Observed Values (Points)
Model Values (Line)
17
Residual Plots Can Help Test Assumptions
[Figure: residual plots around the zero line. Panels: Normal Scatter; Curve (linearity); Fan Shape (Unequal Variance)]
18
Have we violated any assumptions?
19
R-Squared and P-value
High R-Squared, Low p-value (significant relationship)
20
R-Squared and P-value
Low R-Squared, Low p-value (significant relationship)
21
R-Squared and P-value
High R-Squared, High p-value (no significant relationship)
22
R-Squared and P-value
Low R-Squared, High p-value (no significant relationship)
23
  • P-value: Indicates the strength of the evidence that a relationship exists between the two variables. It does not measure how strong that relationship is.
  • R-Squared: Indicates how much variance in the response variable is explained by the explanatory variable. You can think of this as a measure of predictability. If this is low, other variables likely play a role. If this is high, it DOES NOT by itself INDICATE A SIGNIFICANT RELATIONSHIP!
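A small sketch of the high R-squared, high p-value case, using three made-up points. The p-value here comes from an exact permutation test (every re-ordering of the response values), which is a stand-in chosen for illustration and not the p-value a regression routine would report. The function names are hypothetical.

```python
from itertools import permutations
from statistics import mean


def r_squared(x, y):
    """Share of the variance in y explained by a least squares line on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def exact_permutation_p_value(x, y):
    """Fraction of all re-orderings of y with R-squared at least as large as observed."""
    observed = r_squared(x, y)
    count = total = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance so that rounding does not exclude exact ties.
        if r_squared(x, shuffled) >= observed - 1e-12:
            count += 1
    return count / total


# Three made-up points that fall almost on a straight line.
x = [1.0, 2.0, 3.0]
y = [1.0, 2.2, 2.9]

r2 = r_squared(x, y)
p = exact_permutation_p_value(x, y)
```

With only three points, two of the six possible orderings fit a line equally well, so the p-value cannot fall below 1/3 no matter how high R-squared is. More observations are needed before a high R-squared can also be significant.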