Title: ESM 206A Data Analysis for Environmental Science
ESM 206A: Data Analysis for Environmental Science and Management
- Winter 2009
- Hunter Lenihan and Nick Parker
Course Objectives
- Learn how to use quantitative data analysis to:
- Make decisions regarding compliance with environmental standards or the performance of new management strategies
- Assess the impact of past management actions or development projects
- Make predictions about the likely outcome of proposed policy or management actions
- In ESM 206A, you will learn to use the following tools:
- Hypothesis formulation and testing
- t-tests and Analysis of Variance (ANOVA)
- Ordinary Least Squares (OLS) regression:
- Single variable
- Multiple variable
- Regression with discrete dependent variables (time permitting): Probit and Logit
Instructors
To make an appointment, find an open time on my Corporate Time schedule, add a meeting, and send me an email so I'm aware of it. Note that if you schedule something for the immediate future, I may not find out about it in time.
Class format
- Lectures meet twice per week
- Winter quarter
- Tuesday/Thursday 2:00-3:15
- Feb. 10, 12, 17, 19, 24, 26
- Mar. 3, 6, 10, 12
- Labs meet once per week, in the GIS lab
- There are 3 lab sections in Winter quarter:
- Wednesday 2:00-2:50, Wednesday 4:30-5:20, Thursday 8:30-9:20
- Weeks 7-10 (no lab the first week!)
- Labs
- These give you the opportunity to learn the nuts and bolts of running the analyses
- You will work on problem sets in lab
- The lab sections are usually quite full; if you need to switch sections on a continuing basis, please find someone to swap with.
Micro-exam
- A relatively short assignment that you will turn in for a grade
- Typically one extended problem that involves both conceptual and technical aspects
- Treat it as a take-home exam: once you open it, no help from peers or instructors (but you can use notes, books, and online resources)
- One micro-exam this quarter, due 4 PM, Friday, 12 March
Text
- Statistical Methods in Water Resources by Helsel and Hirsch
- PDF available on the course webpage
- Individual chapters (smaller files) at http://pubs.usgs.gov/twri/twri4a3/html/pdf_new.html
- Other readings/links will be posted as appropriate
Computing
- Excel can be used for simple analysis, using the Analysis ToolPak
- We will mostly use JMP
- JMP is comprehensive and reliable, but expensive (but UCSB pays!)
- JMP is based on SAS, one of the world's most powerful statistical programs/systems, and provides a somewhat user-friendly interface to it
- The course webpage is at http://www.bren.ucsb.edu/academics/course.asp?number=206A
- We will use this page for the whole year
Definition of statistics
- Mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
- Provides tools for prediction and forecasting based on data
- Applicable to all fields of science
- Descriptive statistics: methods used to summarize or describe a collection of data
- Inferential statistics: patterns in data are modeled in a way that accounts for randomness and uncertainty in the observations; the models are then used to draw inferences about the process or population being studied
- Descriptive and inferential statistics together make up applied statistics
- Three basic types:
- Monte Carlo analysis: minimal assumptions about the data; uses randomizations of the observed data as the basis for inference
- Parametric analysis: assumes the data were sampled from an underlying distribution of known form (e.g., normal); estimates the parameters of the distribution from the data; estimates probabilities from observed frequencies of events and uses these probabilities as the basis for inference (frequentist inference)
- Bayesian analysis: also assumes the data were sampled from an underlying distribution of known form; estimates parameters not only from the data but also from prior knowledge, and assigns probabilities to these parameters
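To make the Monte Carlo idea concrete, here is a minimal Python sketch of a randomization (permutation) test for a difference between two means. It is an illustration under our own assumptions, not a procedure prescribed in this course: the function name `permutation_test` and the seedling counts are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_iter=10_000, seed=1):
    """Two-sided randomization test for a difference in means.

    Shuffles the pooled observations n_iter times and returns the fraction of
    shuffles whose |difference in means| is at least as large as the observed
    one, which estimates the p-value under Ho (no difference).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return count / n_iter


# Hypothetical oak seedling counts per site (illustration only)
outside = [27, 31, 24, 29, 33, 26, 30, 28]
inside = [18, 22, 25, 20, 19, 23, 21, 24]
p = permutation_test(outside, inside)
print(f"estimated p-value: {p:.4f}")
```

No distributional form is assumed here; the only assumption is that, under Ho, group labels are exchangeable.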
- Mathematical statistics: concerned with the theoretical basis of the subject
Applied statistics
- Common goal: investigate causality
- Draw conclusions about the effect of changes in the values of predictors (independent variables) on dependent variables (responses): $Y = \alpha + \beta X$
- There are two major types of investigations (studies): experimental and observational
- The difference between the two types lies in how the study is conducted; each can be very effective
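As a minimal sketch of how $Y = \alpha + \beta X$ is estimated by single-variable ordinary least squares, the Python below uses the standard closed-form formulas $\hat\beta = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2$ and $\hat\alpha = \bar y - \hat\beta \bar x$. The function name `ols_fit` and the data are hypothetical (the data are exactly linear so the answer is easy to check); the course itself mostly uses JMP for this.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (alpha, beta) minimizing the sum of squared residuals of y = alpha + beta * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    beta = sxy / sxx               # slope: change in Y per unit change in X
    alpha = y_bar - beta * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return alpha, beta


# Hypothetical predictor (X) and response (Y) values
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
alpha, beta = ols_fit(x, y)
print(alpha, beta)  # 1.0 2.0
```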
Tools to learn
Experimental studies
Observational studies
- Hypothesis formulation and testing
- t-tests
- Analysis of Variance (ANOVA)
- Ordinary Least Squares regression
- Probit and Logit regression
Some notation and formulas
- For a random variable called x, the sample statistics are:
- Mean
- Variance
- Standard deviation
- The corresponding population statistics are:
- Mean
- Variance
- Standard deviation
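The formulas on this slide did not survive extraction. For reference, the standard textbook definitions are given below in our own notation, assuming a sample $x_1, \dots, x_n$ drawn from a population of $N$ values with mean $\mu$; the $n-1$ divisor makes $s^2$ an unbiased estimator of $\sigma^2$.

```latex
\begin{aligned}
\text{Sample mean:} \quad & \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{Sample variance:} \quad & s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{Sample standard deviation:} \quad & s = \sqrt{s^2} \\
\text{Population mean:} \quad & \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \\
\text{Population variance:} \quad & \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \\
\text{Population standard deviation:} \quad & \sigma = \sqrt{\sigma^2}
\end{aligned}
```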
Data for examples in lectures
Data from traps used by the lobster fishery
www.calobster.org
Matt Kay, Bren PhD student (Kay_at_lifesci.ucsb.edu)
Fishery
Hypothesis formulation and testing
Karl Raimund Popper (1902-1994)
- One of the most influential philosophers of science of the 20th century
- Professor at the London School of Economics
- Repudiated the classical observationalist/inductivist approach as the only method of science
- Advanced empirical falsification
- Vigorously defended liberal democracy and the principles of social criticism
The Scientific Method, from a Popperian perspective
- 1. Conception (inductive reasoning), drawing on:
- a. Observations
- b. Theory
- c. Problem
- d. Belief
- 2. Conception leads to Insight and a General Hypothesis
- 3. Assessment is done by:
- a. Formulating specific hypotheses
- b. Comparing them with new observations
- 4. Which leads to either:
- a. Falsification, and rejection of the insight and of the specific and general hypotheses, or
- b. Confirmation, and retesting of alternative hypotheses
[Figure: flowchart. Conception (inductive reasoning) draws on previous observations, existing theory, a perceived problem, and belief, leading to INSIGHT and a general hypothesis; assessment (deductive reasoning) then leads to confirmation or falsification.]
Absolute vs. measured differences
Example specific hypothesis: the number of oak seedlings is higher in areas outside oil-polluted (impacted) sites than inside polluted sites.
What counts as a difference? Are these different?
Statistical analysis: cause, probability, and effect
II) Statistical Methodology: General
A) Null hypotheses (Ho)
1) In most sciences, we are faced with:
a) NOT whether something is true or false (a Popperian decision)
b) BUT rather the degree to which an effect exists (if at all): a statistical decision
B) Therefore, 2 competing statistical hypotheses are posed:
a) HA: there is a difference in effect between (usually posed as )
b) HO: there is no difference in effect between
The logic of statistical tests: how they are performed
1) Assume the null hypothesis (Ho) is true (e.g., no difference in the number of oak seedlings at impact and non-impact sites).
2) Compare measurements: generally this means comparing two sample distributions (determined from the experiment or survey).
- Distributions are generally compared by comparing their means and the estimate of error associated with sampling the means. The simplest case is the Standard Error of the Mean (SE or SEM):
$S_{\bar{x}} = s_x / \sqrt{n}$ (standard deviation divided by the square root of the level of replication)
3) Determine the probability that the distributions are similar/different.
4) Compare with a critical p-value to assign significance.
What is a distribution (around a mean)?
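The test logic above can be sketched in Python for two samples. This is a minimal illustration under stated assumptions, not the exact test used later in the course: the statistic is Welch's t, built from each group's SEM, and as a simplification the two-sided p-value comes from a normal approximation, which is only reasonable for fairly large samples. The function names and the seedling counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def sem(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def two_sample_test(a, b, alpha=0.05):
    """Compare two samples under Ho: the population means are equal.

    Returns (test statistic, two-sided p-value, significant?). The p-value uses
    a normal approximation to the distribution of the statistic.
    """
    se_diff = sqrt(sem(a) ** 2 + sem(b) ** 2)  # SE of the difference between the means
    t = (mean(a) - mean(b)) / se_diff          # difference in means, scaled by sampling error
    p = 2 * (1 - NormalDist().cdf(abs(t)))     # probability of a difference this large if Ho is true
    return t, p, p < alpha                     # compare with the critical p-value


# Hypothetical oak seedling counts per site (illustration only)
outside = [27, 31, 24, 29, 33, 26, 30, 28]
inside = [18, 22, 25, 20, 19, 23, 21, 24]
t, p, significant = two_sample_test(outside, inside)
print(f"t = {t:.2f}, p = {p:.2g}, significant at 0.05: {significant}")
```

With only eight sites per group, a t distribution (for example in JMP, or `scipy.stats.ttest_ind` with `equal_var=False`) would give a somewhat larger p-value than this normal approximation.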
Calculation of statistical distribution
Distribution of oak seedlings: pre-impact or non-polluted site
- Sites: 100
- Mean per site: 25
- Total seedlings: 2500
Evaluate the effect of sample size on the calculation of, and confidence in, the mean
- Compare sample sizes of 5, 10, 20, 50, and 99 cells
- Iterate 50 times
- Example: sample 10 sites to determine the mean, repeated for fifty iterations
Means: 21.5, 22.3, 23.0, 23.9, 24.9, 25.1, 25.8, 26.5, 27.8, 29.9, etc.
True mean: 25
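The resampling experiment described above can be sketched in Python. The 100-site population is hypothetical: a repeating pattern of counts chosen only so that the total is 2500 and the mean is exactly 25, matching the slide's summary numbers. `sampling_experiment` and `SITES` are our own names.

```python
import random
from statistics import mean, stdev

# Hypothetical population of 100 sites (pattern sums to 250, so the total is 2500
# and the true mean is exactly 25); illustration only.
SITES = [10, 14, 18, 22, 25, 25, 28, 32, 36, 40] * 10


def sampling_experiment(population, sample_sizes=(5, 10, 20, 50, 99), iterations=50, seed=1):
    """For each sample size, draw `iterations` random samples of sites (without
    replacement) and summarize the resulting sample means as
    (mean of means, SD of means, smallest mean, largest mean)."""
    rng = random.Random(seed)
    results = {}
    for n in sample_sizes:
        means = [mean(rng.sample(population, n)) for _ in range(iterations)]
        results[n] = (mean(means), stdev(means), min(means), max(means))
    return results


for n, (avg, sd, lo, hi) in sampling_experiment(SITES).items():
    print(f"n={n:3d}: mean of means={avg:6.2f}, SD={sd:5.2f}, range={lo:.1f}-{hi:.1f}")
```

The spread of the sample means shrinks as n grows; with 99 of the 100 sites sampled, every estimate lies within about 0.15 of the true mean of 25.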
Effect of number of observations on the estimate of the mean
Statistical comparison of two sample distributions
[Figure: sample distributions of groups 1 and 2 under Ho ($\bar{X}_1 = \bar{X}_2$) and under HA ($\bar{X}_1 \neq \bar{X}_2$)]
How to estimate optimal sample size
1) Do a preliminary study of the variables that will be evaluated in the project.
2) Plot the mean and some estimate of the variability of the data as a function of sample size.
3) Look for the "sweet spot" where estimates of the mean and variance (or standard deviation) converge on a stable value.
4) Calculate a "bang for buck" relationship to determine whether a robust design (sufficient sampling) can be paid for.
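Steps 2 and 3 can be automated roughly as follows. This is a minimal sketch under our own assumptions: "stable" is taken to mean that the running mean and standard deviation each stay within a relative tolerance `tol` for `window` further observations. The function names, the tolerance rule, the pilot data, and the per-site cost are all hypothetical.

```python
from statistics import mean, stdev


def running_estimates(pilot):
    """Mean and SD computed from the first n pilot observations, for n = 2..len(pilot)."""
    return [(n, mean(pilot[:n]), stdev(pilot[:n])) for n in range(2, len(pilot) + 1)]


def stable_sample_size(pilot, tol=0.05, window=3):
    """Smallest n whose running mean and SD stay within `tol` (relative) of their
    values at n for the next `window` sample sizes; None if they never settle."""
    est = running_estimates(pilot)
    for i in range(len(est) - window):
        n, m0, s0 = est[i]
        if all(
            abs(m - m0) <= tol * abs(m0) and abs(s - s0) <= tol * s0
            for _, m, s in est[i + 1 : i + 1 + window]
        ):
            return n
    return None


# Hypothetical pilot survey of seedling counts at 15 sites (illustration only)
pilot = [22, 30, 18, 27, 25, 24, 26, 23, 25, 26, 24, 25, 25, 24, 26]
COST_PER_SITE = 400  # hypothetical survey cost (dollars) per site
n_star = stable_sample_size(pilot)
if n_star is not None:
    print(f"estimates stabilize at about n={n_star}; cost about ${n_star * COST_PER_SITE}")
else:
    print("estimates have not stabilized; collect more pilot data")
```

In practice the running estimates depend on the order in which pilot sites were visited, so it is worth repeating this on a few random reorderings before settling on a sample size.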
Trade-off between accuracy and cost
[Figure: cost and accuracy plotted against sample size, e.g., number of replicate sites]
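One simple way to quantify this trade-off, assuming accuracy is measured by the standard error of the mean ($s/\sqrt{n}$) and that cost grows linearly with the number of replicate sites, is sketched below. The function name, standard deviation, and per-site cost are hypothetical.

```python
from math import sqrt


def tradeoff_table(sd, cost_per_site, sample_sizes, fixed_cost=0.0):
    """For each sample size n, return (n, standard error of the mean, total cost).

    Assumes accuracy = SE of the mean = sd / sqrt(n), and a cost that is a fixed
    amount plus a constant amount per replicate site.
    """
    return [(n, sd / sqrt(n), fixed_cost + cost_per_site * n) for n in sample_sizes]


# Hypothetical inputs: SD of 9 seedlings per site, $400 per site
for n, se, cost in tradeoff_table(sd=9.0, cost_per_site=400, sample_sizes=[5, 10, 20, 50, 100]):
    print(f"n={n:3d}  SE={se:5.2f}  cost=${cost:,.0f}")
```

Because SE falls with $\sqrt{n}$ while cost rises with $n$, quadrupling the number of sites only halves the standard error, which is why returns diminish past the "sweet spot".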