Title: Statistics in WR: Lecture 1
1. Statistics in WR: Lecture 1
- Key Themes
  - Knowledge discovery in hydrology
  - Introduction to probability and statistics
  - Definition of random variables
- Reading: Helsel and Hirsch, Chapter 1
2. How is new knowledge discovered?
After completing the Handbook of Hydrology in
1993, I asked myself the question: how is new
knowledge discovered in hydrology? I concluded:
- By deduction from existing knowledge
- By experiment in a laboratory
- By observation of the natural environment
3. Deduction: Isaac Newton
- Deduction is the classical path of mathematical physics
  - Given a set of axioms
  - Then by a logical process
  - Derive a new principle or equation
- In hydrology, the St. Venant equations for open channel flow and the Richards equation for unsaturated flow in soils were derived in this way.
Three laws of motion and law of gravitation (1687)
http://en.wikipedia.org/wiki/Isaac_Newton
4. Experiment: Louis Pasteur
- Experiment is the classical path of laboratory science: a simplified view of the natural world is replicated under controlled conditions
- In hydrology, Darcy's law for flow in a porous medium was found this way.
Pasteur showed that microorganisms cause disease and discovered vaccination: foundations of scientific medicine
http://en.wikipedia.org/wiki/Louis_Pasteur
5. Observation: Charles Darwin
- Observation: direct viewing and characterization of patterns and phenomena in the natural environment
- In hydrology, Horton discovered stream scaling laws by interpretation of stream maps
On the Origin of Species, published Nov 24, 1859: most accessible book of great scientific imagination ever written
6. Conclusion for Hydrology
- Deduction and experiment are important, but hydrology is primarily an observational science
- Discharge, water quality, and groundwater measurement data are collected to support this.
7. Great Eras of Synthesis
- Scientific progress occurs continuously, but there are great eras of synthesis: many developments happening at once that fuse into knowledge and fundamentally change the science
Timeline (axis from 1900 to 2020):
- Around 1900 to 1920: Physics (relativity, structure of the atom, quantum mechanics)
- Around 1960 to 1980: Geology (observations of seafloor magnetism lead to plate tectonics)
- Around 2000 to 2020: Hydrology (synthesis of water observations leads to knowledge synthesis)
8. Hydrologic Science
It is as important to represent hydrologic
environments precisely with data as it is to
represent hydrologic processes with equations
Physical laws and principles (Mass, momentum,
energy, chemistry)
Hydrologic Process Science (Equations, simulation
models, prediction)
Hydrologic conditions (Fluxes, flows,
concentrations)
Hydrologic Information Science (Observations,
data models, visualization)
Hydrologic environment (Physical earth)
9. A sea change in computing
Massive Data Sets: Federation, Integration, Collaboration
There will be more scientific data generated in the next five years than in the history of humankind
Evolution of Many-core and Multicore: Parallelism everywhere
What will you do with 100 times more computing power?
Distributed, loosely-coupled applications at scale across all devices will be the norm
The power of the Client + Cloud: Access Anywhere, Any Time
Slide from Jeff Dozier, UCSB
10. Emergence of a fourth research paradigm
- Thousand years ago: Experimental Science
  - Description of natural phenomena
- Last few hundred years: Theoretical Science
  - Newton's Laws, Maxwell's Equations
- Last few decades: Computational Science
  - Simulation of complex phenomena
- Today: Data-Intensive Science
  - Scientists overwhelmed with data sets from many different sources
    - Data captured by instruments
    - Data generated by simulations
    - Data generated by sensor networks
  - eScience is the set of tools and technologies to support data federation and collaboration
    - For analysis and data mining
    - For data visualization and exploration
    - For scholarly communication and dissemination
- (With thanks to Jim Gray)
Slide from Jeff Dozier, UCSB
11. Data Cube: What, Where, When
Figure: a cube with axes What, Where, and When; each cell is a data value.
12. Continuous Space-Time Data Model -- NetCDF
Figure: a data cube with axes Time (T), Space (L), and Variables (V). Each cell holds a data value D; X denotes the coordinate dimensions and Y the variable dimensions.
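As a rough illustration of the continuous model, the sketch below stores one data value D for every combination of time T, location L, and variable V in a dense nested list. It is plain Python, not the NetCDF library, and the class name `DataCube`, the site names, and the numbers are all hypothetical.

```python
import math


class DataCube:
    """Dense cube: one value D for every (time T, location L, variable V)."""

    def __init__(self, times, locations, variables):
        self.times = list(times)
        self.locations = list(locations)
        self.variables = list(variables)
        # Map each coordinate label to its position along its axis.
        self._t = {t: i for i, t in enumerate(self.times)}
        self._l = {l: i for i, l in enumerate(self.locations)}
        self._v = {v: i for i, v in enumerate(self.variables)}
        # Every cell exists from the start; NaN marks "no observation yet".
        self.values = [
            [[math.nan for _ in self.variables] for _ in self.locations]
            for _ in self.times
        ]

    def set(self, t, l, v, value):
        self.values[self._t[t]][self._l[l]][self._v[v]] = value

    def get(self, t, l, v):
        return self.values[self._t[t]][self._l[l]][self._v[v]]

    def time_series(self, l, v):
        """Slice the cube along the time axis for one location and variable."""
        li, vi = self._l[l], self._v[v]
        return [plane[li][vi] for plane in self.values]


# Hypothetical example: two sites, two variables, three years.
cube = DataCube([2006, 2007, 2008], ["site_A", "site_B"], ["flow", "total_nitrogen"])
cube.set(2007, "site_A", "flow", 12.5)
print(cube.get(2007, "site_A", "flow"))
print(cube.time_series("site_A", "flow"))
```

The dense layout makes slicing along any axis simple, at the cost of allocating a cell for every combination whether or not it was observed.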
13. Discrete Space-Time Data Model
Figure: a data cube with axes Time (TSDateTime), Space (FeatureID), and Variables (TSTypeID); each cell holds a TSValue.
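The discrete model can be sketched as a table with one row per observation, keyed by FeatureID, TSTypeID, and TSDateTime, with TSValue as the measured quantity. The example below is a minimal sketch under that reading: the field names come from the slide, while the `TSRecord` type, the `time_series` helper, and every value in it are hypothetical.

```python
from collections import namedtuple
from datetime import datetime

# One row per observation; field names follow the slide.
TSRecord = namedtuple("TSRecord", ["FeatureID", "TSTypeID", "TSDateTime", "TSValue"])

# Hypothetical values: FeatureID 1 and 2 are two sites,
# TSTypeID 1 stands for flow and 2 for total nitrogen.
records = [
    TSRecord(1, 1, datetime(2008, 1, 2), 12.0),
    TSRecord(1, 1, datetime(2008, 1, 1), 10.0),
    TSRecord(1, 2, datetime(2008, 1, 1), 0.8),
    TSRecord(2, 1, datetime(2008, 1, 1), 55.0),
]


def time_series(rows, feature_id, ts_type_id):
    """Return (TSDateTime, TSValue) pairs for one feature and type, in time order."""
    selected = [
        (r.TSDateTime, r.TSValue)
        for r in rows
        if r.FeatureID == feature_id and r.TSTypeID == ts_type_id
    ]
    return sorted(selected)


for when, value in time_series(records, 1, 1):
    print(when.date(), value)
```

Unlike a dense cube, this layout stores only the observations that exist, which suits irregular sampling at scattered sites.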
14. Hydrologic Statistics
Time Series Analysis
Geostatistics
Multivariate analysis
How do we understand space-time correlation
fields of many variables?
15. 288 USGS sites with flow and nitrogen data
These are the sites used for the SPARROW model
that continued to be operational to 2008.
http://water.usgs.gov/nawqa/sparrow/
16. Colorado River at Austin, TX (08158000)
17. Mean Annual Flow
18. Is there a relation between flow and water
quality?
Total Nitrogen in water
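One simple first look at such a relation is the sample Pearson correlation coefficient between paired flow and concentration values. The slide does not say which measure is intended, so this choice is an assumption, and the function name `pearson_r` and the sample numbers are hypothetical. Pearson's r captures only linear association, so a value near zero does not rule out a nonlinear relation.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented paired samples: flow and total nitrogen concentration.
flow = [20.0, 35.0, 50.0, 80.0, 120.0]
total_nitrogen = [0.6, 0.9, 1.1, 1.4, 2.0]
print(round(pearson_r(flow, total_nitrogen), 3))
```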
19. Are Annual Flows Correlated?
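If the question is read as asking about year-to-year (serial) correlation, which is an assumption here, one common check is the lag-k sample autocorrelation, r_k = sum((x_t - mean)(x_{t+k} - mean)) / sum((x_t - mean)^2). The sketch below computes it for a short series; the function name `autocorrelation` and the flow values are hypothetical, not data from the lecture.

```python
def autocorrelation(series, lag=1):
    """Lag-k sample autocorrelation of a sequence of numbers."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations over the whole series.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Numerator: products of deviations for pairs separated by `lag` steps.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Invented mean annual flows for eight consecutive years.
annual_flow = [62.0, 48.0, 55.0, 90.0, 84.0, 41.0, 39.0, 70.0]
print(round(autocorrelation(annual_flow, lag=1), 3))
```

A value near zero suggests that one year's flow says little about the next; values well above zero suggest persistence from year to year.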