Some statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Some statistics


Slides: 22
Provided by: RobertM7

Transcript and Presenter's Notes



1
Some statistics
  • use statistics as a drunkard uses a lamppost,
    for support, not for illumination

2
Why use statistics?
  • Assess significance and error (confidence)
  • What is the error of that C14 date?
  • I'm holding a pair of aces. How likely is it
    that someone else has a straight?
  • Prediction
  • What are the chances of an earthquake here in
    the next 20 years?

3
Prediction in geology is hard
  • Satellite orbits are an example of deterministic
    variables: they can be calculated. The sun rises
    every morning.
  • Earthquakes are pretty random in time.
  • This is characteristic of nonlinear systems such
    as long-term weather (first shown in the 1960s by
    Lorenz). Even if the current state is known very
    precisely, the long-term future cannot be predicted,
    because tiny errors grow rapidly (chaotic
    behavior: the butterfly effect).
  • Geological systems have lots of nonlinear
    effects.
  • Statistics are often the only weapon for this
    type of system.

4
Linear and non-linear
  • However, many common geologic systems are modeled
    as linear systems.
  • Linear systems follow these rules:
    $f(x + y) = f(x) + f(y)$ and $f(cx) = c\,f(x)$
  • This means that effects can be added. The linear
    assumption is often a poor one, but
    nonlinear systems are hard.
  • Is $f(x) = ax^2$ linear?
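
The question on this slide can be checked numerically. The sketch below assumes the usual definition of a linear function, additivity f(x + y) = f(x) + f(y) and homogeneity f(c x) = c f(x); that definition, the helper name `is_linear_at` and the test values are assumptions made for illustration, not taken from the slides.

```python
import math

def is_linear_at(f, x, y, c, tol=1e-9):
    """Check additivity and homogeneity of f at the sample points x, y with scale c."""
    additive = math.isclose(f(x + y), f(x) + f(y), abs_tol=tol)
    homogeneous = math.isclose(f(c * x), c * f(x), abs_tol=tol)
    return additive and homogeneous

a = 2.0  # arbitrary constant, chosen only for illustration

def quadratic(x):
    return a * x**2   # f(x) = a x^2

def proportional(x):
    return a * x      # f(x) = a x

# A single failing point is enough to show that a function is not linear;
# a passing point is only consistent with linearity, it does not prove it.
quadratic_ok = is_linear_at(quadratic, 1.0, 3.0, 5.0)
proportional_ok = is_linear_at(proportional, 1.0, 3.0, 5.0)
print(quadratic_ok, proportional_ok)
```

With these values f(1 + 3) = 32 but f(1) + f(3) = 20, so effects cannot simply be added and a x^2 is not linear.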

5
data analysis
  • With current technology, it is usually easy to
    make lots of measurements.
  • Geophysics: a 3D seismic survey might have
    $1 \times 10^{11}$ measurements.
  • But anything over a few dozen is hard to handle
    without statistics.
  • Statistics are good for:
  • Getting a feel for the data
  • Checking if two datasets are related
  • Testing a hypothesis rigorously

6
Some terms
  • Random variable: represents the random outcome
    of a measurement.
  • Distribution: the expected range of values for a
    random variable.

7
Discrete versus continuous
  • Discrete: consists of sample points
  • Continuous: functions don't have breaks
  • We will mostly deal with discrete functions here.
    Mostly.
  • To convert a continuous function to discrete, we
    need to sample it in some way.
  • The Nyquist theorem says we have to sample at least
    twice as fast as the highest frequency in the signal
    to retain the shape (sampling more slowly than that
    will lead to significant error).
  • Tides vary twice per day, so we must sample at least
    4 times per day.
  • Earthquake waves vary up to 50 times per second
    (50 Hz), so we must sample at least 100 times per
    second (100 Hz).
  • This also applies to spatial sampling. If an ore body
    is 100 m across, we must sample at least every 50 m
    to get any sort of idea of its shape.
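
The sampling rule above can be sketched in a few lines. The function names are hypothetical, and the folding formula assumes a single pure sinusoid sampled at a uniform rate, which is a simplification of real tide or earthquake records.

```python
def min_sampling_rate(f_max):
    """Sampling rate suggested by the rule on the slide: twice the highest frequency."""
    return 2.0 * f_max

def apparent_frequency(f_signal, f_sample):
    """Frequency that a pure sine of f_signal appears to have when sampled at f_sample."""
    # Sampled sinusoids whose frequencies differ by a multiple of f_sample are
    # indistinguishable, so fold f_signal into the range [0, f_sample / 2].
    folded = f_signal % f_sample
    return min(folded, f_sample - folded)

print(min_sampling_rate(50))        # earthquake waves up to 50 Hz
print(apparent_frequency(50, 200))  # sampled fast enough: still looks like 50 Hz
print(apparent_frequency(50, 60))   # sampled too slowly: looks like a 10 Hz signal
print(apparent_frequency(2, 3))     # tides sampled 3 times per day look like 1 cycle per day
```

Undersampling does not just blur the signal: it makes it look like a slower signal that is not really there (aliasing).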

8
accuracy and precision
  • Accurate measurements are close to the truth
  • Precise measurements are close to each other
    (very little scatter) but may (or may not) be
    accurate.
  • A set of very precise measurements may include a
    bias.
  • Example: using a compass next to a large magnet,
    we can make very precise measurements, but the
    accuracy is terrible.

9
significant figures
A significant figure is any digit from 1 to 9, and
zero if it is not a placeholder. It shows the
precision of the measurement (but not necessarily
the accuracy).
3.14567 has 6 significant figures.
0.00320 has 3 significant figures.
If we conduct calculations we must use the proper
number of significant figures: use the number
with the fewest significant figures.
For example, $1.2 \times 1.111111111111 = 1.3$
(we round off the number of decimal places to
ensure that the answer has the correct precision).
It is easy to calculate too many decimal places in Excel.
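
A small helper makes the rounding rule concrete. This is a sketch only: `round_sig` is a hypothetical name, and the example assumes the two numbers are multiplied, which matches the rounded answer of 1.3.

```python
import math

def round_sig(value, n_sig):
    """Round value to n_sig significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))  # position of the leading digit
    return round(value, n_sig - 1 - exponent)

product = 1.2 * 1.111111111111   # 1.2 carries only 2 significant figures
rounded = round_sig(product, 2)  # so the answer is reported with 2 as well
print(product, rounded)
```

The unrounded product has many more digits than the least precise input justifies, which is the trap the slide warns about in spreadsheets.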
10
mean
  • Known as arithmetic mean, mean, or average.
  • We will deal largely with discrete samples.
  • In Excel use average()
  • Yields an unbiased estimate of the mean value
    ($\mu$).
  • Approaches the real expected value for large n
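
As a minimal sketch of the calculation, the arithmetic mean is the sum of the samples divided by their number. The data values below are made up for illustration, and `arithmetic_mean` is a hypothetical helper; Python's `statistics.mean` gives the same result.

```python
import statistics

def arithmetic_mean(samples):
    """Sum of the samples divided by how many there are."""
    return sum(samples) / len(samples)

values = [2.0, 4.0, 4.0, 5.0, 10.0]   # made-up data for illustration
mean_by_hand = arithmetic_mean(values)
mean_by_library = statistics.mean(values)
print(mean_by_hand, mean_by_library)
```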

11
median
  • The number in the middle of a set of numbers
  • 50% of the numbers are above it and 50% are below
    it.
  • Often represents a typical value better than the mean.
  • Excel: median()

12
Average deviation and variance
  • Measures how much a dataset varies around the
    mean.
  • Average deviation (avedev)
  • Sample Variance (var)
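
The two measures can be written out as below. This sketch assumes the usual definitions (mean absolute deviation from the mean, and squared deviations divided by n - 1), which is how Excel documents AVEDEV and VAR; the helper names and data are illustrative.

```python
def average_deviation(samples):
    """Mean absolute deviation from the mean (what Excel's AVEDEV reports)."""
    m = sum(samples) / len(samples)
    return sum(abs(x - m) for x in samples) / len(samples)

def sample_variance(samples):
    """Sum of squared deviations from the mean, divided by n - 1 (Excel's VAR)."""
    m = sum(samples) / len(samples)
    return sum((x - m) ** 2 for x in samples) / (len(samples) - 1)

values = [2.0, 4.0, 4.0, 5.0, 10.0]   # made-up data for illustration
avg_dev = average_deviation(values)
variance = sample_variance(values)
print(avg_dev, variance)
```

Note that the variance is in squared units of the data, while the average deviation is in the same units as the data.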

13
Standard deviation
  • Often useful to have the deviation from the
    average described in the same units as the
    original data.
  • Excel: stdev
  • This is the unbiased form (use it if the mean was
    calculated from the same samples).
  • If we use n rather than n - 1 we get a different
    estimate (a more efficient but biased estimate if
    the mean was calculated from the same data).
  • Square root of the variance
  • Both the variance and the standard deviation are
    commonly used to show the error in a measurement.
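
The difference between dividing by n - 1 and by n can be seen directly with Python's `statistics` module, where `stdev` uses n - 1 and `pstdev` uses n. The data are made up for illustration.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 5.0, 10.0]   # made-up data for illustration

s_sample = statistics.stdev(values)       # divides by n - 1
s_population = statistics.pstdev(values)  # divides by n

print(s_sample, s_population)
# The standard deviation is the square root of the variance.
print(math.sqrt(statistics.variance(values)))
```

For a small dataset the two estimates differ noticeably; the gap shrinks as n grows.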

14
These are estimates
  • Need an infinite number of samples for the true
    values.
  • Different subsets of the data will yield
    different estimates of the mean.
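
A small simulation illustrates the point. Everything here is synthetic: the "measurements" are drawn from a normal distribution with an assumed true mean of 10 and standard deviation of 2, and the seed is fixed only so the sketch is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [rng.gauss(10.0, 2.0) for _ in range(10_000)]  # synthetic measurements

# Estimate the mean from five small subsets and five large subsets.
small_means = [statistics.mean(rng.sample(population, 10)) for _ in range(5)]
large_means = [statistics.mean(rng.sample(population, 1000)) for _ in range(5)]

print("n = 10:  ", [round(m, 2) for m in small_means])
print("n = 1000:", [round(m, 2) for m in large_means])
```

Each subset gives a slightly different estimate, and the estimates from the larger subsets should cluster more tightly around the true value.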

15
Difference between mean and median
  • Assume we have three numbers:
  • 1, 3, 10
  • The mean (or average) is 4.6667.
  • The median is 3.
  • Example:
  • L.T. averaged 5.2 yards per carry in 2006.
  • Why doesn't the coach always run the ball?
  • The team should easily get 10 yards in four downs,
    on average.
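
To see how a single long run can pull the mean away from the median, consider the sketch below. The list of carries is hypothetical and is not real game data.

```python
import statistics

carries = [1, 2, 3, 3, 4, 23]   # hypothetical yards gained on six runs

mean_yards = statistics.mean(carries)
median_yards = statistics.median(carries)
runs_at_or_above_mean = sum(1 for y in carries if y >= mean_yards)

print(mean_yards, median_yards, runs_at_or_above_mean)
```

Here the average is 6 yards per carry, but the median is 3 and only one run in six reaches the average, so a typical run gains far less than the mean suggests.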

16
Propagation of errors
If we multiply by a constant number, we multiply
the error by the same number:
$(4.0 \pm 0.3)(3.1456) = 12.6 \pm 0.9$
If we add two numbers, add the squares of
each error and then take the square root:
$\Delta z = \sqrt{(\Delta x)^2 + (\Delta y)^2}$
For multiplication and division, we add the squares of
the ratios of the errors to the numbers:
$(\Delta x/x)^2 + (\Delta y/y)^2 = (\Delta z/z)^2$
$\Delta z = z \sqrt{(\Delta x/x)^2 + (\Delta y/y)^2}$
Here $\sqrt{\ }$ means take the square root.
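
The three rules can be sketched as small functions that each return a (value, error) pair. The function names are hypothetical, and the quadrature rules assume the errors are independent, which the slide does not state explicitly.

```python
import math

def scale_by_constant(value, error, k):
    """Multiply a measurement by an exact constant k: the error scales by |k|."""
    return value * k, error * abs(k)

def add_measurements(x, dx, y, dy):
    """Sum of two independent measurements: errors add in quadrature."""
    return x + y, math.sqrt(dx**2 + dy**2)

def multiply_measurements(x, dx, y, dy):
    """Product of two independent measurements: relative errors add in quadrature."""
    z = x * y
    return z, abs(z) * math.sqrt((dx / x)**2 + (dy / y)**2)

value, error = scale_by_constant(4.0, 0.3, 3.1456)
print(round(value, 1), round(error, 1))            # the constant-multiple example
print(add_measurements(10.0, 3.0, 20.0, 4.0))      # made-up inputs
print(multiply_measurements(10.0, 1.0, 20.0, 2.0)) # made-up inputs
```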
17
Example
Suppose we want to estimate the amount of oil in
a prospect we have just found. Using seismic
data, we find that the trap is
$300 \pm 20$ m high,
$1000 \pm 10$ m long, and
$500 \pm 5$ m wide.
It is composed of sandstone. A core through the sandstone yields
6 measurements of the porosity: 0.29, 0.19, 0.25,
0.23, 0.22, and 0.26. The mean is 0.24, with a
variance of 0.0012 and a standard deviation of
0.035. How much oil does it hold, with correct
error estimates?
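
The porosity statistics quoted in this example can be reproduced with Python's `statistics` module, assuming the sample (n - 1) forms of the variance and standard deviation.

```python
import statistics

porosity = [0.29, 0.19, 0.25, 0.23, 0.22, 0.26]   # the six core measurements

mean_porosity = statistics.mean(porosity)
var_porosity = statistics.variance(porosity)   # sample variance (n - 1)
std_porosity = statistics.stdev(porosity)      # sample standard deviation

print(round(mean_porosity, 2), round(var_porosity, 4), round(std_porosity, 3))
```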
18
Volume error
Volume = $(300 \pm 20)(1000 \pm 10)(500 \pm 5) = 1.5 \times 10^8$ m$^3$
Error = $\sqrt{(20/300)^2 + (10/1000)^2 + (5/500)^2} \times 1.5 \times 10^8 = 1.0 \times 10^7$ m$^3$
Now we want to multiply by the porosity, $0.24 \pm 0.04$:
$(1.5 \times 10^8 \pm 1.0 \times 10^7)(0.24 \pm 0.04) = 3.6 \times 10^7 \pm 6.5 \times 10^6$
cubic meters of crude oil
There are about 6.3 barrels per cubic meter:
$2.3 \times 10^8 \pm 4.1 \times 10^7$ barrels, or about 230 million barrels
At 100 dollars per barrel, that is $23 \pm 4$ billion dollars
For comparison, the last big discovery made in the Gulf of Mexico,
Thunderhorse, holds about 1 billion barrels.
[Image: The Thunderhorse platform after Hurricane Dennis]
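
The whole chain for this example can be sketched as follows. It assumes independent errors, a porosity uncertainty of 0.04 (the standard deviation rounded to two decimals), and a conversion of roughly 6.29 barrels per cubic meter; the helper name `multiply` and the constant names are illustrative. The size of the final error depends strongly on which porosity uncertainty is chosen.

```python
import math

def multiply(*terms):
    """Multiply (value, error) pairs; relative errors combine in quadrature."""
    value = math.prod(v for v, _ in terms)
    relative = math.sqrt(sum((e / v) ** 2 for v, e in terms))
    return value, abs(value) * relative

BARRELS_PER_M3 = 6.29      # approximate conversion factor (assumption of this sketch)
PRICE_PER_BARREL = 100.0   # dollars per barrel, as in the example

# Trap volume from height, length and width (meters).
volume, d_volume = multiply((300.0, 20.0), (1000.0, 10.0), (500.0, 5.0))

# Pore volume, taken here as the oil volume (cubic meters).
oil_m3, d_oil_m3 = multiply((volume, d_volume), (0.24, 0.04))

# Exact constants scale the value and the error by the same factor.
barrels, d_barrels = oil_m3 * BARRELS_PER_M3, d_oil_m3 * BARRELS_PER_M3
dollars, d_dollars = barrels * PRICE_PER_BARREL, d_barrels * PRICE_PER_BARREL

print(f"volume:  {volume:.2e} +/- {d_volume:.1e} m^3")
print(f"oil:     {oil_m3:.2e} +/- {d_oil_m3:.1e} m^3")
print(f"barrels: {barrels:.2e} +/- {d_barrels:.1e}")
print(f"dollars: {dollars:.2e} +/- {d_dollars:.1e}")
```

Most of the final uncertainty comes from the porosity, whose relative error (about 17%) is much larger than that of the trap dimensions (about 7% combined).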
19
We can think of the histogram of pixels as a
distribution. It tells us the likelihood of
finding a specific color.
20
Some terms
  • Random variable: represents the random outcome
    of a measurement.
  • Probability distribution function (PDF):
    describes how often a particular measurement
    might occur.

[Figure: two bar charts, "PDF for a normal coin" and "PDF for a weighted coin (always heads)"; horizontal axis: heads, tails; vertical axis: 0 to 1.]
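
The two coin distributions can be written as simple tables, and a distribution can also be estimated from data by normalizing a histogram. This is a sketch: the variable names are hypothetical, the tosses are simulated, and the seed is fixed only for repeatability.

```python
import random
from collections import Counter

# Probability of each outcome; the values in each table sum to 1.
fair_coin = {"heads": 0.5, "tails": 0.5}
weighted_coin = {"heads": 1.0, "tails": 0.0}

def empirical_pdf(outcomes):
    """Normalize a histogram of outcomes so that the bar heights sum to 1."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {outcome: count / total for outcome, count in counts.items()}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
tosses = rng.choices(list(fair_coin), weights=list(fair_coin.values()), k=10_000)
estimate = empirical_pdf(tosses)
print(estimate)
```

With enough tosses the estimated bar heights should approach the 0.5 / 0.5 split of the fair coin; the same normalization turns a histogram of pixel values into an estimate of how likely each color is.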
21
answer
  • L.T. averaged 5.3 yards per run in the last game
    but the median run was 3 yards.
  • A few long runs greatly increased the average.
  • So the coach is not completely crazy.