Title: maths
1 Maths & Statistics: from Probability to the Normal Distribution
Dr William Megill, 4E3.43, enswmm_at_bath.ac.uk
XX10118
2Quick Review
- Definition
- Statistics: analysis and interpretation of data
with a view toward objective evaluation of the
reliability of the conclusions based on the
data.
- Why?
- Variability in the real world (Stochasticity)
- Issues with experimental data
- Issues with design for robustness
3Stats in experimental design
- Knowing some stats ahead of time is a good thing
- too often students/researchers attempt analysis
of research data only to find
- they have too few data to see differences
- they have too much of the wrong data
4Types of statistics
- Descriptive stats
- Organising data to talk about them
- Inferential stats
- Inferring information about the whole (popn)
from characteristics of its parts (sample)
5Types of data
- Ratio data
- Constant size interval between adjacent units
- Need a zero (to enable multiplication)
- e.g. a 15cm bar is half the length of a 30cm one
- Interval data
- Constant size interval
- No zero (or zero is arbitrary)
- e.g. temperature
- 25°C - 20°C (77°F - 68°F) = 10°C - 5°C (50°F - 41°F)
- but 40°C (104°F) ≠ 2 x 20°C (68°F)
- NB. Kelvin temperature is Ratio scale
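
A minimal sketch of the interval-versus-ratio point, using the temperatures from the slide. The helper names `celsius_to_fahrenheit` and `celsius_to_kelvin` are illustrative, not part of the lecture.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


# Equal intervals: a 5 degree C step is a 9 degree F step anywhere on the scale.
step_high_f = celsius_to_fahrenheit(25) - celsius_to_fahrenheit(20)
step_low_f = celsius_to_fahrenheit(10) - celsius_to_fahrenheit(5)

# Ratios depend on where zero sits, so Celsius and Fahrenheit disagree ...
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
# ... and only the Kelvin ratio compares absolute temperatures.
ratio_k = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(step_high_f, step_low_f)
print(ratio_c, round(ratio_f, 3), round(ratio_k, 3))
```

The two steps match, while the three "ratios" all differ, which is why multiplication is only meaningful on a ratio scale.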
6Types of data
- Circular data
- Variation on interval scale
- e.g. compass degrees
- Ordinal data
- Ordered, but not constant interval
- e.g. Grades, comfort
- Nominal data
- No order
- e.g. car names, blood types
7Accuracy & Precision
- Accuracy
- nearness of a measurement to the actual value
- Precision
- closeness of repeated measures to each other
- Significant figures
- refer to accuracy
- use all digits in calcs, report proper sig figs
8Frequency Distributions
- Exploring data using tables & graphs
You can get graphs to say anything. Be wary when
reading them. Be honest when drawing them.
9Grouping data
- how many groups?
- rule of thumb: about 10 to span the range
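
One possible way to apply the rule of thumb is to split the observed range into about 10 equal-width classes and count the observations in each. The function `frequency_table` and the data below are hypothetical, chosen only to illustrate the idea.

```python
def frequency_table(data, n_classes=10):
    """Group data into n_classes equal-width classes spanning its range.

    Returns (edges, counts): n_classes + 1 class boundaries and the
    absolute frequency in each class.
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; nothing to group")
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for x in data:
        # The maximum value would land in class n_classes, so clamp it
        # into the last class.
        i = min(int((x - lo) / width), n_classes - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_classes + 1)]
    return edges, counts


# Hypothetical measurements, for illustration only.
data = list(range(100))
edges, counts = frequency_table(data)
relative = [c / len(data) for c in counts]  # relative frequencies
print(counts)
```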
10Frequency Polygons
- Use instead of bar graphs/histograms
- Not for ordinal or nominal data: x-axis not a
ratio scale
- Absolute or relative frequency
- NB: do not read intermediate frequencies
11Cumulative Frequency Polygons (Ogives)
- How many were better than...
12Populations & Samples
- Primary objective of statistics
- to infer characteristics of a group
- by analysing characteristics of a small
sampling of the group
- Population
- entire collection of measurements about which to
draw conclusions
- e.g. cars in UK, lifetime of shock absorbers,
bearing age at failure
- Sample
- subset of all of the measurements in the
population
- from the characteristics of sample, draw
conclusions about popn
- NB: One often samples a population that does not
exist
- e.g. fuel additive in 40 cars, measure gas
consumption. Population is all cars which
could have had additive.
- Such a population is called hypothetical or
potential
13Random Sampling
- Every member of the population has an equal and
independent chance of being selected
- each measurement in the population has an equal
chance of being selected in the sample
- the selection of any member of the sample has no
influence on the selection of any other member
- Usually this is obvious. Sometimes it isn't...
14Experimental Design
- Pseudoreplication
- Imagine a study of tyre wear.
- Experimenter has access to 8 cars, with 4 tyres
each.
- How many independent measurements?
Depends on the experiment: either 32 or 8
15Probability
- Definitions
- Experiment
- An activity with an observable result, or set of
results
- e.g. tossing a coin
- Outcome
- An observable result of an experiment
- e.g. heads
- Event
- An outcome or set of outcomes of interest
- e.g. {H}
- Sample Space
- Set of all possible outcomes of an experiment
- e.g. {H, T}
16Counting outcomes
- IF
- Sample space of one experiment has k1 elements
- Sample space of another has k2
- Then
- Combined sample space has k1 x k2 elements
duh...
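
A small sketch of the counting rule, assuming two illustrative experiments (a coin toss and a six-sided die) that are not named on the slide. `itertools.product` enumerates the combined sample space.

```python
from itertools import product

coin = ["H", "T"]          # k1 = 2 outcomes
die = [1, 2, 3, 4, 5, 6]   # k2 = 6 outcomes

# Every pairing of one coin outcome with one die outcome.
combined = list(product(coin, die))

print(len(combined))  # k1 x k2
print(combined[:3])
```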
17Permutations & Combinations
- Permutation
- arrangement of objects in a specific sequence
- e.g. Horse (H), Cow (C), Sheep (S)
- six different arrangements
- HCS, HSC, CHS, CSH, SHC, SCH
- note: once first is placed, choices diminish for
others
- Notation: nPn = n(n-1)(n-2)...(3)(2)(1) = n!
18Permutations
- Fewer than n positions...
- Horse, Cow, Sheep, Pig
- total if 4 positions: 4P4 = 4! = 24
- but if only two positions...
- HC,HS,HP,CH,CS,CP,SH,SC,SP,PH,PC,PS (12)
- nPk = n!/(n-k)!, so 4P2 = 4!/2! = 12
- If some objects indistinguishable...
- HHCC,CCHH,HCHC,CHCH,HCCH,CHHC (6 = 4!/(2! 2!))
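
The counts on this slide can be checked by brute-force enumeration. This is a sketch using Python's standard library; the single-letter labels follow the slide (H, C, S, P).

```python
from itertools import permutations
from math import factorial, perm

animals = ["H", "C", "S", "P"]

# All four positions filled: 4! arrangements.
all_four = list(permutations(animals))

# Only two positions: 4!/(4-2)! ordered pairs.
two_positions = ["".join(p) for p in permutations(animals, 2)]

# Two indistinguishable horses and two indistinguishable cows:
# a set removes arrangements that look identical.
distinct = {"".join(p) for p in permutations("HHCC")}

print(len(all_four), len(two_positions), len(distinct))
print(sorted(distinct))
```

`math.perm(4, 2)` gives the same count as the enumeration without building the list.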
19Combinations
- group of objects where sequence doesn't matter
- nCk = n!/(k!(n-k)!)
- HC,HS,HP,CH,CS,CP,SH,SC,SP,PH,PC,PS
- but HC = CH, HS = SH, etc., so only 6 combinations
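
As a quick check, the twelve ordered pairs collapse to six unordered pairs once order is ignored. A sketch with the same four animals:

```python
from itertools import combinations
from math import comb

animals = "HCSP"

# Unordered pairs: HC and CH count once.
pairs = ["".join(c) for c in combinations(animals, 2)]

print(pairs)
print(comb(4, 2))  # 4!/(2! 2!)
```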
20Sets
- Set: collection of elements x
- Subset
- Intersection & Union
- Complement
- Venn Diagram
[Venn diagram of sets A, B and C]
21Probability of an Event
- Likelihood of an event expressed by
- relative frequency observed from large dataset
- knowledge of the system under study
22Adding Probabilities
- Mutually exclusive?
- Add 'em up
- P(A or B) = P(A) + P(B)
- Not mutually exclusive?
- P(A or B) = P(A) + P(B) - P(A and B)
- P(A and B) = P(A)P(B) (if A and B are independent)
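
A sketch of both addition rules on one roll of a fair six-sided die. The die and the particular events are assumptions chosen for illustration; exact fractions avoid rounding noise.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event (a set of outcomes) for a fair die."""
    return Fraction(len(event & sample_space), len(sample_space))


# Mutually exclusive events: rolling a 1, rolling a 2.
one, two = {1}, {2}
p_exclusive = prob(one | two)

# Overlapping events: an even number, a number of at most 4.
even, at_most_four = {2, 4, 6}, {1, 2, 3, 4}
p_overlap = prob(even | at_most_four)
p_both = prob(even & at_most_four)

print(p_exclusive, p_overlap, p_both)
```

These two overlapping events also happen to be independent, so `p_both` equals `prob(even) * prob(at_most_four)`; that product rule does not hold for dependent events.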
23Distributions
- Definition
- the relative numbers of times each possible
outcome will occur in a number of trials
- Probability function/density
- Function describing the probability that a given
value will occur
- Distribution function
- Function describing the cumulative probability
that a given value or any value smaller than it
will occur
NB: PDF is derivative of DF
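
The relationship between the density and the distribution function can be checked numerically. This sketch assumes a standard normal distribution (via `statistics.NormalDist`) purely as a convenient continuous example, and approximates the derivative with a central difference.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)


def numerical_derivative_of_cdf(x, h=1e-5):
    """Central-difference estimate of d/dx of the distribution function."""
    return (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)


for x in (-1.0, 0.0, 0.54, 1.96):
    print(x, round(dist.pdf(x), 6), round(numerical_derivative_of_cdf(x), 6))
```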
24- Discrete Probability Distributions
- Binomial
- Poisson
- Hypergeometric
- Continuous Probability Distributions
- Normal
25Binomial Distribution
- Discrete probability distribution of obtaining
exactly n successes out of N trials
- e.g. machine known to produce, on average, 2%
defective components. What is the probability that 3
items are defective in the next 20 produced?
- Binomial coefficient: number of ways of picking k
unordered outcomes from n possibilities
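
A sketch of the machine example, assuming the "2" on the slide means a 2% defect rate (p = 0.02) and using the standard binomial formula P(n) = C(N, n) p^n q^(N-n) with q = 1 - p. The function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N independent trials."""
    q = 1.0 - p
    return comb(N, n) * p**n * q ** (N - n)


# Assumed reading of the slide: 2% defective, next 20 items, exactly 3 defective.
p_three_defective = binomial_pmf(n=3, N=20, p=0.02)
print(round(p_three_defective, 4))

# Sanity check: the probabilities over all possible n sum to 1.
total = sum(binomial_pmf(n, 20, 0.02) for n in range(21))
```

Under these assumptions the answer is well below 1%, so three defectives in a batch of 20 would be a surprising result.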
27Poisson Distribution
- Used to find probability of a single event
occurring n times in an interval of time
- Different from Binomial, in that we don't know q
- e.g. death by horse kick
- Conditions
- Random events throughout an interval
- Interval can be subdivided such that
- Prob of >1 event occurring in subinterval is zero
- Prob of 1 event occurring in subinterval is
proportional to length of subinterval
- Events in one subinterval independent of other
subintervals
28Poisson Distribution
- Start with binomial distribution
- express prob as function of total obs
- rewrite
- take limit as N gets big
- voila
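
The formulas for these steps did not survive in the transcript. The following is a sketch of the standard derivation, assuming p is the per-trial success probability, N the number of trials, n the number of events, and \nu = Np the expected number of events (the symbol \nu is an assumption).

```latex
% Start with the binomial distribution
P(n \mid N) = \frac{N!}{n!\,(N-n)!}\; p^{n} (1-p)^{N-n}

% Express the probability through the expected total, p = \nu / N
P(n \mid N) = \frac{N!}{n!\,(N-n)!} \left(\frac{\nu}{N}\right)^{n}
              \left(1-\frac{\nu}{N}\right)^{N-n}

% Rewrite, grouping the factors that depend on N
P(n \mid N) = \frac{\nu^{n}}{n!}\;
              \frac{N(N-1)\cdots(N-n+1)}{N^{n}}\;
              \left(1-\frac{\nu}{N}\right)^{N}
              \left(1-\frac{\nu}{N}\right)^{-n}

% Take the limit N \to \infty:
% the second factor tends to 1, the third to e^{-\nu}, the fourth to 1
P_{\nu}(n) = \lim_{N\to\infty} P(n \mid N) = \frac{\nu^{n} e^{-\nu}}{n!}
```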
29Poisson Distribution
Neat thing about Poisson distribution: mean
(expectation) = variance = ν
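
This property can be checked numerically by summing over the probability function P(n) = ν^n e^(-ν) / n!. The sketch below uses a hypothetical parameter value of 2.5; the symbol `nu` and the function name are illustrative.

```python
from math import exp


def poisson_pmf_table(nu, n_max=100):
    """Return [P(0), P(1), ..., P(n_max)] for a Poisson distribution.

    Each term is built from the previous one, P(n) = P(n-1) * nu / n,
    which avoids computing large factorials.
    """
    probs = [exp(-nu)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * nu / n)
    return probs


nu = 2.5  # hypothetical value, for illustration only
probs = poisson_pmf_table(nu)

mean = sum(n * p for n, p in enumerate(probs))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(round(mean, 6), round(variance, 6))
```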
30Example
31Hypergeometric Distribution
- If M defective parts in a total population of N,
then the prob of selecting r defectives in a
sample of size n is P(r) = C(M, r) C(N-M, n-r) / C(N, n)
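
A sketch of the standard hypergeometric probability, P(r) = C(M, r) C(N-M, n-r) / C(N, n), with the slide's symbols. The numbers in the example (20 parts, 5 defective, sample of 4) are hypothetical.

```python
from math import comb


def hypergeometric_pmf(r, N, M, n):
    """Probability of r defectives in a sample of n drawn without
    replacement from N parts, of which M are defective."""
    return comb(M, r) * comb(N - M, n - r) / comb(N, n)


# Hypothetical batch: N = 20 parts, M = 5 defective, sample of n = 4.
p_one = hypergeometric_pmf(r=1, N=20, M=5, n=4)
print(round(p_one, 4))

# The probabilities for r = 0..4 cover every possible sample.
total = sum(hypergeometric_pmf(r, 20, 5, 4) for r in range(5))
```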
32Normal Distribution
- Generalisation of the binomial for continuous
variables
- Remember
- Frequency distribution shows P(X = x)
- Std normal distn
- (μ = 0, σ² = 1)
- Z statistic = standardised normal variable
33Normal Distribution
- Area under whole curve = 1
- Probability of ?
- Area under proportion of curve between Z = 0 and
Z = 0.54
- Use formula or lookup table
- p(0 < Z < 0.54) = 0.2054
- linear interpolation
- Careful to read table caption
- Often given as proportion that lies beyond Z
- Important values
- p(0 < Z < 0.955) = 0.33
- p(0 < Z < 1.96) = 0.4750
- p(0 < Z < 2.575) = 0.495
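
Instead of a lookup table, the same areas can be computed from the standard normal distribution function. A sketch using `statistics.NormalDist` from the Python standard library; the helper `area_zero_to` is an illustrative name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_zero_to(z):
    """Area under the standard normal curve between 0 and z (z >= 0)."""
    return std_normal.cdf(z) - 0.5


for z in (0.54, 0.955, 1.96, 2.575):
    print(z, round(area_zero_to(z), 4))

# A table that quotes the proportion beyond Z gives the complement instead.
beyond_196 = 1.0 - std_normal.cdf(1.96)
```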
34Normal Distribution
- Add/subtract areas to calculate other
probabilities
36Applications of Normal Distribution
- i.e. not just nice curves...
37Probability Intervals
- 95% of observations within 1.96σ of μ
- p(0 < Z < 1.96) = 0.4750
- p(-1.96 < Z < 0) = 0.4750
- p(-1.96 < Z < 1.96) = 0.95
- 5% of obs outside this range
38Probability Limits
- Z = (X - μ)/σ → X = μ + Zσ
- p(-1.96 < Z < 1.96) = 95% → X ∈ [μ - 1.96σ, μ + 1.96σ]
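
A sketch of turning a Z range back into limits on X, via X = μ ± Zσ. The mean of 100 and standard deviation of 15 are hypothetical values, and `probability_limits` is an illustrative name.

```python
from statistics import NormalDist


def probability_limits(mu, sigma, coverage=0.95):
    """Symmetric limits around mu that contain `coverage` of a normal
    population with mean mu and standard deviation sigma."""
    # Two-tailed: leave (1 - coverage) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * sigma, mu + z * sigma


z_95 = NormalDist().inv_cdf(0.975)
lower, upper = probability_limits(mu=100.0, sigma=15.0)
print(round(z_95, 2), round(lower, 1), round(upper, 1))
```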
39One-tailed probability
40Sums & Diffs of Normal Vars
- See workbook
- Basic point: distribution of a sum or difference
of normally distributed variables will itself
also be normally distributed
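
For independent variables the means add (or subtract) and the variances always add. Independence is an assumption here, since the slide defers the details to the workbook. Python's `statistics.NormalDist` applies exactly this rule when two instances are added or subtracted; the parameter values below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical, independent normal variables.
x = NormalDist(mu=10.0, sigma=3.0)
y = NormalDist(mu=5.0, sigma=4.0)

total = x + y       # mean 10 + 5, variance 3^2 + 4^2
difference = x - y  # mean 10 - 5, variance still 3^2 + 4^2

print(total.mean, total.stdev)
print(difference.mean, difference.stdev)
```

Note that the standard deviation of the difference is 5, not 1: variances add even when the variables are subtracted.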
41Hypothesis testing
42Verifying claims
- Advertising line
- Luxcar, makers of the best luxury cars
- Burnol, the finest fuel you can buy
- ConstructAll, designers of beautiful buildings
- the average life of these tyres is 20,000 miles
- on avg, low energy bulbs will last 8000 hrs
- average bottle contents 330 ml
43Hypotheses
- Null & alternative hypotheses
- Null: nothing interesting is happening
- Alternate: something interesting is happening
- e.g.
- Explosives engineers deciding how fast to run
- one says from experience, mean burn rate
600 mm/sec
- other says >600 mm/sec
- Stats test
- H0: μ = 600
- H1: μ > 600
44Types of Error
- Can't be 100% sure, so possible that
- (a) correct hypothesis rejected
- (b) false hypothesis accepted
- (a) Type I error
- (b) Type II error
45Test of proportion
- Hypotheses
- H0: 99% of the control modules match the spec
- H1: <99% of the control modules match the spec
- Distribution?
- either they match or they don't → binomial
- calculate μ and σ
- Normal approximation
- because
- we can use
46Graphically...
- Assess how close 985 is to expected 990
- Convert to standard normal curve
- H1 is directional => one tail
- since H1: p < p0, use left tail
- Define 95% confidence
- Probability is
- Discrete binomial, not continuous normal, so
we're approximating
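
A sketch of the whole test of proportion. The sample size of 1000 is inferred from the slide's "expected 990" at 99% and is an assumption, as is the use of the plain normal approximation without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n = 1000         # assumed sample size (990 expected at 99%)
p0 = 0.99        # proportion claimed under H0
observed = 985   # modules that matched the spec

# Binomial mean and standard deviation under H0.
mu = n * p0
sigma = sqrt(n * p0 * (1 - p0))

# Convert to the standard normal curve.
z = (observed - mu) / sigma

# H1: p < p0, so the probability of interest is the left tail.
p_left = NormalDist().cdf(z)

# One-tailed critical value for 95% confidence.
z_crit = NormalDist().inv_cdf(0.05)
reject_h0 = z < z_crit

print(round(z, 3), round(p_left, 3), round(z_crit, 3), reject_h0)
```

Under these assumptions Z falls just short of the critical value, so H0 would not be rejected at the 5% level.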
47Choosing a tail
48Test of a population mean
- e.g. to verify a manufacturer's claim re boiling
point of a coolant
- if we know the population variance (i.e. from
spec sheet, or large number of experiments)
- null hypothesis: H0: x̄ = μ
- distribution: assume normal; need mean & stdev
- since we're sampling, we need std error of mean
- assume large np gt
- calculate Z statistic
- alternate hypothesis
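
A sketch of the Z test for a mean with known population standard deviation, Z = (x̄ - μ) / (σ/√n). All numbers (claimed boiling point, σ, sample size, sample mean) are hypothetical, and the one-tailed alternative "true mean is lower than claimed" is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n):
    """Z statistic for a sample mean against a claimed population mean,
    when the population standard deviation sigma is known."""
    std_error = sigma / sqrt(n)  # standard error of the mean
    return (sample_mean - mu0) / std_error


# Hypothetical coolant data: claimed boiling point 110 C, sigma = 2 C,
# 36 samples averaging 109.2 C.
z = z_test_mean(sample_mean=109.2, mu0=110.0, sigma=2.0, n=36)

# Assumed alternative H1: mu < 110, so use the left tail.
p_value = NormalDist().cdf(z)
print(round(z, 2), round(p_value, 4))
```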
49Testing a manufacturer's claim
50Experimental differences between paired
treatments can be tested by comparing sample of
differences to a population with mean of zero....
51Two-sample testing
- Two samples...
- do they represent a real (significant) treatment
effect, or
- just two samples from the same population?
52Two-sample testing
- same as before
- H0: μ1 = μ2; H1: μ1 ≠ μ2
- Combined variance
- Z statistic
- compare to standard normal distribution
- two-tailed in this case, Z0.05(2) = 1.96
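
The slide's formulas are missing from the transcript. A common form, assumed here, is Z = (x̄1 - x̄2) / sqrt(σ1²/n1 + σ2²/n2) with known population variances. The sample figures below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z statistic for the difference of two sample means with known
    population variances var1 and var2."""
    combined_se = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / combined_se


# Hypothetical samples.
z = two_sample_z(mean1=52.0, mean2=50.0, var1=16.0, var2=9.0, n1=40, n2=45)

# Two-tailed test: both tails beyond |z| count against H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > 1.96

print(round(z, 3), round(p_value, 4), reject_h0)
```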
53Two-sample testing
- Unknown, but equal variance (usual case)
- Use pooled sample variance
- t statistic
- n1 + n2 - 2 degrees of freedom
- Unknown, unequal variance
- same statistic
- but degrees of freedom, ν
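
The formulas on this slide are missing from the transcript, so the following is a sketch under stated assumptions: the usual pooled-variance t statistic for the equal-variance case, and the Welch-Satterthwaite approximation for the degrees of freedom in the unequal-variance case (which is normally paired with an unpooled standard error). The samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def welch_df(sample1, sample2):
    """Welch-Satterthwaite degrees of freedom for unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical measurements from two treatments.
a = [10, 12, 11, 13, 14]
b = [8, 9, 11, 10, 7]

t, df = pooled_t(a, b)
nu = welch_df(a, b)
print(round(t, 3), df, round(nu, 3))
```

The statistic is then compared with a tabulated critical t value for the relevant degrees of freedom. Here the two sample variances happen to be equal, so both approaches give 8 degrees of freedom.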