Title: Chapter 5: The Normal Approximation for Data
1. The Normal Curve
Features
1. The graph is symmetric around 0.
2. The total area under the curve is 100%.
3. The curve is always above the horizontal axis.
4. The horizontal axis is scaled in standard units.
5. The curve is bell shaped.
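The features above can be checked numerically. This is a minimal Python sketch that assumes the usual equation of the normal curve in standard units, y = (100% / sqrt(2π)) e^(-x²/2), with heights in percent per standard unit. The function names are hypothetical, and a simple trapezoid rule stands in for exact integration.

```python
import math

def normal_curve_height(x: float) -> float:
    """Height of the normal curve at x (standard units), in percent per standard unit."""
    return 100 / math.sqrt(2 * math.pi) * math.exp(-x * x / 2)

def area_trapezoid(f, lo: float, hi: float, steps: int = 20_000) -> float:
    """Approximate the area under f between lo and hi with the trapezoid rule."""
    h = (hi - lo) / steps
    total = (f(lo) + f(hi)) / 2
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Feature 1: symmetric around 0
print(normal_curve_height(1.5) == normal_curve_height(-1.5))
# Feature 2: total area is (very nearly) 100%; the tails beyond -8 and 8 are negligible
print(round(area_trapezoid(normal_curve_height, -8, 8), 4))
# Feature 3: always above the horizontal axis, even far out in the tails
print(normal_curve_height(6) > 0)
```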
Standard Scores
- Standard scores: how far above/below the average a score is, in terms of standard deviation units
- Standard scores state the exact location of a score in the distribution
- The most common form is the z-score
HANES data for women: mean height = 63.5 inches, standard deviation = 2.5 inches
A height of 63.5 inches is 0 standard deviations above the mean; thus, in standard units, the mean = 0.
Converting to standard scores/z-scores
HANES data for women: mean height = 63.5, standard deviation = 2.5
Simple formula: standard score = (score - average) / standard deviation
(63.5 - 63.5) / 2.5 = 0
(56 - 63.5) / 2.5 = -7.5 / 2.5 = -3
(62.25 - 63.5) / 2.5 = -.50
(68.5 - 63.5) / 2.5 = 2
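The conversion formula can be written as a one-line function. This is a sketch only; `to_standard_units` is a hypothetical name, and the average and SD are the HANES figures from the slide.

```python
def to_standard_units(score: float, average: float, sd: float) -> float:
    """z-score: how many SDs a score lies above (+) or below (-) the average."""
    return (score - average) / sd

# HANES women's heights, as given on the slide
AVERAGE, SD = 63.5, 2.5

for height in [63.5, 56, 62.25, 68.5]:
    print(f"{height} -> {to_standard_units(height, AVERAGE, SD)}")
```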
Converting to original scores from z-scores
Simple formula: raw score = (z score x standard deviation) + average
(-1 x 2.5) + 63.5 = 61
(3 x 2.5) + 63.5 = 71
(-2.5 x 2.5) + 63.5 = 57.25
(.50 x 2.5) + 63.5 = 64.75
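Going the other way is the inverse of the previous formula. In this sketch (`from_standard_units` is a hypothetical name), each raw score is converted back to standard units as a round-trip check.

```python
import math

def from_standard_units(z: float, average: float, sd: float) -> float:
    """Raw score that lies z SDs above (+) or below (-) the average."""
    return z * sd + average

AVERAGE, SD = 63.5, 2.5  # HANES women's heights, as given on the slide

for z in [-1, 3, -2.5, 0.50]:
    raw = from_standard_units(z, AVERAGE, SD)
    # Round trip: converting the raw score back to standard units recovers z
    assert math.isclose((raw - AVERAGE) / SD, z)
    print(f"z = {z} -> {raw}")
```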
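Finding areas under the curve
Recall from the previous chapter that certain percentages fall between the various standard deviations.
[Figure: normal curve with the horizontal axis marked from -3 to 3 standard units; about 68% of the area lies between -1 and 1, and about 95% between -2 and 2]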
Finding areas under the curve
Statisticians have created tables (z tables) that give the exact percentages that fall within areas under the normal curve: Table A105 (p. A105).
In terms of standard scores, the percentages are as follows:
between 0 and 1 (and between 0 and -1): 34.13%
between 1 and 2 (and between -1 and -2): 13.59%
between 2 and 3 (and between -2 and -3): 2.14%
beyond 3 (and beyond -3): 0.14%
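The band percentages can be reproduced from the error function in Python's standard library, which computes the same areas the table lists. This is a sketch; `normal_cdf` is a hypothetical helper giving the fraction of the area to the left of z.

```python
import math

def normal_cdf(z: float) -> float:
    """Fraction of the area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area in each one-SD band to the right of 0; by symmetry the bands to the left match
bands = {
    "0 to 1": normal_cdf(1) - normal_cdf(0),
    "1 to 2": normal_cdf(2) - normal_cdf(1),
    "2 to 3": normal_cdf(3) - normal_cdf(2),
    "beyond 3": 1 - normal_cdf(3),
}
for label, fraction in bands.items():
    print(f"{label}: {100 * fraction:.2f}%")
```

For the tail beyond 3, this prints 0.13% (the unrounded value is about 0.135%); the 0.14 above is presumably rounded up so that the four bands on each side add to 50%.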
Exercise 1
Find the area between -1.50 and 1.50.
From Table A105: 86.64%
Exercise 2
Find the area between -2.30 and 2.30.
From Table A105: 97.86%
Exercise 3
Find the area between -1.50 and 2.30.
0 to 2.30 = half of (-2.30 to 2.30) = .50(97.86) = 48.93
0 to -1.50 = half of (-1.50 to 1.50) = .50(86.64) = 43.32
48.93 + 43.32 = 92.25%
Exercise 4
Find the area between -2 and -.50.
0 to -2 = half of (-2 to 2) = .50(95.45) ≈ 47.73
0 to -.50 = half of (-.50 to .50) = .50(38.29) ≈ 19.15
47.73 - 19.15 = 28.58%
Exercise 5
Find the area outside of -2.50 and 2.50.
-2.50 to 2.50 = 98.76 (Table A105)
100 - 98.76 = 1.24%
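The exercise answers can be checked without the table. This sketch assumes a hypothetical helper `area_between`, built on `math.erf`; it computes each area directly rather than by halving and adding table entries. Because of that, a last-digit difference from the hand method is possible in principle, although for these five exercises both methods agree to two decimals.

```python
import math

def normal_cdf(z: float) -> float:
    """Fraction of the area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(lo: float, hi: float) -> float:
    """Percent of the area under the normal curve between lo and hi (standard units)."""
    return 100 * (normal_cdf(hi) - normal_cdf(lo))

print(f"Exercise 1: {area_between(-1.50, 1.50):.2f}%")
print(f"Exercise 2: {area_between(-2.30, 2.30):.2f}%")
print(f"Exercise 3: {area_between(-1.50, 2.30):.2f}%")
print(f"Exercise 4: {area_between(-2.00, -0.50):.2f}%")
# Exercise 5: area outside = 100% minus the area inside
print(f"Exercise 5: {100 - area_between(-2.50, 2.50):.2f}%")
```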
Using the normal approximation
- Many histograms (types of data) follow the normal curve; for such data, the average pins down the center and the standard deviation gives the spread.
- If a histogram/data set follows the normal curve, then we can estimate the percentage of data points/people that fall within a certain interval.
- Using the normal approximation:
  Step 1: convert the raw scores to standard scores.
  Step 2: find the corresponding area under the normal curve using Table A105.
- ALWAYS DRAW THE DIAGRAMS
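The two steps can be combined in one function. This is a sketch under the assumption that the data really follow the normal curve; `percent_between` is a hypothetical name, and the 61 to 66 inch interval is an illustrative question, not one from the slides.

```python
import math

def normal_cdf(z: float) -> float:
    """Fraction of the area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percent_between(lo: float, hi: float, average: float, sd: float) -> float:
    """Normal-approximation estimate of the percent of data between lo and hi."""
    # Step 1: convert the raw endpoints to standard units
    z_lo = (lo - average) / sd
    z_hi = (hi - average) / sd
    # Step 2: find the corresponding area under the normal curve
    return 100 * (normal_cdf(z_hi) - normal_cdf(z_lo))

# Illustrative question: what percent of HANES women are between 61 and 66 inches tall?
print(f"{percent_between(61, 66, 63.5, 2.5):.2f}%")
```

The endpoints 61 and 66 are exactly 1 SD below and above the average, so the estimate is the familiar 68%. The code does not replace drawing the diagram, which is still the best check that the right area was found.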
Percentiles
- Average and standard deviation: good for normally distributed data
- Not satisfactory for non-normal/skewed data
[Figure: histogram, horizontal axis from 0 to 100]
Percentiles
[Figure: skewed histogram, horizontal axis from 0 to 100]
- 25th percentile: 25% of the subjects fall below it, 75% above
- 50th percentile = median: 50% fall below it, 50% above
- 75th percentile: 75% fall below it, 25% above
- In skewed distributions like this, the median is preferable to the mean, as it is not influenced by extreme data points.
- Interquartile range = 75th percentile - 25th percentile: used as a measure of spread in skewed distributions; it is also not as influenced by extreme cases.
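Here is a small illustration of why the median and interquartile range suit skewed data. The data set is hypothetical, and `statistics.quantiles` with `method="inclusive"` is one of several percentile conventions, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical right-skewed data with one extreme value (30)
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 30]

mean = statistics.mean(data)
median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # 75th percentile - 25th percentile

print(f"mean = {mean}, median = {median}")
print(f"25th percentile = {q1}, 75th percentile = {q3}, IQR = {iqr}")

# Dropping the extreme value pulls the mean down but leaves the median unchanged
without = data[:-1]
print(f"without 30: mean = {statistics.mean(without):.2f}, median = {statistics.median(without)}")
```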
Chapter 6 Measurement Error
Measurement Error
- Repeated measurements of an object do NOT produce the same result; the observed differences are due to chance error.
- Example: weigh yourself several times on a bathroom scale.
- Questions about chance error: Where do they come from? How big are they?
- We can assess how large these errors are by replicating the measurement.
- The National Bureau of Standards in Washington holds the national standards for weights and measures, e.g. K20 (kilogram 20) and NB 10 (a 10-gram weight), which are used to calibrate weights in the U.S.
- These have been weighed repeatedly under the same conditions (room, apparatus, people, procedure, air pressure, temperature), in an attempt to control all factors that may influence the weight.
- The NB 10 weighings are the same to the first 3 decimal places; then they differ (Table 1, p. 99).
Estimating chance error
- The standard deviation of a series of repeated measurements estimates the likely size of the chance error in a single measurement.
- Note: individual measurement = exact value + chance error
  (the exact value is a constant; the chance error varies)
- Repeated individual measurements differ because of chance error (variability); their standard deviation estimates the chance error, or variability, for any single measurement.
- Standard deviation = R.M.S. of the deviations from the average
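The R.M.S. definition can be computed step by step. This sketch uses hypothetical weighings (not the Table 1 data), in micrograms below 10 grams. The R.M.S. of the deviations is the divide-by-n SD, which matches `statistics.pstdev`, not the divide-by-(n - 1) `statistics.stdev`.

```python
import math
import statistics

# Hypothetical repeated weighings, in micrograms below 10 grams (not the Table 1 data)
weighings = [404, 402, 406, 405, 403]

average = sum(weighings) / len(weighings)
deviations = [w - average for w in weighings]
# R.M.S.: square the deviations, take their mean, then the square root
rms = math.sqrt(sum(d * d for d in deviations) / len(deviations))

print(f"average = {average}, SD = {rms:.2f}")
# statistics.pstdev uses the same divide-by-n definition
print(math.isclose(rms, statistics.pstdev(weighings)))
```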
Outliers
- Extreme scores that are not the result of errors
- Table 1 data: measurements 36, 86, and 94 are very extreme numbers that seldom occur
  - measurement 94: -5 z; measurement 36: 3 z; measurement 86: 5 z
  - at 3 z: 99.87% fall below, 0.13% at or above
  - at 4 z: 99.997% fall below, 0.003% at or above
- (Figure 2, p. 102) With these in: mean = 405 micrograms below 10 grams, SD = 6 micrograms; 86% fall within 1 SD of the average
- Effect of outliers: they inflate the average and the SD
- With these out: mean = 404 micrograms below 10 grams, SD = 4 micrograms; closer to 68% within 1 SD of the average
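The tail percentages above come from the normal curve. This sketch also extends the calculation to 5 z, to show how rarely a value that far out would occur if the data truly followed the curve. `upper_tail_percent` is a hypothetical name; it uses `math.erfc` so that very small tails are not lost to rounding.

```python
import math

def upper_tail_percent(z: float) -> float:
    """Percent of the normal curve at or above z (standard units)."""
    return 50 * math.erfc(z / math.sqrt(2))

for z in (3, 4, 5):
    above = upper_tail_percent(z)
    print(f"{z} z: {100 - above:.5f}% below, {above:.3g}% at or above")
```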
Bias
- A systematic influence in the same direction; NOT random
  - random/chance error: sometimes positive, sometimes negative
  - systematic error: always in only one direction
- Now: individual measurement = exact value + bias + chance error
- Without bias, the long-run average of repeated measurements (known as the EXPECTED VALUE) = the exact value
- With bias, the long-run average will be off in the same direction as the bias
- Note: bias is not the same thing as chance error
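A short simulation makes the difference between bias and chance error concrete. All numbers here are hypothetical, and normally distributed chance errors are an assumption made for illustration. Both runs use the same seed, so they see identical chance errors and differ only by the bias: the long-run average shifts by the bias, while the spread (the likely size of the chance error) stays the same.

```python
import math
import random
import statistics

def simulate_measurements(exact_value, bias, chance_sd, n, seed=0):
    """Each simulated measurement = exact value + bias + chance error.

    Chance errors are drawn from a normal curve with average 0 and SD chance_sd
    (an illustrative assumption)."""
    rng = random.Random(seed)
    return [exact_value + bias + rng.gauss(0, chance_sd) for _ in range(n)]

def rms_spread(values):
    """SD as the R.M.S. of the deviations from the average."""
    avg = statistics.fmean(values)
    return math.sqrt(statistics.fmean([(v - avg) ** 2 for v in values]))

EXACT = 405.0  # hypothetical exact value
unbiased = simulate_measurements(EXACT, bias=0.0, chance_sd=6.0, n=100_000)
biased = simulate_measurements(EXACT, bias=3.0, chance_sd=6.0, n=100_000)

print(f"without bias: average {statistics.fmean(unbiased):.2f}, SD {rms_spread(unbiased):.2f}")
print(f"with bias +3: average {statistics.fmean(biased):.2f}, SD {rms_spread(biased):.2f}")
```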
Chapter 6 Concept Review
- All measurements, no matter how carefully made, may differ; this reflects chance error.
- So: individual measurement = true value + chance error
- Researchers need to estimate the likely size of chance error before relying on a single measurement; the best method is via replication.
- The likely size of the chance error in a single measurement is estimated by the standard deviation of a series of replicated measurements.
- Bias (systematic error) causes measurements to be systematically too high or too low.
- individual measurement = true value + bias + chance error
- Even in careful measurement, we can expect a small percentage of outliers; these can strongly influence the average and the SD.
Chapter 7 Plotting Points and Lines
Slope
Take any 2 points (A and B) on a line.
Moving from A to B:
- x changes in some way
- y changes in some way
[Figure: points A and B on a line; from A to B the run is 2 and the rise is 1]
Slope = rise / run
Slope = the rate at which y changes for each unit change in x (how much does y change as x changes by one unit?)
Here, the slope = 1/2 = 0.5
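Rise over run translates directly into code. This is a sketch; `slope` is a hypothetical helper, and the coordinates of A and B are made up to match the figure's run of 2 and rise of 1. Swapping A and B changes the signs of both the rise and the run, so the slope is unchanged.

```python
def slope(a, b):
    """Slope of the line through points a = (x1, y1) and b = (x2, y2): rise / run."""
    (x1, y1), (x2, y2) = a, b
    run = x2 - x1
    if run == 0:
        raise ValueError("vertical line: the run is 0, so the slope is undefined")
    rise = y2 - y1
    return rise / run

# Hypothetical coordinates chosen to match the figure: run 2, rise 1
A, B = (1, 1), (3, 2)
print(slope(A, B))  # 0.5
print(slope(B, A))  # 0.5
```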
Positive, Negative, 0 Slope Values
- 0 change in y (as x increases): slope = 0
- positive change in y: positive slope
- negative change in y: negative slope
Intercept
The height of the line where the line crosses the y-axis (the value of y when x = 0)
[Figure: three example lines with intercepts 0.50, 4, and -3]
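Plotting Lines
Plot the line passing through the point (1, 1) with a slope of 2/3.
Start from the construction point (1, 1).
The next step involves the slope: slope = rise/run, which here = 2/3. We chose run = 4; the rise should be positive (the slope is 2/3, not -2/3). Rise = slope x run = 2/3 x 4 ≈ 2.7.
So, from (1, 1), move 4 to the right and about 2.7 up to reach a second point, about (5, 3.7); the line through these two points is the required line.
[Figure: construction point (1, 1), run 4, rise 2.7]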
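The construction can be sketched with exact fractions, so that the rise 8/3 (about 2.7) is not rounded. `second_point` is a hypothetical name. The last step goes slightly beyond the slide: it recovers the intercept by solving y = mx + b for b at the construction point.

```python
from fractions import Fraction

def second_point(point, slope, run):
    """From a construction point, move `run` units right and slope * run units up."""
    x, y = point
    rise = slope * run
    return (x + run, y + rise)

m = Fraction(2, 3)
p1 = (1, 1)
p2 = second_point(p1, m, run=4)
print(f"second point: ({p2[0]}, {float(p2[1]):.2f})")

# The intercept follows from y = mx + b at the construction point: b = y - m * x
b = p1[1] - m * p1[0]
print(f"intercept b = {b}")
```

With b = 1/3, the line is y = (2/3)x + 1/3, and both constructed points satisfy it.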
Algebraic Equation for a line
A rule for computing the y coordinate of a point from its x coordinate:
y = 0.25x + 2 OR y = 2 + 0.25x
(0.25 is the slope, x is the x coordinate, and 2 is the intercept)
General form: y = mx + b (m = slope, b = intercept)
Algebraic equation for a line of the rule y = .25x + 2
x: 4, 8, 12, 16
y = .25x + 2: .25(4) + 2, .25(8) + 2, .25(12) + 2, .25(16) + 2
y: 3, 4, 5, 6
[Figure: the four points plotted on a line, with a rise of 2 over a run of 8]
Note: slope = rise/run = 2/8 = 1/4 = 0.25. All the points fall on the line; the line is the graph of the equation.
The graph of the equation y = mx + b is a straight line with slope m and intercept b.
Algebraic equation for a line of the rule y = -.75x + 4
x: 0, 2, 4, 6
y = -.75x + 4: -.75(0) + 4, -.75(2) + 4, -.75(4) + 4, -.75(6) + 4
y: 4, 2.5, 1, -0.5
[Figure: the four points plotted on a line, with a rise of -1.5 over a run of 2]
Note: slope = rise/run = -1.5/2 = -3/4 = -0.75. All the points fall on the line.
The graph of the equation y = mx + b is a straight line with slope m and intercept b.
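Both tables can be generated from the rule y = mx + b. This sketch uses a hypothetical helper `line_y`. It also checks that every pair of consecutive points has rise/run equal to m, which is what "all the points fall on the line" means numerically.

```python
def line_y(m: float, b: float, x: float) -> float:
    """y coordinate on the line y = mx + b for a given x coordinate."""
    return m * x + b

for m, b, xs in [(0.25, 2, [4, 8, 12, 16]), (-0.75, 4, [0, 2, 4, 6])]:
    ys = [line_y(m, b, x) for x in xs]
    # Each step between consecutive points should have rise / run equal to m
    on_line = all(
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) == m for i in range(len(xs) - 1)
    )
    print(f"y = {m}x + {b}: x = {xs}, y = {ys}, all steps have slope m: {on_line}")
```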