Title: Chapter 2. Variation
3. Chapter 2. Variation
How to summarize and display random data, and appreciate variation due to randomness.
Data summaries:
single observation $y$ (number, curve, image, ...)
sample $y_1, \ldots, y_n$
statistic $s(y_1, \ldots, y_n)$
4. Features: location, scale (spread)
Sample moments:
$\bar{y} = (y_1 + \cdots + y_n)/n$, average
$s^2 = \sum_j (y_j - \bar{y})^2/(n-1)$, sample variance
Order statistics: $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$
minimum, maximum, median, range
quartiles, quantiles ($p$, $100p\%$)
trimmed average
IQR, MAD $= \mathrm{median}\{|y_i - \mathrm{median}(y_i)|\}$
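A minimal sketch of these summaries, using only the Python standard library. The data values are made up for illustration, and the trimming rule (drop a fraction `p` from each end, rounded down) is one common convention assumed here, not necessarily the one used in the course.

```python
import statistics

def trimmed_mean(values, p):
    """Average after dropping the fraction p of observations from each end."""
    ordered = sorted(values)
    k = int(p * len(ordered))              # number dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def mad(values):
    """Median absolute deviation: median{|y_i - median(y)|}."""
    m = statistics.median(values)
    return statistics.median([abs(v - m) for v in values])

# Made-up sample with one unusually large value.
y = [2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 9.0, 30.0]
n = len(y)

average = sum(y) / n
s2 = sum((v - average) ** 2 for v in y) / (n - 1)   # sample variance
med = statistics.median(y)
q1, _, q3 = statistics.quantiles(y, n=4)            # lower and upper quartiles
iqr = q3 - q1
```

Here the average (8.4375) is pulled upward by the value 30.0, while the median (5.75) and the MAD (2.0) are not.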
5. Bad data
Outlier: an observation unusual compared to the others
Resistance. Trimmed average
Example (Midwife birth data). Hours in labor by day
$n = 95$, $\bar{y} = 7.57$ hr, $s^2 = 12.97$ hr$^2$
min, med, max = 1.5, 7.5, 19 hr
quartiles = 4.95, 9.75 hr
7. Graphs. Indispensable in data analysis
Histogram: disjoint bins $[L + (k-1)\delta, L + k\delta)$. Plot count $n_k$, or proportion $n_k/n$
EDF: $\#\{y_j \le y\}/n$. Estimates CDF, $\Pr(Y \le y)$
Scatter plot: $(u_j, v_j)$
Parallel boxplots: location, scale, shape, outliers, comparative; median, quartiles, 1.5 IQR
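As a rough illustration of the histogram counts and the EDF, the sketch below computes both without any plotting library. The function names, the bin origin `left`, the bin width `width`, and the data are all assumptions made for the example.

```python
import math

def histogram_counts(values, left, width, n_bins):
    """Counts n_k for bins [left + (k-1)*width, left + k*width), k = 1..n_bins."""
    counts = [0] * n_bins
    for v in values:
        k = math.floor((v - left) / width)   # zero-based bin index
        if 0 <= k < n_bins:                  # values outside the bins are ignored
            counts[k] += 1
    return counts

def edf(values, y):
    """Empirical distribution function: #{y_j <= y} / n."""
    return sum(1 for v in values if v <= y) / len(values)

hours = [1.5, 2.0, 4.5, 5.0, 7.5, 8.0, 9.5, 12.0, 19.0]   # made-up values
counts = histogram_counts(hours, left=0.0, width=5.0, n_bins=4)
proportions = [c / len(hours) for c in counts]
```

With these values the counts are 3, 4, 1, 1, and `edf(hours, 7.5)` is 5/9.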
9. Random sample $Y_1, \ldots, Y_n$: independent, CDF $F$
Mean: $E(Y) = \int y \, dF(y)$ ($= \int y f(y) \, dy$ if a density $f$ exists)
$p$ quantile: $y_p = F^{-1}(p)$
Laplace (continuous): $f(y) = \exp\{-|y - \eta|/\tau\}/(2\tau)$, $-\infty < y < \infty$
Poisson (discrete): $\Pr(Y = y) = f(y) = \theta^y \exp(-\theta)/y!$, $y = 0, 1, 2, \ldots$
Count of daily arrivals: Poisson
Hours of labor: gamma
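The Laplace density, its quantile function, and the Poisson probabilities can be sketched directly from the formulas. The parameter names `eta`, `tau`, and `theta` are assumed labels for the location, scale, and Poisson mean. The quantile formula comes from inverting the Laplace CDF on each side of `eta`.

```python
import math

def laplace_pdf(y, eta, tau):
    """Laplace density exp{-|y - eta|/tau} / (2 tau)."""
    return math.exp(-abs(y - eta) / tau) / (2 * tau)

def laplace_quantile(p, eta, tau):
    """p quantile y_p = F^{-1}(p) of the Laplace distribution, 0 < p < 1."""
    if p < 0.5:
        return eta + tau * math.log(2 * p)
    return eta - tau * math.log(2 * (1 - p))

def poisson_pmf(y, theta):
    """Poisson probability theta^y exp(-theta) / y! for y = 0, 1, 2, ..."""
    return theta ** y * math.exp(-theta) / math.factorial(y)

# The Poisson probabilities add to 1 and have mean theta (sum truncated at 60).
total = sum(poisson_pmf(y, 3.0) for y in range(60))
mean = sum(y * poisson_pmf(y, 3.0) for y in range(60))
```

The median of the Laplace distribution is `laplace_quantile(0.5, eta, tau)`, which equals `eta`.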
11. Gamma: $f(y)$
Many examples of useful distributions will be provided in these beginning chapters, some discrete, some continuous.
12. SF Chron 01/26/09
13. Sampling variation
"the data $y_1, \ldots, y_n$ will be regarded as the observed values of random variables": probabilities defined
"ask how we would expect $s(y_1, \ldots, y_n)$ to behave on average, ..., understand the properties of $S = S(Y_1, \ldots, Y_n)$"
$Y_1, \ldots, Y_n$: sample from a distribution with mean $\mu$, variance $\sigma^2$
Sample moment: $E(\bar{Y}) = nE(Y_j)/n = \mu$, unbiased
$E(X + Y) = E(X) + E(Y)$
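14. $\mathrm{var}(\bar{Y}) = \sigma^2/n$
$\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$, if uncorrelated
$\mathrm{var}(aX) = a^2 \mathrm{var}(X)$
$\sum_j (Y_j - \mu)^2 = \sum_j (Y_j - \bar{Y} + \bar{Y} - \mu)^2 = \sum_j (Y_j - \bar{Y})^2 + n(\bar{Y} - \mu)^2$, the cross term vanishing because $\sum_j (Y_j - \bar{Y}) = 0$
Taking expectations: $n\sigma^2 = E\{\sum_j (Y_j - \bar{Y})^2\} + \sigma^2$
so $E(S^2) = \sigma^2$, unbiased
Birth data: $n = 95$, $\bar{y} = 7.57$ hr, $s^2/n = 0.137$ hr$^2$, so $s/\sqrt{n} = 0.37$ hr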
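The arithmetic for the birth data can be checked in a few lines, taking $n = 95$ and $s^2 = 12.97$ hr$^2$ from the example. The variable names are illustrative only.

```python
import math

n = 95        # number of days in the birth data
s2 = 12.97    # sample variance, hr^2

var_of_average = s2 / n                  # estimated variance of the average, hr^2
std_error = math.sqrt(var_of_average)    # s / sqrt(n), hr

print(round(var_of_average, 3), round(std_error, 2))   # prints: 0.137 0.37
```

So 0.137 is the estimated variance of the average in hr$^2$, and the corresponding standard error is about 0.37 hr.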
15. Probability plot. Checking a probability model
Plot $y_{(j)}$ versus $F^{-1}(j/(n+1))$
For the normal, take $F = \Phi$, with $\Phi$ from a table or statistical package
Normal probability plot "works" even if $\mu$, $\sigma$ are unknown: for $N(\mu, \sigma^2)$, $E(Y_{(j)}) = \mu + \sigma E(Z_{(j)})$
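A sketch of the coordinates for a normal probability plot, with `statistics.NormalDist` from the Python standard library standing in for the "statistical package". The function name and the data are made up. If the model fits, the points lie near a straight line with intercept $\mu$ and slope $\sigma$.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Pairs (Phi^{-1}(j/(n+1)), y_(j)) for j = 1, ..., n."""
    ordered = sorted(values)                 # order statistics y_(1) <= ... <= y_(n)
    n = len(ordered)
    std_normal = NormalDist()                # N(0, 1)
    return [(std_normal.inv_cdf(j / (n + 1)), y)
            for j, y in enumerate(ordered, start=1)]

points = normal_plot_points([7.5, 1.5, 19.0, 4.9, 9.8])   # made-up values
```

Each pair can be passed to any plotting tool, with the normal quantile on the horizontal axis.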
17. Tools for approximation
Weak law of large numbers. $\bar{Y} \to \mu$ in probability as $n \to \infty$
$\bar{Y}$ is a consistent estimate of $\mu$
Definition. $S_n \to S$ in probability if for any $\varepsilon > 0$, $\Pr(|S_n - S| > \varepsilon) \to 0$ as $n \to \infty$
If $S = s_0$, a constant, and $h(s)$ is continuous at $s_0$, then $h(S_n) \to h(s_0)$ in probability
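The definition of convergence in probability can be illustrated by simulation. The sketch below estimates $\Pr(|\bar{Y} - \mu| > \varepsilon)$ for Uniform(0,1) samples, where $\mu = 1/2$. The choice of distribution, of $\varepsilon = 0.05$, and of the seed and number of replications are assumptions for the example.

```python
import random

def prop_far_from_mean(n, eps, mu=0.5, reps=2000, seed=1):
    """Monte Carlo estimate of Pr(|Ybar_n - mu| > eps) for Uniform(0,1) samples."""
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        ybar = sum(rng.random() for _ in range(n)) / n
        if abs(ybar - mu) > eps:
            far += 1
    return far / reps

# The estimated probability should shrink toward 0 as n grows.
estimates = {n: prop_far_from_mean(n, eps=0.05) for n in (10, 100, 1000)}
```

The estimate is only a Monte Carlo approximation, so it varies with the seed, but the decreasing pattern in `n` is what the weak law describes.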
19. Central limit theorem. $\sqrt{n}(\bar{Y} - \mu)/\sigma \to Z \sim N(0,1)$ in distribution as $n \to \infty$
Definition. $Z_n$ converges in distribution to $Z$ if $\Pr(Z_n \le z) \to \Pr(Z \le z)$ as $n \to \infty$ at every $z$ for which $\Pr(Z \le z)$ is continuous
The CLT provides an approximation for "large" $n$
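One way to see the approximation at work is to simulate standardized averages and compare $\Pr(Z_n \le z)$ with $\Phi(z)$. The sketch assumes standard exponential data (so $\mu = \sigma = 1$), $n = 50$, and $z = 1$. All of these are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def standardized_average(n, rng):
    """sqrt(n)(Ybar - mu)/sigma for n standard exponential draws (mu = sigma = 1)."""
    ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (ybar - 1.0)

rng = random.Random(2)
reps = 4000
zs = [standardized_average(50, rng) for _ in range(reps)]

z = 1.0
empirical = sum(1 for v in zs if v <= z) / reps   # Monte Carlo Pr(Z_n <= z)
normal = NormalDist().cdf(z)                      # Phi(1), about 0.841
```

The two values should agree to within simulation error. Repeating with smaller `n` gives a feel for what "large" means for skewed data.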
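21. Average as an estimate of $\mu$
If $X$ is $N(\mu, \sigma^2)$ then $(X - \mu)/\sigma$ is $N(0,1)$
Writing $Z_n = \sqrt{n}(\bar{Y} - \mu)/\sigma$, $\bar{Y} = \mu + n^{-1/2} \sigma Z_n$
Indicates how the efficiency of $\bar{Y}$ depends on $n$ and $\sigma$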
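22. Covariance and correlation
$\mathrm{cov}(X, Y) = \sigma_{xy} = E[\{X - E(X)\}\{Y - E(Y)\}]$
sample covariance: $C_{xy} = \sum_{j=1}^{n} (X_j - \bar{X})(Y_j - \bar{Y})/(n-1)$
$C_{xy} \to \sigma_{xy}$ in probability
correlation: $\rho = \mathrm{cov}(X, Y)/\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$, $-1 \le \rho \le 1$
$R = C_{xy}/\sqrt{C_{xx} C_{yy}}$
$R \to \rho$ in probability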
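The sample covariance and correlation follow directly from the definitions. This is a minimal sketch with made-up data and hypothetical function names.

```python
import math

def sample_cov(x, y):
    """C_xy = sum (x_j - xbar)(y_j - ybar) / (n - 1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """R = C_xy / sqrt(C_xx C_yy), always between -1 and 1."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up values
y = [2.0, 1.0, 4.0, 3.0, 5.0]
c_xy = sample_cov(x, y)
r = sample_corr(x, y)
```

For these values $C_{xy} = 2$ and $C_{xx} = C_{yy} = 2.5$, so $R = 0.8$.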
23. $R = -0.340$
24. Some more distributions
Cauchy: $f(y) = 1/[\pi\{1 + (y - \theta)^2\}]$, $-\infty < y < \infty$
distribution of $\bar{Y}$ same as that of $Y_1$; no moments, long tails
Uniform: $F(u) = 0$ for $u \le 0$; $u$ for $0 < u \le 1$; $1$ for $1 < u$
$E(U) = 1/2$, center of gravity
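The claim that a Cauchy average has the same distribution as a single observation can be explored by simulation. The sketch takes $\theta = 0$ and compares the interquartile range of averages of $n = 1$ and $n = 100$ draws. Both should be near 2, since the quartiles of the standard Cauchy are $\pm 1$. The sampler uses the inverse-CDF method, and the seed and replication count are arbitrary.

```python
import math
import random
import statistics

def cauchy_draw(rng, theta=0.0):
    """Inverse-CDF draw from the Cauchy density 1/[pi{1 + (y - theta)^2}]."""
    return theta + math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_averages(n, reps, rng):
    """Interquartile range of reps simulated averages of n Cauchy draws."""
    averages = [sum(cauchy_draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(averages, n=4)
    return q3 - q1

rng = random.Random(3)
iqr_1 = iqr_of_averages(1, 4000, rng)
iqr_100 = iqr_of_averages(100, 4000, rng)
```

Unlike the finite-variance case, averaging 100 observations does not shrink the spread.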
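25. Exponential: $f(y) = 0$ for $y < 0$; $\exp(-y)$ for $y \ge 0$
Pareto: $F(y) = 0$ for $y < a$; $1 - (y/a)^{-\gamma}$ for $y \ge a$; $a, \gamma > 0$
Poisson process: times of events $y_{(1)}, y_{(2)}, y_{(3)}, \ldots$
$y_{(1)}, y_{(2)} - y_{(1)}, y_{(3)} - y_{(2)}, \ldots$ i.i.d. exponential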
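A Poisson process can be simulated by accumulating i.i.d. exponential gaps, which is the construction stated for the event times. The sketch assumes rate 1 (the standard exponential) and an arbitrary time horizon. The function name is hypothetical.

```python
import random

def poisson_process_times(rate, horizon, rng):
    """Event times in (0, horizon] built from i.i.d. exponential gaps."""
    times = []
    t = rng.expovariate(rate)          # time of the first event
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)     # add the next gap
    return times

rng = random.Random(4)
times = poisson_process_times(rate=1.0, horizon=1000.0, rng=rng)

# Recover the gaps: first event time, then successive differences.
gaps = [times[0]] + [b - a for a, b in zip(times, times[1:])]
mean_gap = sum(gaps) / len(gaps)       # should be near 1/rate
```

The number of events in the interval is then Poisson with mean `rate * horizon`.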
26. Chi-squared distribution
$Z_1, Z_2, \ldots, Z_\nu \sim IN(0,1)$, $W = \sum_{j=1}^{\nu} Z_j^2$
$E(W) = \nu$, $\mathrm{var}(W) = 2\nu$
Multinomial (page 47): $p$ classes with probabilities $\pi_1, \ldots, \pi_p$ adding to 1
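The moments of the chi-squared distribution can be checked by simulating $W$ as a sum of squared independent standard normals. The choice $\nu = 4$, the seed, and the number of draws are assumptions for the example.

```python
import random
import statistics

def chi_squared_draw(nu, rng):
    """W = sum of nu squared independent N(0,1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(5)
nu = 4
draws = [chi_squared_draw(nu, rng) for _ in range(20000)]

mean_w = statistics.fmean(draws)      # should be near nu
var_w = statistics.variance(draws)    # should be near 2 * nu
```

Both sample moments carry simulation error, so they match $\nu$ and $2\nu$ only approximately.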
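27. Linear combination $L = a + \sum_j b_j Y_j$
$E(L) = a + \sum_j b_j \mu_j$
If independent, $\mathrm{var}(L) = \sum_j b_j^2 \sigma_j^2$
If the $Y_j$ are $IN(\mu_j, \sigma_j^2)$, then $L$ is $N(a + \sum_j b_j \mu_j, \sum_j b_j^2 \sigma_j^2)$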
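As a small worked case, the average of $n$ independent observations is the linear combination with $a = 0$ and $b_j = 1/n$, which gives mean $\mu$ and variance $\sigma^2/n$. The function name and the numbers are illustrative.

```python
def linear_combination_moments(a, b, mu, sigma2):
    """Mean and variance of L = a + sum b_j Y_j for independent Y_j."""
    mean = a + sum(bj * mj for bj, mj in zip(b, mu))
    var = sum(bj ** 2 * s2 for bj, s2 in zip(b, sigma2))
    return mean, var

# Average of n = 4 observations, each with mean 7.5 and variance 12.
n = 4
mean_l, var_l = linear_combination_moments(
    a=0.0, b=[1 / n] * n, mu=[7.5] * n, sigma2=[12.0] * n
)
```

Here `mean_l` is 7.5 and `var_l` is 3.0, that is, $12/4$.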
28. Moment-generating function: $M_Y(t) = E(\exp\{tY\})$, $t$ real
$X$, $Y$ independent: $M_{X+Y}(t) = M_X(t) M_Y(t)$
For $N(\mu, \sigma^2)$: $M(t) = \exp(t\mu + t^2 \sigma^2/2)$
The normal is determined by its moments
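The normal moment-generating function can be checked numerically against the definition $E(\exp\{tY\})$. The sketch uses a simple midpoint rule over $\mu \pm 12\sigma$. The integration range, the step count, and the parameter values are assumptions made for the example.

```python
import math
from statistics import NormalDist

def normal_mgf(t, mu, sigma):
    """Closed form exp(t mu + t^2 sigma^2 / 2) for N(mu, sigma^2)."""
    return math.exp(t * mu + t ** 2 * sigma ** 2 / 2)

def mgf_by_quadrature(t, mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E(exp{tY}) for Y ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h           # midpoint of the i-th subinterval
        total += math.exp(t * y) * dist.pdf(y)
    return total * h

closed_form = normal_mgf(0.5, mu=1.0, sigma=2.0)     # exp(1)
numeric = mgf_by_quadrature(0.5, mu=1.0, sigma=2.0)
```

The product rule for independent variables can be seen the same way: `normal_mgf(t, 1, 2) * normal_mgf(t, 2, 3)` matches the closed form for $N(3, 13)$, the distribution of the sum.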